A Tutorial On Redaction

This image has been redacted for your convenience.

Writing software is fun. Redacting software is torturous. It gets worse when the software was never intended for redaction, but you need to redact it anyway.

I’ve worked on multiple projects in the past decade and all of them have involved redaction at some level. The source code, the documentation, the bug fix requests; I’ve redacted every type of digital thing you can think of. Sadly, even after all this time, I’ve yet to find a way to remove the vampiric qualities of redaction which consume the souls of those who perform it. However, I have learned a few ways to make redaction more effective with less time-consuming rework, and that’s what this post is all about.

Redaction, if you are uninitiated into the cult, is a form of evil magic where you remove sections of important information from documents or source code all while somehow retaining most of your sanity. Sometimes, the information is removed because you aren’t licensed to give away someone else’s work, but you need to deliver something that contains the work. Other times, you want to protect your own inventions but still be able to sell portions of your work. You might be allowed to make redaction obvious, printing black boxes where text should be. Or maybe you’ll need to completely hide the fact you messed with the content. This latter approach is the one I’ll assume, because I don’t think there’s a whole lot of value in the former.

What’s the Point?

Stop redacting for a second. It’s easy to jump into redaction work, go through some easy, repeatable steps to get your job done, and end up missing the whole point of redaction. Remember, the reason you are redacting is that whoever receives your information can’t have specific things it contains.

Are you redacting to remove terms? Maybe the names of intellectual properties? If that’s all, you might think you can search and replace the contents of the files you need to redact. Replacing terms is easy enough. You can probably finish your redaction work in a few minutes. But what value is there in removing terms? If the people who are being provided the redacted material have any idea what the material was used for, they’ll know which terms were redacted. You haven’t done anything. And if they have no idea what it was used for, why do they care about it?

I’ve found that redaction is time-consuming and tedious, but also an inconsistent process. You can’t write a program to perform redaction for you, because a program can’t interpret every conceivable spelling error, phrasing (especially poor English), or acronym. Searching for terms with software is really helpful, but it only catches the most obvious stuff.

Consider this paragraph:

“The software uses a proprietary component called This Sure Is Awesome Technology. This technology is used to generate output in a comma-separated list, but in columns instead of rows. This is protected by patent Des. 555,555. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

Suppose you can’t transfer This Sure Is Awesome Technology, because it is licensed. And suppose you can’t transfer the patent information because of international law. A search for the relevant terms would get you what you want, but try just removing them:

“The software uses a proprietary component called. This technology is used to generate output in a comma-separated list, but in columns instead of rows. This is protected by patent. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

You’ve left in the nature of the technology in question and the content of the patent. It doesn’t take much effort for someone to replace your redacted text. So what was the point? You need to do this by hand:

“The software uses a proprietary component which translates files from one type to another. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

Here, the meaning is preserved, but not the method, which is the focus of the redaction. Redaction is almost always an effort to protect methods and concepts, so why apply a method that can’t protect anything?
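If you want to see the search-and-remove failure for yourself, here is a minimal Python sketch. The function name strip_terms and the list of banned terms are my own inventions for illustration. The paragraph is the sample from above.

```python
# The sample paragraph from above, before any redaction.
paragraph = (
    "The software uses a proprietary component called This Sure Is Awesome "
    "Technology. This technology is used to generate output in a "
    "comma-separated list, but in columns instead of rows. This is protected "
    "by patent Des. 555,555. Our tool uses this tool to turn pictures of "
    "ducks into pictures of chickens. Chickens are better than ducks."
)

# Hypothetical list of terms that cannot be transferred.
banned_terms = ["This Sure Is Awesome Technology", "Des. 555,555"]


def strip_terms(text, terms):
    """Delete each term (and the space before it) verbatim. Nothing else changes."""
    for term in terms:
        text = text.replace(" " + term, "")
    return text


redacted = strip_terms(paragraph, banned_terms)
print(redacted)
```

The terms are gone, but the sentences describing what the component does and the fact that a patent exists are still sitting there. A script like this can tell you where to look. It can’t tell you what to rewrite.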

Search, however, is limited. Consider a different writing of the same text:

“The software uses a proprietary component called TSiAT. This technology is used makes row-based CSVs. This is protected by PD 555,556. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

Your search won’t catch every possible spelling error. It won’t catch different forms of the same phrases, especially if those forms have poor grammar. It won’t catch acronyms you don’t expect. If you understand the point of redaction, you won’t consider your job complete just because a search doesn’t return terms from a list of “bad” words.
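Here is a small sketch of the same problem, assuming a search list built from the first version of the paragraph. The helper find_hits and the contents of search_terms are hypothetical. The text is the start of the second writing above.

```python
# The start of the second writing of the paragraph, copied from above.
variant = (
    "The software uses a proprietary component called TSiAT. This technology "
    "is used makes row-based CSVs. This is protected by PD 555,556."
)

# Hypothetical search list, built from the first version of the paragraph.
search_terms = ["This Sure Is Awesome Technology", "Des. 555,555", "patent"]


def find_hits(text, terms):
    """Return the terms that appear in the text, ignoring case."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


hits = find_hits(variant, search_terms)
print(hits)  # prints [] because none of the listed terms appear
```

An empty result looks like a clean document. It isn’t. The acronym, the abbreviated patent reference, and the description of the method are all still there.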

The Redaction Balance

It’s easy not to go far enough in your redaction and leave too much content behind. On the other hand, redacting massive sections of documents removes any value from them. At that point, why even give the documents away?

You’ve probably seen documents redacted by the government. You know, those poorly scanned pages that have a handful of words floating in a sea of black ink. You might find pieces of information here and there but you might not. Why did they release the document at all if there’s nothing in it?

A better approach, as described above, is to search for terms and concepts yourself. You, a human being and not a computer, can understand practically everything that might show up in the information you are redacting. It’s tedious, it’s terrible, and it might be evil, but redaction is something you can’t describe in logical terms any more than you can describe writing a book in logical terms. You can’t delegate this terrible work to a computer, no matter how much you want to. And if you are doing redaction the right way, you’ll really, really want to.

Your goal is to remove terms and ideas in a careful way that doesn’t make it obvious that the terms and ideas ever existed. For instance, if you need to remove the section in brackets, do it like this:

“The tool is capable of [feature A], which does X, Y, and [Z] in order, as well as feature B, which does X and Y only.”


“The tool is capable of feature B, which does X and Y in order.”

A hard redaction of this, replacing [feature A] with [redacted feature] and [Z] with [redacted function] would read like this:

“The tool is capable of redacted feature, which does X, Y, and redacted function in order, as well as feature B, which does X and Y only.”

This gives away the fact that a feature exists, as well as two of the three things it does. If you want someone to know about “Feature B” and not “Feature A”, this is a terrible way to do it.

Acronyms Are Your Enemy

If a term you are replacing is an acronym, things get much worse.

Imagine you have an acronym like RED. That term might appear thousands of times in unrelated words: hundred, redaction, credibility, bred, not to mention the word “red” itself. If this term is just replaced forcibly with something like “Supplier Technology”, you end up with ridiculous sentences like:

“There are two-hundSupplier Technology tests, each of which appear in black if they passed and Supplier Technology if they failed, establishing the cSupplier Technologyibility of the claim that the software was tested.”

You’d need to go through these results by hand, which is no faster than searching and reading in the first place. If you left this in place, anyone reading the redacted document would realize that “Supplier Technology” is clearly what all instances of “RED” become. Again, search-and-replace has accomplished nothing.
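To make the damage concrete, here is a rough Python sketch that compares a forced replacement with a stricter, whole-word pattern. The last sentence of the sample text is a hypothetical addition of mine so that the acronym has one legitimate use. The rest is the sentence above as it read before redaction.

```python
import re

# The sentence from above before redaction, plus one hypothetical sentence
# (not from the original) where the acronym is used for real.
original = (
    "There are two-hundred tests, each of which appear in black if they "
    "passed and red if they failed, establishing the credibility of the "
    "claim that the software was tested. RED converts the images."
)

# Forced replacement: every "red", in any case, even inside other words.
forced = re.sub("red", "Supplier Technology", original, flags=re.IGNORECASE)

# Stricter pattern: only the standalone, upper-case acronym.
strict = re.sub(r"\bRED\b", "Supplier Technology", original)

print(forced)
print(strict)
```

The stricter pattern leaves “hundred” and “credibility” alone, which is an improvement. It also skips “Red”, “R.E.D.”, and every misspelling, so it narrows where you have to read without letting you stop reading.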

And this isn’t the end of the pain you will suffer at the hands of linguistic shortcuts. Laziness compels people to turn all sorts of things into acronyms where you may not expect them. And even if you try to anticipate them, they probably won’t use the same letters you would. This doesn’t even take spelling errors into account. A misspelled acronym is like a land mine of important information you can’t sweep for. It’s just waiting for the recipient of the redacted material to trip on, blowing up in your face. Acronyms are just another reason you should be performing redaction by hand.

Some Precautions

You may have no idea if the documents or software you are writing will be subject to redaction in the future. But if you somehow do know, there are a few things you should keep in mind.

If you want to make redaction trivial, don’t mix different proprietary information. If you are working with three companies, try to keep the IP of each of them separate, restricting interaction to as few documents as possible. You’ll find that this won’t be possible, at least completely. The closer you get to it, however, the easier the redaction.

If you want to make redaction impossibly difficult, use extremely short proprietary terms. For instance, two-character terms like “A1” will show up in binary data and in GUIDs, maybe millions of times. Even if an engineer can look through things at a rate of 10/second (which is practically begging for human error), that’s still four full terrible days to look for a term which might legitimately appear twice in its proprietary context. Inconsistent acronyms, lack of spelling and grammatical checks, and images all help multiply the length of time you will need to perform redaction. Avoid these things as much as possible in any material that might be redacted in the future.
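The arithmetic behind “four full terrible days” is easy to check. This sketch assumes about 3.5 million matches, which is my own guess at a number that fits “millions”. The review rate of 10 per second is the one given above.

```python
# Assumed number of search hits for a two-character term (a guess, not measured).
matches = 3_500_000

# Review rate from the text: 10 matches per second.
matches_per_second = 10

seconds = matches / matches_per_second
round_the_clock_days = seconds / (24 * 60 * 60)  # no sleep, no breaks
eight_hour_days = seconds / (8 * 60 * 60)        # ordinary working days

print(f"{round_the_clock_days:.1f} days around the clock")
print(f"{eight_hour_days:.1f} eight-hour working days")
```

Under these assumptions, the round-the-clock figure comes out at a little over four days. Split into eight-hour working days, it is roughly three times that.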

Writing Well Means Thinking Clearly

In the three or four English classes I was required to take in college, there were no lectures on the topic of writing well. We “studied” politics – exclusively by reading poorly written papers created by our peers, all combined into a parody of a textbook – but we never studied English.

Good writing is a product of clear thinking. If you can’t get your thoughts into written form, you probably don’t understand what you are thinking about. When English professors stop teaching how to write clear English, they either do it because they are unqualified to teach or because they aren’t English professors in the first place but amateur political hacks. I’m not sure which of these is worse.

I’ve since graduated college, but bad writing thrives just as much in business as it does in education. Thankfully, good writers have addressed the problem before, and yesterday I found an old piece by George Orwell which had me thinking about it again.

…quite apart from avoidable ugliness, two qualities are common to [bad writing]. The first is staleness of imagery; the other is lack of precision. The writer either has a meaning and cannot express it, or he inadvertently says something else, or he is almost indifferent as to whether his words mean anything or not. This mixture of vagueness and sheer incompetence is the most marked characteristic of modern English prose, and especially of any kind of political writing.

Bad writing is ugly, stale, and imprecise. It follows that good writing is better in each way. I hope in my own writing to avoid ugliness, staleness, and imprecision.

Orwell lists a few bad habits that writers should avoid. Business-speak – the dread language invented by people who wanted to seem important by using many words to say very little – seems to be nothing but these habits taken as law:

Dying metaphors … there is a huge dump of worn-out metaphors which have lost all evocative power and are merely used because they save people the trouble of inventing phrases for themselves…

Operators, or verbal false limbs. These save the trouble of picking out appropriate verbs and nouns, and at the same time pad each sentence with extra syllables which give it an appearance of symmetry …  In addition, the passive voice is wherever possible used in preference to the active…

Pretentious diction. Words like phenomenon, element, individual (as noun), objective, categorical, effective, virtual, basic, primary, promote, constitute, exhibit, exploit, utilize, eliminate, liquidate, are used to dress up simple statements and give an air of scientific impartiality to biassed judgements…

Meaningless words. In certain kinds of writing, particularly in art criticism and literary criticism, it is normal to come across long passages which are almost completely lacking in meaning…

A quick look at some recent company emails I’ve received removes any doubt that the sort of language spoken in the business world is one with a passing resemblance to English. It has English words, but unlike English, its purpose is not the communication of information.

Consider these incredible phrases:

  1. “distracting instability”: This is pure jargon. It appears to convey information, but it is more of a passphrase used to indicate membership in a group – the group of professional businessmen. Like all jargon, it could be replaced by a simple English expression like “
  2. “operational excellence”: More jargon. This phrase does not mean what the English words that make it up mean, which makes it bad. A thing which is operational is in use or ready for use. Excellence, on the other hand, is the quality of surpassing mere goodness and being great. Imagine someone using the phrase “operational red”. The only difference is substituting one quality with another. It doesn’t make any sense, either.
  3. “get the ball rolling”: A metaphor that can always be replaced by the word “start”.
  4. “bubbled to the top”: There are few more complicated or less clever ways to say “rose”.
  5. “compliant to the ever-evolving requirements related to this area”: The end of the phrase (“related to this area”) is redundant. Was there any question the requirements were related to what we’re already talking about?
  6. “opportunities for growth”: More redundancy. The word “opportunities” gets to the point without the botany reference.
  7. “tackle this challenge”: This is not only a dying metaphor, but a bad one in the first place. You don’t “tackle challenges”. Challenges are abstract, and tackling is a physical act.
  8. “eliminating potential delays”: Since potential delays, being potential and not actual, do not actually exist, it seems impossible to know what they are, let alone to eliminate them.
  9. “a ticket to entry toward building a partnership”: Another dying metaphor, this time used to pad a sentence toward artificial importance. The entire phrase “a ticket to entry toward” could be replaced with the single-syllable word “start”. Does the author know that English has such a word available?
  10. “this will allow us to ensure we not only enable”: We will do something. What will we do? We will be allowed. What will be allowed? We will be allowed to ensure. What will we be allowed to ensure? That we not only enable, but also do something else. All that this phrase adds is confusing layers of verbs. Is that a useful device in other languages?
  11. “working to leverage”: The word “leverage”, outside of physics, can always (ALWAYS) be replaced with the word “use”. And it always (ALWAYS) should be.

I’m probably guilty of business-speak and other errors in writing. This is especially so because I didn’t realize just how bad business-speak was until years after I began being forced to read it.

Useful to me, and hopefully useful to you, Orwell gives a list of rules to keep in mind as you write:

i. Never use a metaphor, simile or other figure of speech which you are used to seeing in print.

ii. Never use a long word where a short one will do.

iii. If it is possible to cut a word out, always cut it out.

iv. Never use the passive where you can use the active.

v. Never use a foreign phrase, a scientific word or a jargon word if you can think of an everyday English equivalent.

vi. Break any of these rules sooner than say anything outright barbarous.

Orwell’s purpose for expounding the virtue of good writing is to avoid the political manipulation that requires bad writing to hide bad thinking. This same sort of motivation exists in the business world. Business-speak is used to hide things – ignorance, motivations, lies, manipulation, information – from readers by making those readers feel they’ve been told something important and informative.