A Tutorial On Redaction

This image has been redacted for your convenience.

Writing software is fun. Redacting software is torturous. It gets worse when the software was never intended for redaction, but you need to redact it anyway.

I’ve worked on multiple projects in the past decade and all of them have involved redaction at some level. The source code, the documentation, the bug fix requests: I’ve redacted every type of digital thing you can think of. Sadly, even after all this time, I’ve yet to find a way to remove the vampiric qualities of redaction which consume the souls of those who perform it. However, I have learned a few ways to make redaction more effective with less time-consuming rework, and that’s what this post is all about.

Redaction, if you are uninitiated into the cult, is a form of evil magic where you remove sections of important information from documents or source code all while somehow retaining most of your sanity. Sometimes, the information is removed because you aren’t licensed to give away someone else’s work, but you need to deliver something that contains the work. Other times, you want to protect your own inventions but still be able to sell portions of your work. You might be allowed to make redaction obvious, printing black boxes where text should be. Or maybe you’ll need to completely hide the fact you messed with the content. This latter approach is the one I’ll assume, because I don’t think there’s a whole lot of value in the former.

What’s the Point?

Stop redacting for a second. It’s easy to jump into redaction work and go through some easy, repeatable steps to get your job done and end up missing the whole point of redaction. Remember, the reason you are redacting is because whoever is receiving the information you have can’t have specific stuff it contains.

Are you redacting to remove terms? Maybe the names of intellectual properties? If that’s all, you might think you can search and replace the contents of the files you need to redact. Replacing terms is easy enough. You can probably finish your redaction work in a few minutes. But what value is there in removing terms? If the people who are being provided the redacted material have any idea what the material was used for, they’ll know which terms were redacted. You haven’t done anything. And if they have no idea what it was used for, why do they care about it?

I’ve found that redaction is time-consuming and tedious, but also an inconsistent process. You can’t write a program to perform redaction for you, because a program can’t interpret every conceivable spelling error, phrasing (especially poor English), or acronym. Searching for terms with software is really helpful, but it only catches the most obvious stuff.

Consider this paragraph:

“The software uses a proprietary component called This Sure Is Awesome Technology. This technology is used to generate output in a comma-separated list, but in columns instead of rows. This is protected by patent Des. 555,555. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

Suppose you can’t transfer This Sure Is Awesome Technology, because it is licensed. And suppose you can’t transfer the patent information because of international law. A search for the relevant terms would get you what you want, but try just removing them:

“The software uses a proprietary component called. This technology is used to generate output in a comma-separated list, but in columns instead of rows. This is protected by patent. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

You’ve left in the nature of the technology in question and the content of the patent. It doesn’t take much effort for someone to replace your redacted text. So what was the point? You need to do this by hand:

“The software uses a proprietary component which translates files from one type to another. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

Here, the meaning is preserved, but not the method, which is the focus of the redaction. Redaction is almost always an effort to protect methods and concepts, so why apply a method that can’t protect anything?
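To make the failure concrete, here is a minimal Python sketch of the term-stripping approach applied to the example paragraph. The function name strip_terms and the term list are my own illustrations, not part of any real redaction tool.

```python
import re

# Terms taken from the example paragraph. The list and the function
# name are hypothetical, made up for this illustration.
TERMS = ["This Sure Is Awesome Technology", "Des. 555,555"]


def strip_terms(text, terms):
    """Delete each listed term, along with any whitespace in front of it."""
    for term in terms:
        text = re.sub(r"\s*" + re.escape(term), "", text)
    return text


original = (
    "The software uses a proprietary component called This Sure Is Awesome "
    "Technology. This technology is used to generate output in a "
    "comma-separated list, but in columns instead of rows. This is protected "
    "by patent Des. 555,555. Our tool uses this tool to turn pictures of "
    "ducks into pictures of chickens. Chickens are better than ducks."
)

redacted = strip_terms(original, TERMS)
print(redacted)
```

The listed terms are gone, but the sentence describing what the component does is untouched, which is exactly the leak described above.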

Search, however, is limited. Consider a different writing of the same text:

“The software uses a proprietary component called TSiAT. This technology is used makes row-based CSVs. This is protected by PD 555,556. Our tool uses this tool to turn pictures of ducks into pictures of chickens. Chickens are better than ducks.”

Your search won’t catch every possible spelling error. It won’t catch different forms of the same phrases, especially if those forms have poor grammar. It won’t catch acronyms you don’t expect. If you understand the point of redaction, you won’t consider your job complete just because a search doesn’t return terms from a list of “bad” words.
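As a rough illustration, here is a minimal Python sketch of a case-insensitive term search run against the reworded paragraph. The function name find_terms and the term list are assumptions made up for this example.

```python
# The "bad word" list from the first version of the paragraph.
# Both the list and the function name are hypothetical.
TERMS = ["This Sure Is Awesome Technology", "Des. 555,555"]


def find_terms(text, terms):
    """Return the listed terms that appear in the text, ignoring case."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


variant = (
    "The software uses a proprietary component called TSiAT. This technology "
    "is used makes row-based CSVs. This is protected by PD 555,556."
)

hits = find_terms(variant, TERMS)
print(hits)  # prints []
```

The search reports nothing to redact, even though the acronym, the description of the method, and a mistyped patent number are all still sitting in the text.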

The Redaction Balance

It’s easy not to go far enough in your redaction and leave too much content behind. On the other hand, redacting massive sections of documents removes any value from them. At that point, why even give the documents away?

You’ve probably seen documents redacted by the government. You know, those poorly scanned pages that have a handful of words floating in a sea of black ink. You might find pieces of information here and there but you might not. Why did they release the document at all if there’s nothing in it?

A better approach, as described above, is to search for terms and concepts yourself. You, a human being and not a computer, can understand practically everything that might show up in the information you are redacting. It’s tedious, it’s terrible, and it might be evil, but redaction is something you can’t describe in logical terms any more than you can describe writing a book in logical terms. You can’t delegate this terrible work to a computer, no matter how much you want to. And if you are doing redaction the right way, you’ll really, really want to.

Your goal is to remove terms and ideas in a careful way that doesn’t make it obvious that the terms and ideas ever existed. For instance, if you need to remove the sections in brackets, do it like this:

“The tool is capable of [feature A], which does X, Y, and [Z] in order, as well as feature B, which does X and Y only.”

Becomes:

“The tool is capable of feature B, which does X and Y in order.”

A hard redaction of this, replacing [feature A] with [redacted feature] and [Z] with [redacted function] would read like this:

“The tool is capable of redacted feature, which does X, Y, and redacted function in order, as well as feature B, which does X and Y only.”

This gives away the fact a feature exists as well as two-thirds of what it does. If you want someone to know about “Feature B” and not “Feature A”, this is a terrible way to do it.

Acronyms Are Your Enemy

If a term you are replacing is an acronym, things get much worse.

Imagine you have an acronym like RED. That term might appear thousands of times in unrelated words: hundred, redaction, credibility, bred, not to mention the word “red” itself. If this term is just replaced forcibly with something like “Supplier Technology”, you end up with ridiculous sentences like:

“There are two-hundSupplier Technology tests, each of which appears in black if they passed and Supplier Technology if they failed, establishing the cSupplier Technologyibility of the claim that the software was tested.”

You’d need to go through these results by hand, which is no faster than searching and reading in the first place. If you left this in place, anyone reading the redacted document would realize that “Supplier Technology” is clearly what all instances of “RED” become. Again, search-and-replace has accomplished nothing.
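Here is a minimal Python sketch of that forced replacement, along with a word-boundary version for comparison. The starting sentence is my reconstruction of the text before replacement, and the variable names are made up for illustration.

```python
import re

# Reconstructed sentence as it might have read before redaction (an assumption).
sentence = (
    "There are two-hundred tests, each of which appears in black if they "
    "passed and red if they failed, establishing the credibility of the "
    "claim that the software was tested."
)

# Forced replacement: every "red", in any letter case, anywhere in the text.
forced = re.sub("red", "Supplier Technology", sentence, flags=re.IGNORECASE)

# Whole-word replacement: \b keeps the pattern from matching inside longer words.
whole_word = re.sub(
    r"\bred\b", "Supplier Technology", sentence, flags=re.IGNORECASE
)

print(forced)
print(whole_word)
```

The word-boundary pattern leaves “two-hundred” and “credibility” alone, but it still turns the color red into “Supplier Technology”. The pattern can’t tell the acronym from the ordinary word, so someone still has to read every hit.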

And this isn’t the end of the pain you will suffer at the hands of linguistic shortcuts. Laziness compels people to turn all sorts of things into acronyms where you may not expect them. And even if you try and expect them, they probably won’t use the same letters you would. This doesn’t even take spelling errors into account. A misspelled acronym is like a land mine of important information you can’t sweep for. It’s just waiting for the recipient of the redacted material to trip on, blowing up in your face. Acronyms are just another reason you should be performing redaction by hand.

Some Precautions

You may have no idea if the documents or software you are writing will be subject to redaction in the future. But if you somehow do know, there are a few things you should keep in mind.

If you want to make redaction trivial, don’t mix different proprietary information. If you are working with three companies, try to keep the IP of each of them separate, restricting interaction to as few documents as possible. You’ll find that this won’t be possible, at least completely. The closer you get to it, however, the easier the redaction.

If you want to make redaction impossibly difficult, use extremely short proprietary terms. For instance, two-character terms like “A1” will show up in binary data and in GUIDs, maybe millions of times. Even if an engineer can look through hits at a rate of 10 per second (which is practically begging for human error), a few million hits still take four full terrible days to check, all for a term which might legitimately appear twice in its proprietary context. Inconsistent acronyms, lack of spelling and grammatical checks, and images all help multiply the length of time you will need to perform redaction. Avoid these things as much as possible in any material that might be redacted in the future.
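The arithmetic behind that estimate is easy to check. This is a minimal Python sketch; the figure of 3.5 million hits is my own assumption standing in for “maybe millions”, and the function name review_days is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def review_days(hit_count, hits_per_second):
    """Days of nonstop review needed to look at every search hit once."""
    return hit_count / hits_per_second / SECONDS_PER_DAY


# Assumption: 3.5 million raw hits for a two-character term like "A1".
days = review_days(3_500_000, 10)
print(f"{days:.1f} days")  # prints 4.1 days
```

That is four days of around-the-clock reviewing with no breaks, so the real calendar time for a human would be far worse.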

Digital Limitations

If you’ve never heard of the Art of Manliness, you need to head over and check it out. It’s one of the best sites around. Thanks to a culture that demeans masculinity more every year and devalues fathers and husbands, we really need content like the Art of Manliness provides.

The AoM Podcast (which you should subscribe to) recently featured a book by David Sax called The Revenge of Analog. I haven’t read the book, but the interview was thorough, and I got the impression that the author has a pretty good idea of the situation he is describing. The essential point he makes in the book is that, despite the benefits of digital technology, people are increasingly moving away from digital approaches to doing things that can be done by hand. A few examples he offers are:

  • A demand for vinyl records that has caused a rebirth of the record pressing and distribution industries.
  • Paper planners, calendars, and pocket notes.
  • Physical books dominating an industry that was “fated”, according to experts several years ago, to be entirely digital by now.

He made the important point during the interview that these things aren’t simply an example of hipsters wanting to differentiate themselves. Most of these things are being purchased and used by all kinds of people, and the industries making them are growing, a sign that this is a mainstream phenomenon. It’s also a phenomenon the author discovered first-hand.

David Sax relates a story during the podcast about how he and a roommate had set up a digital music system throughout their home to stream audio from a computer. They suddenly had access to any music they wanted at any time in any room they wanted it with a couple of clicks. Within a few weeks, the amount of music that was actually played had dropped to almost nothing. There was something about the digital approach that made listening to music lose its appeal.

The interview shifted from descriptions of the phenomenon to explanations early on, and I agree with several of the points the author made. First of all, the move away from digital products isn’t caused by a single force. There are all sorts of different reasons and they vary depending on who you talk to. Second, very few people are interested in giving up every digital luxury they have. Instead, it seems that people want a balance that doesn’t exclude physical objects, and that in many cases (but not all), physical objects are preferred.

The motivations for these preferences were described as irrational, which was about the only thing I disagreed with. It’s true that people give up some convenience and features by choosing physical objects instead of digital replacements, but I think the choice is rational. In fact, I think the choice is spiritual. This was an element that I didn’t hear in the interview, but which may be in the book.

From a Christian perspective, I can affirm the tangible benefits of reading a physical book over a digital book, for instance (it’s easier to remember the content when you can imagine the book; books allow for note-taking; books don’t require power). But there are certain intangible benefits, spiritual in nature, that I think the Christian worldview can account for.

God created the physical world and He called it Good. It’s His Creation, after all. There’s something in our human natures that makes us appreciate physical objects. There’s something in the nature of men especially, I’ve found, that makes us appreciate collections of physical objects and their maintenance and organization. In a fallen world, with our human nature corrupted, we can fall into sin by coveting what others have, by being inordinately proud of what we own, by thinking ourselves better than others for our possessions, or by thinking that physical things are ultimate. These are terrible things and we need to carefully avoid each of them. But these are sinful precisely because they corrupt something good. And what is good is human beings creatively making things like their Father before them and maintaining Creation. There’s something about physically sensing a book through sight, touch, and smell which reminds us of the creative process and which lets us maintain Creation itself in a small way. That isn’t to say that digital incarnations are somehow bad or not a result of human creativity, but that physical objects have a benefit that can’t really be transferred to digital counterparts.

I think the “revenge of analog” is a small symptom of a larger desire that our civilization has to move back to something more concrete, universal, and objective. People have been jaded by untrue promises that we can control everything about ourselves and our natures. The same movement is seen in the increased interest in liturgy in churches, in Christianity in philosophy departments, in growing interest in trades over graduate degrees, and even in a booming board game industry.

It seems like we’ve reached the tipping point in our world where enough people are ready to move back to more permanent things that even people not paying attention to them are starting to notice the effects. And this is a good thing.