Similarities between cryptography and censoring visual data
One of the intrinsic values of the internet is the ability to distribute information widely with little effort. Perhaps paramount is the contributor's ability to modify, alter and control that information; to be selective and pick and choose what is distributed. This holds true not only for works of text, but for visual works, such as video and the most pervasive format, still images. In this posting, I will attempt, in layman's terms, to compare the cryptography of textual messages with the censoring of still visual data (images, documents). I will present real-world examples of defeating the encoding or censorship of visual data to support this comparison.
Oftentimes, when an individual chooses to distribute an image widely in a public forum, or even to a more focused and trusted group, there may be aspects of the image that the creator wishes to remove or sanitize. Examples include checks with account numbers, receipts with credit card numbers, faces, people, places, etc. This is typically accomplished by blurring, or otherwise applying a filter, technique or method to the sensitive information within the image. However, time and time again, these methods are not nearly as effective as the creator of the work likely intended.
The (over)use of parentheses and of multiple similar words strung together by “or” and “and” is intended to reach a wider audience without confusing a reader who is unfamiliar with cryptography. The intent is also to keep the interest of a more technical reader familiar with cryptography concepts by presenting an interesting parallel. A reader possessing knowledge of cryptography greater than that of the average bear will likely appreciate the elegance of the parallel between an automated approach to obfuscating portions of non-text visual data and plain-text cryptography. Those who do not have a strong crypto background will hopefully gain an appreciation for cryptography and its similarities to image manipulation.
The goal of the next paragraph is to make an extremely simple and understandable comparison between the encryption of text and the filtering of portions of an image deemed sensitive.
Simply put, cryptography with a key known to both the transmitting and receiving parties is akin to a mathematical formula – think of the German Enigma machine. If the receiving party knows which modifying function (or secret key) was used to make the original data unreadable, then the receiving party can recover the original data. Now, let’s apply this to an image filter. If the receiving party knows the modifying function or method (the secret key) used to make a portion of the image unreadable – whether by blurring, swirling, or pixelation – then the receiving party can recover the original data.
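The shared-key idea above can be sketched in a few lines of code. Note this is a toy illustration, not real cryptography and certainly not the Enigma algorithm: I am using XOR with a repeating key purely as a stand-in for “a modifying function that is reversible if, and only if, you know the key.”

```python
# A minimal sketch of the symmetric-key idea: the same secret key that
# scrambles the data also unscrambles it. XOR with a repeating key is an
# illustrative stand-in only -- it is NOT a secure cipher.

def xor_transform(data: bytes, key: bytes) -> bytes:
    """Apply the keyed transform; applying it twice restores the original."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

message = b"attack at dawn"
key = b"enigma"  # the shared secret

ciphertext = xor_transform(message, key)     # sender encodes
recovered = xor_transform(ciphertext, key)   # receiver, knowing the key, decodes

assert ciphertext != message
assert recovered == message
```

The same shape applies to an image filter: the blur, swirl or pixelation settings play the role of `key`, and anyone who knows (or can guess) them may be able to run the transform backwards.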
Defeating encryption is generally accomplished by one of three methods.
- A flaw within the cryptographic algorithm itself – a mistake in the fundamental design that undermines the difficulty of deciphering the message because the algorithm was built incorrectly
- Brute force – trying every possible key combination against the cryptographic algorithm until one works
- Knowing the key – understanding the appropriate values to correctly decipher the encoded message
Now, let’s associate these three methods of decrypting text with decoding or deciphering data that is not in a plain-text format. Keep in mind that an image is simply a collection of numbers.
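To make the “collection of numbers” point concrete, here is a toy sketch: a tiny grayscale image is nothing more than a grid of brightness values, and any filter is just arithmetic performed on that grid. The 4x4 image below is invented for illustration.

```python
# A 4x4 grayscale "image": each number is a pixel brightness
# (0 = black, 255 = white).
image = [
    [  0,  50, 100, 150],
    [ 50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
]

# Any filter is arithmetic on these numbers. Inverting the image is one
# such operation -- and, like many filters, it is trivially reversible.
inverted = [[255 - px for px in row] for row in image]
restored = [[255 - px for px in row] for row in inverted]

assert inverted[0][0] == 255
assert restored == image
```

Every obfuscation technique discussed below is, at bottom, some such arithmetic applied to a region of the grid; the question is only whether that arithmetic can be undone.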
- Break image obfuscation with a flaw in the obfuscation method: Around the turn of the new millennium, The New York Times published a secret/classified report it had obtained (found here on their website) regarding an attempt by the CIA to overthrow the government of Iran. The New York Times, with the goal and motivation of publicity and increased circulation in mind (the greater the circulation, the greater the ad revenue), decided to publish this document. In an attempt to protect the families of the agents involved in the operation, the N.Y.T. blacked out the names of those agents. However, the method used to render the data unreadable was flawed – the names were merely covered over rather than removed – and individuals with a slower computer could see the names before the black boxes drew over them. This was widely publicized by cryptome.org. Moral of the story? Don’t censor sensitive data with the digital equivalent of a piece of painter’s tape. You can see other examples of this kind of PDF abuse listed here. The method is obviously flawed, and allows individuals to view sanitized data originally thought to be safe.
A flaw in the method chosen for data censoring is akin to a fundamentally weak encryption algorithm (such as the now-broken DES).
- Break the method of obfuscating an image by trying all possible combinations of the techniques used to obscure the selected portion of the image. An image is merely an array, matrix, spreadsheet or collection of pixels (dots), each with a numerical value. Because the same mathematical function was applied to a specific subset of those… dots… (pauses emphatically added), the hidden or obscured portion of the image can be recovered. A real-world example is blurring the account number on a credit card or scanned document and declaring the scanned image safe. By taking candidate values for the pixels (dots), applying every likely mathematical formula (i.e. each blur that could have produced the result) to those values, and converting the resulting values back into an image, you have something to compare visually against the obfuscated original. If there is a visual match between the original obfuscated image and your candidate (blurred) image, then the formula and input that created your result are the formula and input that were used against the original data. This technique is demonstrated in a visual fashion here.
Attempting all possible keys or blur settings and comparing the results is equivalent to brute-forcing every possible password required to decrypt an encrypted document.
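The brute-force idea above can be sketched as follows. Note the key point: we never “unblur” anything. We render every candidate plaintext, apply every plausible blur setting, and keep whichever combination matches the blurred target best. The digit “bitmaps” and the 1-D box blur here are toy stand-ins I invented for illustration, not a real rendering pipeline.

```python
# Brute-force recovery of a blurred value by forward search:
# try every (candidate, blur setting) pair and compare against the target.

def box_blur(pixels, radius):
    """Simple 1-D box blur: each pixel becomes the mean of its window."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def distance(a, b):
    """Sum of squared differences -- our 'visual match' score."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy pixel-row "renderings" of the digits 0-2 (hypothetical patterns).
renders = {
    "0": [255, 0, 0, 255, 0, 0, 255],
    "1": [0, 0, 255, 0, 0, 255, 0],
    "2": [255, 255, 0, 0, 255, 0, 0],
}

# The censored image: someone blurred the digit "1" with radius 2.
target = box_blur(renders["1"], 2)

# The "key space": every candidate digit crossed with every blur radius.
best = min(
    ((digit, radius) for digit in renders for radius in (1, 2, 3)),
    key=lambda dr: distance(box_blur(renders[dr[0]], dr[1]), target),
)
assert best == ("1", 2)
```

Real demonstrations of this technique work the same way, just with genuine font rendering and 2-D Gaussian blurs: the search space of digits and blur parameters is small enough to exhaust.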
- Possess knowledge of the key, or method of obfuscation. Consider applying a swirl filter with the intention of obfuscating portions of an image. The most famous real-world example in recent history is that of a notorious serial pedophile who was apprehended through efforts put forth by Interpol. The criminal wanted to demonstrate to his social circle, in an anonymous fashion, that he was legitimately a pedophile (don’t ask why, I’m not a psychologist). As such, he posted pictures of himself with young children, but applied a “swirl” filter to his face. The second picture in the photo gallery on the AOL news site shows the “after” and “before” photos. Now let’s think: if one twists the pixels in an image clockwise in an attempt to render a portion of a photograph unreadable, what is preventing a counter-clockwise twist of those same pixels?
The swirl technique is the key: apply the swirl appropriately (in reverse), and you have the original image.
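The reversal can be sketched by modeling a “swirl” as an invertible rearrangement of pixels. A real swirl filter rotates pixels by an angle that varies with distance from a center point; my toy stand-in below simply rotates each row of a small image by an amount that grows with the row index. The principle is identical: because the transform only moves pixels around, applying it with the opposite direction restores the original.

```python
# A toy "swirl": rotate row i by i*direction positions. Because no pixel
# values are destroyed -- only relocated -- the filter is its own undo
# when run with the opposite direction.

def swirl(image, direction=1):
    """Cyclically rotate each row by (row index * direction) positions."""
    out = []
    for i, row in enumerate(image):
        k = (i * direction) % len(row)
        out.append(row[k:] + row[:k])
    return out

original = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

obscured = swirl(original, direction=1)    # the "censored" image
recovered = swirl(obscured, direction=-1)  # apply the key in reverse

assert obscured != original
assert recovered == original
```

In the real Interpol case the recovery was more involved – swirl filters interpolate pixel values, so some information is smeared rather than purely relocated – but enough structure survived for investigators to reconstruct a recognizable face.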
Each of these examples demonstrates a fairly simple parallel between a filter or technique for obfuscating image-like data and rendering plain text unreadable with cryptography (à la the German Enigma machine). What can we take away from this parallel between plain-text encryption and image obfuscation? If you reproduce an image or image-like data and censor or redact portions of it, ensure that the method chosen is mathematically irreversible and not defeated by a weakness in the method itself – much like a smart choice of cryptographic algorithm.