This is the second post relating to picture and video evidence found on the web, a topic of considerable interest in the investigation community. It follows a previous post on the subject from October, in which I had the opportunity to get privacy lawyer David Fraser’s thoughts.
In this industry we are taught (either through our training or through a few unfortunate life lessons) that we have to verify and re-verify the clues we obtain in our information-gathering activities. This process is, of course, meant to make sure we get the facts straight and to avoid the bias that creeps in when we become enamoured with information that supports the story we believe is true. This is especially relevant to images in Internet investigations because, first, images are very powerful at conveying information and, second, frankly, there are just so many of them on the Internet.
The impact of image forgery on investigations really hit home for me after I read “Adobe Photoshop Forensics: Sleuths, Truths, and Fauxtography” by Cynthia Baron, Associate Director of Digital Media at Northeastern University.
One of the individuals whose work is referenced in Professor Baron’s book is a gentleman named Dr. Neal Krawetz of Hacker Factor Solutions. Dr. Krawetz is well known in the fields of image forensics and online profiles. He holds a Ph.D. in Computer Science from Texas A&M University and a bachelor’s degree in Computer and Information Science from the University of California, Santa Cruz.
I caught up with Dr. Krawetz and asked him a few questions about his views on the issue of fake and forged images on the web, social media and how all of this might affect the field of information gathering. We ended up covering a range of topics from political propaganda to fake profiles on social media sites, all of which are relevant to the field.
The interview is set out below. As in previous discussions, the questions are in bold and Dr. Krawetz’s answers have no emphasis. This post is a little longer than most, but I think readers will find the information highly relevant and interesting… but at least I’m being overt about my bias.
Question 1: After reading Professor Baron’s book, I became aware of some indications to look for when examining an image that one suspects is forged. Some of these relate to the “context” of the image, such as who posted it in the first place and what their motivation for forging it might be, while others are tell-tale signs within the image itself.
On your blog you often refer to things such as the direction of shadows and light sources, artifacts, and other factors within the image. In your view, what are some of the key tell-tale signs of image manipulation or forgery that a researcher or investigator should be aware of, so that they can refer the photo to the proper expert?
Professor Baron’s book provides a great overview of the problem space. I look for different things, depending on the type of problem.
In my blog, I usually review images found on the web. I try to make no assumptions about who posted it. Many times, the person who puts the image on the web is not the person who took the original picture. In fact, it is really common to see a resaved image propagated across websites.
As an example, a high-quality photo from Reuters was probably edited by the photographer prior to submission, and may also have been modified by the media organization. (Reuters, like many media organizations, usually permits only very minor retouches. Each organization has its own set of rules, and many of these guidelines are not published.)
Eventually the photo gets passed to news outlets who crop, color, and modify the images. Then it goes to the webmaster who may resize the image or lower the quality (so the image takes up less bandwidth). From there, the image may be copied to other websites, modified or cropped or edited, and resaved again…
I actually spend a good amount of time trying to track down the originals — or copies that are closer to the originals — just to get a higher quality image for analysis. Many of my tools attempt to identify “what was last changed”. Whether it was a crop or a logo, identifying the manipulation can help identify the source. Also, it is common to find a higher quality version that can be used to immediately validate a modification in a lower quality image.
When evaluating images, I usually do a two-step analysis. First, I try to identify the quality of the image. This tells me which subsequent algorithms to use and how reliable their results will be. I use techniques such as evaluating the JPEG quantization tables for the quality level, Principal Component Analysis (great for seeing quality artifacts from lossy image formats), error level analysis, and even luminance gradients. These emphasize the overall quality and permit the identification of different quality levels across the image.
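To make the luminance-gradient idea a little more concrete, here is a minimal Python sketch. It is not Dr. Krawetz’s tool: the function names are hypothetical, the Rec. 601 luma weights are an assumption, and it computes only a plain central-difference gradient magnitude. On a real photo, regions that were resaved, smoothed, or pasted in may show gradient textures that differ from their surroundings.

```python
import math


def luminance(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of luma values (Rec. 601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_rows]


def gradient_magnitude(lum):
    """Central-difference gradient magnitude of a luma grid.

    Border pixels lack a neighbor on one side, so they are left at 0.0.
    """
    h, w = len(lum), len(lum[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (lum[y][x + 1] - lum[y][x - 1]) / 2.0
            gy = (lum[y + 1][x] - lum[y - 1][x]) / 2.0
            out[y][x] = math.hypot(gx, gy)
    return out


# Tiny example: a flat dark region next to a flat bright region.
image = [[(50, 50, 50)] * 4 + [(200, 200, 200)] * 4 for _ in range(8)]
grad = gradient_magnitude(luminance(image))
# grad is near 0 inside each flat region and large along the boundary between them.
```

Real analysis tools typically render the gradient (often its direction as well as its magnitude) as an image so that inconsistent regions can be inspected visually.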
The second step is to identify what actually happened in the image. For this, I apply a huge suite of tools that evaluate different factors. The main idea is to use a wide variety of different tools. A modification may pass some tests, but is unlikely to pass all tests.
As far as tell-tale signs go… It really depends on the artist. Most people have access to “good enough” image editing tools. However, few people have the skill, time, and patience to make a perfect forgery. The most common things missed by an average web artist are coloring, lighting, and texture.
There are different types of photo analysis experts. I primarily focus on the semi-automated digital analysis. My tools do not look at the subject matter. Instead, they focus on artifacts. Baron, on the other hand, is a very talented visual content analyst. I have shared a number of fun and challenging photos with her and I am always amazed that she can see — without any significant analysis tools — the same modifications that my tools identify.
Personally, I don’t trust my eyes — many computer graphics (CG) or enhanced images look real to me. It isn’t until I run my tools that the manipulations are highlighted enough for me to see them easily. However, other experts can visually inspect images and accurately identify modifications.
Shadows are interesting for a couple of reasons. Amateur artists usually get them wrong. Rendered images have difficulty making shadows appear realistic. (Ray tracing may make them too dark and crisp, and radiosity can make them too light and diffuse.) Spliced images frequently have conflicting shadows.
For real photos, shadows can identify off-camera objects and subtle topography differences. And more fun: under the right conditions, real shadows can be used to identify latitude, time of year, and even time of day — and this determination can even work with low resolution images.
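The geometry behind the latitude claim can be sketched in a few lines of Python. This is a simplified illustration under strong assumptions, not a forensic procedure: a vertical object of known height on flat ground, a photo taken at local solar noon, a northern-hemisphere location where the noon sun is due south, and a common cosine approximation for the sun’s declination that is only good to roughly a degree. The function names are hypothetical.

```python
import math


def sun_elevation_deg(object_height, shadow_length):
    """Sun elevation angle from a vertical object and its shadow on flat ground."""
    return math.degrees(math.atan2(object_height, shadow_length))


def solar_declination_deg(day_of_year):
    """Approximate solar declination; day_of_year counts from 0 on January 1."""
    return -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))


def latitude_at_solar_noon_deg(elevation_deg, day_of_year):
    """Latitude implied by the noon sun elevation, assuming the sun is due south.

    At solar noon, elevation = 90 - latitude + declination, so
    latitude = 90 - elevation + declination.
    """
    return 90.0 - elevation_deg + solar_declination_deg(day_of_year)


# Example: a 2 m post casting a 1 m shadow at solar noon near the June solstice.
elevation = sun_elevation_deg(2.0, 1.0)
latitude = latitude_at_solar_noon_deg(elevation, 172)
```

In practice, uncertainty in the object height, the ground slope, and the capture time quickly dominates the result, which is why this only works “under the right conditions.”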
Question 2: Moving away from images of people to images of documents, what are some of the tell-tale signs you look for in copies of documents posted on the Internet?
Documents are actually much easier to analyze in many ways. Most documents are scanned, and scanner artifacts will appear all over the image.
Artifacts from drift, CCD noise, color registration alignment, and dust or scratches are easy to detect, distort during manipulation, and are difficult to correct after editing. Depending on the scanner and image format, artifacts from a demosaicing algorithm may also be detectable and are very difficult to correct with most editing tools.
With documents, people usually add text, alter text, and use fill or clone tools to erase. These changes show up in a variety of analysis methods.
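Erasing with a clone tool copies pixels from one part of the image to another, so one classic (and deliberately naive) way to detect it is to look for identical blocks that repeat at a consistent offset. The Python sketch below assumes an uncompressed grayscale grid of integers and exact pixel matches; real copy-move detectors compare compression-robust block features instead, because a JPEG resave breaks exact equality. The function name is hypothetical.

```python
from collections import defaultdict


def find_clone_offsets(pixels, block=4):
    """Naive exact-match copy-move detection on a 2-D grayscale grid.

    Returns {(dy, dx): count}: how many identical block pairs share each shift.
    A cloned region shows up as one shift with an unusually high count.
    """
    h, w = len(pixels), len(pixels[0])
    positions = defaultdict(list)
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            key = tuple(tuple(pixels[y + dy][x:x + block]) for dy in range(block))
            if len(set(v for row in key for v in row)) == 1:
                continue  # skip perfectly flat blocks (blank paper, sky): they match everywhere
            positions[key].append((y, x))
    shifts = defaultdict(int)
    for spots in positions.values():
        for i in range(len(spots)):
            for j in range(i + 1, len(spots)):
                (y1, x1), (y2, x2) = spots[i], spots[j]
                shifts[(y2 - y1, x2 - x1)] += 1
    return dict(shifts)
```

Counting matches per shift, rather than reporting every duplicate block, is what separates a deliberate clone (many blocks moved by the same offset) from an isolated coincidence.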
Question 3: Since a good chunk of the images saved online are in JPEG format, do you have any specific suggestions about what investigators should be on the lookout for with this format?
I have found JPEG to be both the easiest and hardest image format to analyze.
In lossless image formats, edits to the image obliterate everything that was previously there. Because JPEG is a lossy format that blurs colors together, there are traces of every change ever made to the image. This is similar to the Locard exchange principle in physical forensics: every contact leaves a trace.
However, very low quality images — either due to multiple resaves or saves at a very low quality — may lose much of the older information. As a result, there is always more confidence in detecting the most recent changes.
Fortunately, images on the web from scanners, cameras, and other sources are usually provided in JPEG format. (Google currently reports over two million JPEG files, compared to under one million PNG and a few thousand BMP.) JPEG files also include distinct attributes, such as meta data, thumbnail images, and quantization tables. All of these attributes can be used for image forensics.
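To see where those attributes live inside a file, here is a minimal Python sketch that walks the marker segments at the start of a JPEG and splits out the quantization tables. It relies only on the segment layout defined in the JPEG standard (ITU-T T.81) and the usual application markers (APP1 normally carries the Exif metadata, which is also where an embedded thumbnail is usually stored). The function names are hypothetical, and the sketch deliberately stops at the start-of-scan marker rather than decoding image data.

```python
import struct

MARKER_NAMES = {
    0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xD8: "SOI", 0xD9: "EOI",
    0xDA: "SOS", 0xDB: "DQT", 0xE0: "APP0", 0xE1: "APP1", 0xFE: "COM",
}


def list_segments(data: bytes):
    """Return (offset, name, payload) for each header segment, stopping at SOS or EOI."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = [(0, "SOI", b"")]
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        name = MARKER_NAMES.get(marker, f"0x{marker:02X}")
        if marker == 0xD9:  # EOI has no length field
            segments.append((pos, name, b""))
            break
        if pos + 4 > len(data):
            raise ValueError("truncated segment header")
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # length counts its own 2 bytes
        segments.append((pos, name, data[pos + 4:pos + 2 + length]))
        if marker == 0xDA:  # entropy-coded image data follows; stop here
            break
        pos += 2 + length
    return segments


def parse_dqt(payload: bytes):
    """Split a DQT payload into {table_id: 64 values in the zigzag order stored in the file}."""
    tables = {}
    i = 0
    while i < len(payload):
        precision, table_id = payload[i] >> 4, payload[i] & 0x0F
        i += 1
        if precision == 0:  # 8-bit entries
            tables[table_id] = list(payload[i:i + 64])
            i += 64
        else:  # 16-bit big-endian entries
            tables[table_id] = list(struct.unpack(">64H", payload[i:i + 128]))
            i += 128
    return tables
```

For a real file, pass in the bytes from `open(path, "rb").read()` and call `parse_dqt` on each `DQT` payload; the `APP1` payload can then be handed to a dedicated metadata tool.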
Question 4: On your blog you do a lot of work comparing different photos of the same incident from different sources and times. How important is it to have these other photos to do the comparison?
A single picture provides a frozen moment in time. A series of pictures provides context. For example, a single photo may show a girl with a tear on her face. A series of photos may show her laughing so hard that she cries. A single photo may show a man holding his dead cousin. A series of photos may suggest that the sequence was staged.
With the Kim Jong Il “clapping” photos, the pictures appear to show the same person wearing the same clothing and having the same hair… taken two years apart. Without the context, one could believe that the photo was recent. Although we may not know when the first photo in the group was taken, we do know that the first one was released two years ago, and the recent photo is likely from the same photo-op.
And then there is Iran… Everyone noticed the clone-tool duplication from adding in another rocket. But few people realized that the other photos of the missile launch were from two years earlier. Iran did the same thing when they released video footage of an incident in the Strait of Hormuz (January 2008). Their video footage was not from the same incident.
Rather than editing or photoshopping images, it is very common for pictures to be represented out of context. Old photos may be called new, or described as something that they are not.
Question 4 Continued: How important are citizen journalists or bloggers to the ability to provide comparative content?
I once read that five media companies owned about 90% of U.S. media outlets. The environment is so high-pressure that reporters no longer have time to do any detailed investigative reporting or even fact-checking. The various subsidiaries and media outlets all cover the same information, and much of it comes from the web. I have seen news topics appear on Slashdot and Digg, only to be reported by FOX and CNN hours later. And since there is no fact-checking, misinformation can spread rapidly.
As an example, in my blog I mentioned that the Kim Jong Il photos were not recent. The next day my web logs showed accesses from major media companies, and a day after that, they ran their own stories about the photos being old.
Blogs have become a new form of self-published media. Blogs are independent of the mass-media mega-corporations. While many blogs contain fictional (and usually humorous) missives, some are very informative.
It was in blogs where we first learned that China’s antelope-train image and extinct tiger photo were fake. Blogs provide a means for independent analysis, criticism, and evaluation. They are the closest thing to “free speech” that the world has seen.
Bloggers can also bring about significant changes. The US Transportation Security Administration altered some of its policies following feedback on its blog. The evaluation of Obama’s birth certificate on my blog, as well as on others, discredited the false claims by faux experts, and the claims of digital modification vanished.
Question 5: Further on the subject of photographic manipulation and social media, do you have any thoughts on its pervasiveness in the personal Web 2.0 sphere versus, say, corporate, journalistic or political content on the web? There is anecdotal evidence in the news media of image manipulation for personal reasons, such as cyberbullying, harassment or defamation. Do you have any comments on how prolific it is or how complex the manipulation is?
I don’t have any quantitative metrics, but it is extremely prolific. Every ad, every picture, and every photo used by the mass media has been manipulated. Most modifications are harmless — scaling, minor recoloring (especially for print copy), cropping, etc. On occasion, the edits are more significant. There is a great website called “Photoshop Disasters” that features the constant flow of overly modified images.
The mass media (and even governments) have repeatedly been bitten by overly modified images — modified to the point of being fictional. Time had its O.J. Simpson cover controversy, while Newsweek had Martha Stewart’s head. China had its fake tiger and antelopes. Iran had doctored missiles.
Occasionally the photographers get caught and end up ruining their careers.
Modified is a relative term. I almost never see real photos that are unedited and actually in the format originally provided by the camera. Instead, minor touch-ups are the norm. There is a fine line between an acceptable enhancement and a fictional alteration.
If you see a photo on the web that is amazing, super funny, or extreme in some way — and it is not by a professional photographer — then the chances are really good that there has been some kind of manipulation. Sure, there are some phenomenal photographers who take rare ‘super’ pictures, and a few regular people snap the once-in-a-lifetime awesome photo, but the majority of people are not that good.
Question 6: This one is slightly divergent from image analysis, but it is related and I wanted to ask you about it anyway. In April 2007 you wrote an article called “Online Impersonations: No Validation Required” for the Security Focus site, discussing how easy it was to set up a forged online persona using MySpace.
You also discussed some flaws in the process of reporting these issues to MySpace to have fake profiles removed. I’ll take the liberty of assuming that this extends to other social networking sites, and ask: have you noticed any improvements in this process, or have things stayed pretty much at the status quo in the “get that fake profile of me removed” department?
Actually, this is very related. Most social sites permit anyone to register an account, but only require authentication when a complaint is received. MySpace wants you to send in a photo of yourself holding a handwritten sign.
A few associates and I discovered that MySpace accepted photoshopped images.
Now, for clarity, we were requesting the removal of actual impersonating profiles. However, we made the requests using different types of photoshopped images. All were accepted by MySpace. We also found that if you claimed to be a teacher then a photo was not even required and fake profiles were removed much faster.
The problem is not limited to MySpace. Other sites are just as bad. Yahoo! wants a fax of your ID. A fax is such low quality that any forgery is acceptable. And Google does not even want to be involved; they want you to complain to the FBI.
Unfortunately, even after writing that paper last year and discussing the exploit at five conferences, none of these sites have made any effort to change.
The thing that regular users need to remember: these sites ask for information that they cannot validate. They cannot require a driver’s license, since not everyone drives. And in the United States, law enforcement cannot investigate without an open case.
Validating IDs through a service is not free. Yahoo! would go broke if they had to validate every government issued ID. And that’s assuming that the issuing government would permit validation. European and Asian countries would probably be very hesitant about validating their citizens with a foreign corporation. And some third-world countries have no IDs to validate.
All of this comes back to image analysis. If these sites require an image — photo or fax — for authentication, then they need some method for validating the image. As far as I know, they don’t have that right now.
Question 7: Getting back to the subject of images, can you give a high-level overview of some of the more advanced analysis techniques you employ, such as error levels and wavelets (in layman’s terms if possible)? Also, can you point to any instances where these techniques have been used in court? (You can probably tell that I read through the online version of your presentation at the 2007 Black Hat conference posted on the Wired blog network.)
I actually wrote a paper on each of the algorithms for Black Hat.
Error Level Analysis (ELA) exploits the “lossy” quality of JPEG. Every time you save a JPEG, the image gets a little worse. However, the quality loss is not linear. The first resave loses the most information. The second resave (load the saved image and then save it again) loses some more, but less than the first. ELA measures the amount of information loss.
Here’s how ELA works: Load an image into an editor, then save it one time as a JPEG. Finally, compare the two images — how much did each pixel change? If there is a lot of change between the images, then it means the pixels had a high error level potential and were likely original. If there is no change, then the image was already at its minimum error level — indicating prior resaves.
When someone edits a JPEG image, they alter the error level potential of the regions they touch. Thus, if someone edits a JPEG, ELA will identify the edited areas.
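The ELA procedure described above can be illustrated with a toy model in pure Python. This is only a sketch under simplifying assumptions: it works on grayscale values, replaces a real JPEG codec with an 8x8 DCT followed by a single uniform quantization step (`step`, a stand-in for a real quantization table), and skips entropy coding entirely. The function names are hypothetical. It is enough to show the key behavior: a block that has already been saved barely changes on a resave, while freshly edited pixels change noticeably.

```python
import math

N = 8  # JPEG processes images in 8x8 blocks


def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# Orthonormal DCT-II basis: BASIS[k][n]
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
         for k in range(N)]


def dct2(block):
    """2-D DCT: transform each row, then each column."""
    rows = [[sum(BASIS[k][n] * row[n] for n in range(N)) for k in range(N)] for row in block]
    return [[sum(BASIS[u][m] * rows[m][v] for m in range(N)) for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (the basis is orthonormal, so the inverse is its transpose)."""
    rows = [[sum(BASIS[v][n] * row[v] for v in range(N)) for n in range(N)] for row in coeffs]
    return [[sum(BASIS[u][m] * rows[u][n] for u in range(N)) for n in range(N)] for m in range(N)]


def resave_block(block, step):
    """Toy 'save as JPEG' for one block: level shift, DCT, quantize, reconstruct."""
    coeffs = dct2([[p - 128 for p in row] for row in block])
    quantized = [[round(c / step) * step for c in row] for row in coeffs]
    pixels = idct2(quantized)
    return [[min(255, max(0, round(p + 128))) for p in row] for row in pixels]


def resave(image, step):
    """Apply the toy save to every block; dimensions must be multiples of 8."""
    out = [row[:] for row in image]
    for by in range(0, len(image), N):
        for bx in range(0, len(image[0]), N):
            block = [image[by + y][bx:bx + N] for y in range(N)]
            for y, row in enumerate(resave_block(block, step)):
                out[by + y][bx:bx + N] = row
    return out


def error_levels(image, step):
    """ELA: resave once and report the mean absolute pixel change for each block."""
    again = resave(image, step)
    levels = {}
    for by in range(0, len(image), N):
        for bx in range(0, len(image[0]), N):
            total = sum(abs(image[by + y][bx + x] - again[by + y][bx + x])
                        for y in range(N) for x in range(N))
            levels[(by, bx)] = total / (N * N)
    return levels
```

Real ELA tools typically resave with an actual JPEG encoder at a fixed quality and display the per-pixel differences as an image, where regions whose error level has not yet settled stand out.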
Wavelet analysis uses signal decomposition to identify modifications. Basically, when you splice a head onto a body, the two photo fragments have different signal properties. The analysis identifies the areas where the properties differ.
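One simple, widely described way to compare the “signal properties” of different regions is to estimate their noise level from a wavelet decomposition. The Python sketch below uses a single-level Haar transform and Donoho’s median-absolute-deviation estimator on the diagonal detail band. It is one illustrative technique, not necessarily what Dr. Krawetz’s tools do; the function names are hypothetical, and real images need care around edges and texture, which also produce large detail coefficients.

```python
import statistics


def haar_diagonal(pixels):
    """Single-level 2-D Haar transform, keeping only the diagonal (HH) detail band."""
    h, w = len(pixels) // 2 * 2, len(pixels[0]) // 2 * 2
    band = []
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = pixels[y][x], pixels[y][x + 1]
            c, d = pixels[y + 1][x], pixels[y + 1][x + 1]
            band.append((a - b - c + d) / 2.0)  # orthonormal Haar HH coefficient
    return band


def noise_sigma(pixels):
    """Estimate the noise standard deviation as median(|HH|) / 0.6745."""
    return statistics.median(abs(v) for v in haar_diagonal(pixels)) / 0.6745


def region_noise(pixels, top, left, height, width):
    """Noise estimate for one rectangle; keep top/left even so Haar pairs stay aligned."""
    return noise_sigma([row[left:left + width] for row in pixels[top:top + height]])
```

Comparing `region_noise` across a grid of tiles gives a rough noise map; a fragment that came from a different camera or has a different resave history may show up as a patch whose estimate differs from its surroundings.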
As far as court cases go, I’m not aware of any cases where this kind of photo forensics has been used.
Question 8: Any general comments on EXIF data, its reliability, and the best tools for getting at EXIF information from images posted online?
There is a program called “exiftool”. It does a great job at extracting metadata from images, as well as from other file formats.
For extracting other information, such as the JPEG quantization tables, I wrote a program called “jpegquality”. This extracts the tables and estimates the quality used for the last save.
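The interview does not describe how jpegquality works internally, so the following is only a sketch of one common approach, under an explicit assumption: that the file was written by an encoder using the IJG libjpeg quality scaling of the standard luminance table from the JPEG specification (ITU-T T.81, Annex K). It compares a luminance table against the table libjpeg would produce at each quality from 1 to 100. Cameras and editors that use their own tables will only give an approximate match. The function names are hypothetical.

```python
# Standard luminance quantization table (ITU-T T.81, Annex K), row-major order.
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def zigzag_order(n=8):
    """Row-major index of each position along the JPEG zigzag scan."""
    order = []
    for s in range(2 * n - 1):  # s = row + column along each anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run bottom-left to top-right
        order.extend(r * n + c for r, c in cells)
    return order


def to_natural(zigzag_values):
    """Reorder 64 table entries from file (zigzag) order to row-major order."""
    natural = [0] * 64
    for zz_pos, idx in enumerate(zigzag_order()):
        natural[idx] = zigzag_values[zz_pos]
    return natural


def libjpeg_table(quality, base=BASE_LUMA):
    """Table libjpeg writes for a quality setting (with baseline clamping to 1..255)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(255, max(1, (q * scale + 50) // 100)) for q in base]


def estimate_quality(natural_table, base=BASE_LUMA):
    """Return (closest libjpeg quality, exact_match) for a row-major luminance table."""
    best_q, best_err = 1, None
    for q in range(1, 101):
        err = sum(abs(a - b) for a, b in zip(natural_table, libjpeg_table(q, base)))
        if best_err is None or err < best_err:
            best_q, best_err = q, err
    return best_q, best_err == 0
```

A table read from a file’s DQT segment is stored in zigzag order, so it should be passed through `to_natural` before calling `estimate_quality`. An exact match suggests a libjpeg-style encoder; a close but inexact match only gives an approximate quality.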
Question 9: Can you give the readers some ideas of advanced forensic image analysis and point them to some additional resources on the subject should they want further reading?
Forensic image analysis is actually a very large field. First, readers should decide which area they are interested in: currency, documents, identifying modifications, distinguishing real photos from computer graphics, etc.
Cynthia Baron’s book excels at covering the breadth of the problem space.
My primary research focus is on anti-anonymity technologies. With regard to image analysis, this means modification detection and distinguishing between real, enhanced, and computer-generated images.
I know of other researchers who have different specialties: pornography detection, steganography detection, image ballistics, and content identification. There are very few books on these topics. Most good sources are by academic researchers.
For more depth, consider some of the white papers by Dr. Hany Farid and Dr. Jessica Fridrich. Other researchers include Shih-Fu Chang (Columbia University), Nasir Memon (Polytechnic University), and Min Wu (University of Maryland). However, none of the research papers by these experts are “easy reading”.
Digital image analysis is a relatively new field with only a handful of researchers.
If you are interested in this topic, in addition to Cynthia Baron’s book on Adobe Photoshop Forensics you will also want to check out a presentation she did for Northeastern University Library’s Meet the Author series, posted on YouTube. In it she covers several of the examples from the book (and a few more, if I’m not mistaken). This is, of course, in addition to the many sources mentioned by Dr. Krawetz above.
I would like to thank Dr. Krawetz for his participation in this interview and the great insight he provided. Please visit his blog for more interesting examples.