Intel On The Cheap: FOCA, EXIF and the Dreaded Metadata

Posted on 2 May 2011



One wonderful way to get intelligence about the genesis or even the source of a document you’ve got your hands on is to have a look at the file’s metadata. In one of our first PLI Podcasts, Eric Olson from Cyveillance discussed the value of metadata – data about data or information about information – and what exactly that is.

Metadata is information about the content in addition to the content itself. Olson’s example was that, if I’m standing in a bookstore and I’m an analyst with a report due tomorrow, I’m wondering which of these books contain the content I need, and there may be eleven factors I’m considering: does the book fit in my briefcase? Do I know the author? Are they a trusted source? Is this thing too heavy to hold up while I’m in the bathtub, where I do my best thinking?

These are factors about the book that have nothing to do with the content in the book. So for each datum, there’s the message or content of the datum itself, and there’s a whole range of information around it that is about it but not in it. These are the metadata.

For example, translating it to law enforcement, for a given datum, we can look at the observer or officer who reported it, the history of that person if it is known, the corner they were standing on when they got the information, that person’s social/familial/other relationships to or about the datum, etc.

If you haven’t read or listened to that podcast I would highly recommend that you do.

In any event, metadata is also available about documents, and is intentionally enriched by technology vendors such as Microsoft, whose software knows to search metadata fields for relevance. So whenever you create a Word, Excel or PowerPoint file, or most other kinds of documents, metadata is created.

If authors are not careful, that metadata can get out of the organization and bite them in the butt. It can allow people outside the author’s organization to learn about how that organization works.

And it can be used by intelligence folks to discover, for example, the true chain of creation, editing and finalizing of a document.

This cuts both ways! If you’ve created a document inside your department which you purport to be from someone or somewhere else, for example, metadata can bite you in your butt, too.

As Microsoft says,

Legal professionals are familiar with the concept of “discovery” and the requirements set out by the courts for complying with discovery demands. They also understand that they are only required to provide the documents and data set out in the discovery demand. Unfortunately, if you are providing electronic versions of your documents, you may “discover” that you are inadvertently supplying more information than you realize…

These can include the name of your computer, the name of the network server or hard disk where you saved the document, other file properties and summary information, non-visible portions of embedded OLE objects, the names of previous document authors, document revisions, document versions… and comments.

Their page on metadata includes instructions on how to clean it, including Word documents, Excel spreadsheets, and PowerPoint presentations.
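To get a feel for what is sitting in a document before you clean it, here is a minimal Python sketch that lists a few of those fields. It assumes the newer XML-based Office formats (.docx, .xlsx, .pptx), which are ZIP packages that normally carry a docProps/core.xml part; older binary .doc files are laid out differently and are not covered. The function name and the choice of fields are mine, for illustration only, and this is not the method Microsoft's page describes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative selection of fields: (label used in the result, namespaced tag).
FIELDS = [
    ("creator", "dc:creator"),
    ("last_modified_by", "cp:lastModifiedBy"),
    ("revision", "cp:revision"),
    ("created", "dcterms:created"),
    ("modified", "dcterms:modified"),
]


def read_core_properties(path):
    """Return a dict of the core metadata fields present in an OOXML file."""
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # this package has no core-properties part
    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS:
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found
```

Calling read_core_properties("report.docx") on a file of your own (the file name here is just a placeholder) returns something like a creator, a last editor and a revision count, which is exactly the sort of thing that gives away the editing chain.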

Now a free tool, FOCA, is available from Informatica64 for download or online use to help you manage metadata in a range of document formats: all those above plus OpenOffice, Adobe Acrobat, vector-based graphics and JPEG files. You can download FOCA here or just use the online version, both free.

Speaking of JPEGs, a great free Linux tool is exif – a command line tool that helps discover Exchangeable Image File Format tags which can reveal boatloads of information about photos you take. It’s probably already on your Linux box. One particularly awesome use of EXIF data is the stolen camera finder – drag a photo taken by the camera onto this free website and it will search the web for photos taken by the same camera, and get you user information about the person who posted the image.
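If you are curious where those tags actually live, here is a rough Python sketch that pulls a few plain-text tags (camera make, model, software, date) out of a JPEG using only the standard library. It is a simplified illustration under several assumptions: it reads only the first image directory (IFD0), handles only ASCII-typed tags, and skips the less common corners of the JPEG format. It is not how the exif tool or the stolen camera finder works internally, and the function names are mine.

```python
import struct

# Tag numbers for a few identifying ASCII fields stored in IFD0.
ASCII_TAGS = {0x010F: "Make", 0x0110: "Model", 0x0131: "Software", 0x0132: "DateTime"}


def find_exif_payload(jpeg_bytes):
    """Return the TIFF block inside the JPEG's Exif APP1 segment, or None."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        return None  # not a JPEG (missing start-of-image marker)
    pos = 2
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:
            return None  # start of scan: image data begins, no Exif found
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        body = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and body.startswith(b"Exif\x00\x00"):
            return body[6:]
        pos += 2 + length  # the length field counts its own two bytes
    return None


def read_exif_ascii_tags(jpeg_bytes):
    """Return a dict of the ASCII tags listed in ASCII_TAGS found in IFD0."""
    tiff = find_exif_payload(jpeg_bytes)
    if tiff is None or len(tiff) < 8:
        return {}
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little- or big-endian
    if order is None:
        return {}
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42 or ifd_offset + 2 > len(tiff):
        return {}
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = tiff[start:start + 12]
        if len(entry) < 12:
            break  # truncated directory
        tag, kind, n, value = struct.unpack(order + "HHI4s", entry)
        if tag in ASCII_TAGS and kind == 2:  # type 2 means ASCII text
            if n <= 4:
                raw = value[:n]  # short values are stored inline
            else:
                (offset,) = struct.unpack(order + "I", value)
                raw = tiff[offset:offset + n]  # offset is from the TIFF header
            tags[ASCII_TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return tags
```

Feed it the raw bytes of a photo, for example open("photo.jpg", "rb").read() with a file of your own, and you get back a small dictionary of whatever the camera wrote. For real work the exif tool reads far more tags than this.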

While you’re at it, you might consider drafting some policies about what kinds of metadata you permit to be in your documents, and how you will check to ensure that your people – and you – are complying with the policy.
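As one way to picture the checking half of that policy, here is a tiny Python sketch. It assumes you have already extracted a document's metadata into a dictionary by some other means, and the list of permitted fields is a made-up example, not a recommendation.

```python
# Hypothetical policy: the only metadata fields allowed to leave the building.
PERMITTED_FIELDS = frozenset({"title", "created"})


def policy_violations(metadata, permitted=PERMITTED_FIELDS):
    """Return the sorted names of non-empty fields the policy does not permit."""
    return sorted(
        name
        for name, value in metadata.items()
        if value and name not in permitted
    )
```

A document whose metadata is {"title": "Q2 Report", "creator": "jsmith"} would be flagged for the creator field, while one that carries only a title passes clean.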