Confidentiality and Document Metadata



4/2006

ARCS, M Ward.

Executive Summary: Electronic document formats can contain information, such as the author and creation date, that is not readily displayed or easily accessible to the average user. This metadata (data about data) can potentially be used to identify the authors of anonymous or confidential information.

Metadata

Metadata is literally “data about data.” Many programs, such as Microsoft Word, maintain information about electronically created documents within the document itself. This metadata can contain information like the author’s name, the name of the PC where the document was created, and the last time it was printed. It can also be used to track changes that occur to the document over time.
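To make this concrete, the sketch below lists the core properties stored inside a document. It assumes the newer Office Open XML formats (.docx, .xlsx, .pptx), which are ZIP packages that conventionally keep these fields in a part named docProps/core.xml; the older binary .doc format stores similar properties in a different way and is not handled here. The function name read_core_properties and the field labels are illustrative choices, not part of any Microsoft tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# (label used in the result, namespaced tag to look for)
FIELDS = [
    ("author", "dc:creator"),
    ("last_modified_by", "cp:lastModifiedBy"),
    ("created", "dcterms:created"),
    ("modified", "dcterms:modified"),
    ("last_printed", "cp:lastPrinted"),
    ("revision", "cp:revision"),
]


def read_core_properties(path):
    """Return a dict of the core metadata fields found in an OOXML file.

    `path` may be a file name or a binary file-like object.
    Assumes the conventional part name docProps/core.xml.
    """
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # the package has no core-properties part

    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS:
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling read_core_properties("report.docx") on a hypothetical file returns whichever of these fields the authoring application chose to record, which is often more than the author expects.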

In situations where confidentiality and anonymity are required, this metadata can potentially be used to discover the origins and authors of electronic documents. For example,

“Back in February 2003, 10 Downing Street published a dossier on Iraq's security and intelligence organizations. This dossier was cited by Colin Powell in his address to the United Nations the same month. Dr. Glen Rangwala, a lecturer in politics at Cambridge University, quickly discovered that much of the material in the dossier was actually plagiarized from a U.S. researcher on Iraq.” (Blair’s Iraq Dossier)

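The Word file of the dossier also retained a revision log naming the users who had most recently saved it, which exposed who had worked on the document. All computer users who deal with confidential information need to be aware of the existence of this metadata and of the risks it poses.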

Microsoft Office Documents

Microsoft Office is the most widely used suite of applications for creating documents and spreadsheets. Given its popularity, users need to take care to remove all metadata when transferring documents whose contents have been “sanitized” but which may still contain identifying metadata. Several programs exist for this purpose; DocScrubber is one of them. Here is a screenshot of DocScrubber analyzing this document.

[pic]

DocScrubber can be used to remove metadata from Microsoft Word documents. A more comprehensive solution can be found in Metadata Assistant, which can be used to eliminate metadata from all Microsoft Office document formats.
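The kind of removal these tools perform can be sketched in a few lines. The example is not how DocScrubber or Metadata Assistant works internally; it is a minimal illustration that assumes an Office Open XML package (.docx, .xlsx, .pptx) and blanks only the conventional docProps/core.xml part. Other places where identifying data can live, such as docProps/app.xml, comments, and tracked changes, are deliberately left untouched, so this is far from a complete scrubber. The name scrub_core_properties is hypothetical.

```python
import zipfile

# An empty core-properties part that replaces the original one.
EMPTY_CORE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
    'package/2006/metadata/core-properties"/>'
)


def scrub_core_properties(src, dst):
    """Copy the package at `src` to `dst`, blanking docProps/core.xml.

    `src` and `dst` may be file names or binary file-like objects.
    Every other part of the package is copied unchanged.
    """
    with zipfile.ZipFile(src) as original, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as cleaned:
        for item in original.infolist():
            data = original.read(item.filename)
            if item.filename == "docProps/core.xml":
                data = EMPTY_CORE.encode("utf-8")
            cleaned.writestr(item.filename, data)
```

Writing to a new file instead of modifying the original in place keeps the source document available in case the cleaned copy needs to be checked against it.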

The applications that make up Microsoft Office can be configured to minimize the amount of metadata that is saved within a document. These configuration changes are detailed in “Removing Metadata from documents created by Microsoft Products,” listed under Links below.

Portable Document Format (PDF)

Adobe’s PDF format is a popular way to distribute electronic documents that look identical wherever they are viewed. PDFs can also contain metadata such as the author and creation date. Shown here is the metadata associated with a PDF from the NSA on OS X security.

[pic]

This PDF was created from a Microsoft Word document on the 15th of October in 2004.
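Fields like these can often be spotted without a full PDF parser. The sketch below scans the raw bytes of a file for common document-information entries and decodes the PDF date format (D:YYYYMMDDHHmmSS followed by an optional time-zone offset). It is a rough check under clear assumptions: the entries must be stored as simple literal strings in an unencrypted, uncompressed part of the file, nested parentheses are not handled, and the time-zone offset is ignored. The function names are illustrative.

```python
import re
from datetime import datetime

# Matches entries such as "/Author (Jane Doe)" whose value is a simple
# literal string (escaped characters allowed, nested parentheses not).
INFO_ENTRY = re.compile(
    rb"/(Author|Creator|Producer|Title|CreationDate|ModDate)"
    rb"\s*\(((?:\\.|[^\\()])*)\)"
)


def scan_pdf_info(raw):
    """Return the first value seen for each known key in raw PDF bytes."""
    entries = {}
    for key, value in INFO_ENTRY.findall(raw):
        entries.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return entries


def parse_pdf_date(value):
    """Parse the leading date and time of a PDF date string.

    Everything after the year is optional in the format; missing parts
    default to the first month/day and to zero. The offset is ignored.
    """
    match = re.match(
        r"(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?", value
    )
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second = match.groups()
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
    )
```

An empty result from scan_pdf_info does not prove that a file is clean, because the same information may sit in a compressed stream or in an XMP packet that this scan does not read.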

The easiest way to remove metadata from a PDF is to “flatten” it by converting it into a TIFF image, but this is not always an option. Another option is to open the PDF in Adobe Acrobat (the full version, not Reader) and select Document Options from the File menu. When converting to PDF from another format, care must be taken to configure the converting application so that it does not include “document information” from the original format.

Redaction and PDFs

PDFs are often used to disseminate information that has been declassified. Redaction is the removal of sensitive information, most often by hiding it under “black bars.” This can be done incorrectly, as happened with the report on the killing of an Italian agent in Baghdad in 2005:

“After reviewing a copy of the report, it appears that the PDF document was produced directly from Microsoft Word using Adobe Acrobat 6.0's PDF Maker. In Word, it's possible to add shading behind text; if the shading is dark enough, it can appear to the user as if the text has been effectively obscured. While this should work for a physically printed version of the document, Word's shading option was never designed for redaction. As a result, the text can still be selected in the resulting PDF using the Select Text Tool in Adobe Reader.” (PlanetPDF)

Completely removing the sensitive information from the PDF, with applications such as Appligent Redaction, is a better solution.

Electronic Images

Electronic image formats, such as GIF and JPEG, can also contain metadata. For example, most modern digital cameras save images in JPEG format and embed metadata such as the creation date and image resolution. While this default information about the image itself does not seem to pose much of a risk, people working with the image may add other information to the image’s metadata.

Members of the International Press Telecommunications Council (IPTC) often add geographic information to images used by the world press. This happened when the Washington Post interviewed a “hacker” who wanted to remain anonymous.

“As a couple of other Slashdotters noted, these appear to have been entered by The Washington Post photographer. It's probably completely routine for them. After all, that's what IPTC (International Press Telecommunications Council) fields were designed for: to help periodicals manage their huge number of digital photographs.” (Image Metadata Reveals Hacker)

The IPTC data added to the JPEG images indicated the location where the pictures were taken, which could have led to the identification of the “hacker.” Where necessary, this additional information can be omitted or removed by applications such as iTag.
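Removal of this kind can be illustrated directly on the JPEG file structure. A JPEG file is a sequence of marker segments; Exif and XMP data are normally carried in APP1 segments, IPTC fields in an APP13 segment, and free-text comments in COM segments. The sketch below copies a file while dropping those three segment types. It is not how iTag works; it is a minimal illustration that assumes every segment before the start-of-scan marker carries a length field, which holds for typical camera and press images. The function name is hypothetical.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker
SOS = 0xDA         # start of scan: compressed image data follows
# APP1 (Exif, XMP), APP13 (IPTC via Photoshop resources), COM (comments)
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data):
    """Return the JPEG bytes with APP1, APP13 and COM segments removed."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG stream")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte in front of a marker; skip it
            continue
        if marker == SOS:
            # Everything from here on is image data; copy it verbatim.
            out += data[pos:]
            return bytes(out)
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Because the Exif segment can also hold the orientation flag, an image cleaned this way may display rotated in some viewers; the pixel data itself is unchanged.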

Summary

Metadata poses a risk to the confidentiality of sensitive electronic documents and, where possible, should be removed using the appropriate tool. Authors and editors of these documents need to take care to ensure that such metadata is not created unintentionally.

Links

Tony Blair’s Iraq Dossier:

DocScrubber:

Metadata Assistant:

Removing Metadata from documents created by Microsoft Products:

Italian Agent Killed in Baghdad:

Appligent Redaction:

Image Metadata “Reveals” Hacker Identity:

iTag:
