Both Microsoft Office and Adobe Acrobat formats allow their documents to act as containers and embed other document types easily. As a result, the easiest way to get information such as a graph from one document into another is to drag it from the source document and drop it into the target. While this may seem like a straightforward way to display information, it actually creates a separate copy of the entire source file and embeds it in the target, potentially exposing more information than intended. Extracting these embedded objects is not common practice, yet they can contain a great deal of hidden, discoverable information.
Below is an example of how PowerPoint documents can contain entire Excel workbooks that are commonly overlooked.
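To make the point concrete, here is a minimal sketch of how one might check a modern Office file for embedded objects. It relies on the fact that .pptx, .docx, and .xlsx files are ZIP packages that keep embedded objects in an `embeddings/` folder (for example `ppt/embeddings/` in a presentation). The function name `list_embedded_parts` is hypothetical, and the sketch does not cover legacy binary formats such as .ppt or PDF attachments.

```python
import io
import zipfile


def list_embedded_parts(package_bytes: bytes) -> list[str]:
    """Return the names of parts stored under an 'embeddings/' folder.

    Assumption: the input is an Office Open XML package (a ZIP archive).
    Legacy binary Office files and PDFs are not handled here.
    """
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(
            name
            for name in package.namelist()
            # Skip directory entries; keep only actual embedded files.
            if "/embeddings/" in name and not name.endswith("/")
        )
```

Called on the bytes of a presentation, this returns entries such as an embedded .xlsx workbook, each of which is a complete file rather than just the chart shown on the slide.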
How do we extract these files?
Our in-house utility recursively processes native documents (e-docs and attachments) and extracts any and all embedded objects, whether they were stored via OLE or another method. Document relationships are maintained by treating each extracted document as a child of its original parent document.
Because the utility works recursively, objects embedded inside other embedded objects are extracted as well, supporting any number of levels of parent-child relationships (see diagram to the right).
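The recursive idea can be illustrated with a short sketch. This is not the in-house utility itself: it handles only Office Open XML packages (ZIP archives with an `embeddings/` folder), treats everything else, including OLE binary objects, as a leaf, and uses hypothetical names (`Node`, `extract_tree`, `max_depth`). The depth limit is an added safeguard against maliciously deep nesting, not something described above.

```python
import io
import zipfile
from dataclasses import dataclass, field


@dataclass
class Node:
    """One document plus the documents extracted from it (its children)."""
    name: str
    data: bytes
    children: list["Node"] = field(default_factory=list)


def extract_tree(name: str, data: bytes, max_depth: int = 16) -> Node:
    """Recursively extract embedded parts, preserving parent-child links."""
    node = Node(name, data)
    # Stop at the depth limit, or when the payload is not a ZIP package
    # (for example an OLE binary object, which this sketch does not parse).
    if max_depth == 0 or not zipfile.is_zipfile(io.BytesIO(data)):
        return node
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for part in package.namelist():
            if "/embeddings/" in part and not part.endswith("/"):
                # Each embedded part becomes a child and is itself searched.
                child = extract_tree(part, package.read(part), max_depth - 1)
                node.children.append(child)
    return node
```

Starting from a presentation, the resulting tree mirrors the relationships described above: a workbook embedded in a slide is a child of the presentation, and a document embedded in that workbook is a child of the workbook.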