Near Duplicate Processing

Group Similar Documents for an Effective Review

Do you ever feel like you are seeing the same Excel file over and over again? In most document reviews, this is not uncommon. At Ignited, we have partnered with Equivio to bring our clients "near duplicate" technology. Near duplicates are simply files with minor differences, such as contract revisions that differ by a few words. Equivio identifies and groups these similar documents and gives the end user additional database fields for sorting and organizing the collection.

How does it work?

By analyzing the text content of each record, Equivio first determines a "pivot" document, the file it deems most representative of the near duplicate set. Other documents with similar content are then identified and grouped with that pivot document. The degree of similarity at which documents are deemed near duplicates of one another can be specified in advance. Because this analysis is performed on the text content of a file, not on the file at the block level, Equivio can be run on scanned images just as well as on MS Office documents.

Why group near duplicates?

This process sheds new light on document review. Without Equivio, near duplicates would normally be dispersed randomly throughout the document collection. With Equivio, near duplicates are clustered into groups, enabling a coherent, systematic review. In most cases, 20-50% of documents are determined to be near duplicates.
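To make the "How does it work?" description more concrete, here is a minimal Python sketch of text-based near duplicate grouping. It is an illustration only and not Equivio's actual algorithm, which is not described here. The word-shingle comparison, the Jaccard similarity measure, the function names, and the 0.6 threshold are all assumptions chosen for the example. The sketch also simplifies pivot selection: it takes the first ungrouped document as the pivot rather than the most representative one.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in the text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.6, k=3):
    """Greedily group documents around pivots.

    docs maps a document id to its extracted text. The first ungrouped
    document becomes the pivot (a simplification, assumed for this sketch);
    every remaining document whose similarity to the pivot is at or above
    the threshold joins that pivot's group.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ungrouped = list(docs)
    groups = []
    while ungrouped:
        pivot = ungrouped.pop(0)
        members = [d for d in ungrouped
                   if jaccard(sets[pivot], sets[d]) >= threshold]
        ungrouped = [d for d in ungrouped if d not in members]
        groups.append({"pivot": pivot, "members": members})
    return groups


if __name__ == "__main__":
    # Hypothetical sample collection: two contract revisions and one unrelated file.
    sample = {
        "contract_v1": "This agreement is made between Acme Corp and Beta LLC "
                       "on the first day of January",
        "contract_v2": "This agreement is made between Acme Corp and Beta LLC "
                       "on the second day of January",
        "sales_memo": "Quarterly sales figures for the northeast region are attached",
    }
    for group in group_near_duplicates(sample, threshold=0.6):
        print(group["pivot"], group["members"])
```

Raising the threshold makes grouping stricter, so fewer documents join each pivot; lowering it groups more loosely related files together. Because the comparison uses only extracted text, the same approach applies to any file type for which text is available.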
For more information on near duplicate processing or other creative solutions offered by Ignited, please visit www.igniteddiscovery.com/creative.
Copyright © 2007-2008 Ignited Solutions, LLC. All Rights Reserved.