e-Discovery Processing | Electronic Discovery Data Culling Capabilities
Filtering • Deduplication • Date Filtering • Searching
Case teams and electronic discovery providers use various techniques to cull a dataset prior to attorney review, all with the same goal: to reduce the costs associated with collection, processing, review and production of electronically stored information (ESI). Effective data culling reduces downstream costs by segregating irrelevant documents so that reviewers see only potentially relevant information. Data culling begins at the earliest stages of discovery, with the identification of relevant sources of information, key custodians and critical dates, and with a plan for managing the eDiscovery process.
The Oliver Group (TOG) provides early-stage consulting on how to manage the identification, collection and analysis of ESI, and also provides data culling services using industry-standard tools and techniques. TOG’s data culling services include removal of system files, deduplication, date filtering and keyword searching. System file filtering removes extraneous files that were not created by users, such as the files that applications and systems need in order to run. Data deduplication segregates all exact duplicates, that is, files with the same digital fingerprint. When combined, date filtering, metadata filtering, deduplication and keyword searching save significant cost and time while providing immense value to TOG’s clients.
System File Filtering
System file filtering is often the first step in reducing the aggregate data volume to the most relevant and responsive data set for delivery to our clients.
System file filtering uses the National Software Reference Library (NSRL) to reduce the dataset by removing system files, application files and other files prepackaged with software. The NSRL is a database of known file hash values created by the National Institute of Standards and Technology (NIST), along with other outside sources. When working with restored or extracted data, The Oliver Group can generate a hash value for each file and then compare these values to the known values in the NSRL. Files whose hash values match those in the library can be filtered out as non-responsive. For chain of custody purposes, TOG can report on all files identified as a match to the NSRL list of known system files.
Deduplication
The Oliver Group offers deduplication of both email and e-file (file server) data. The basic premise behind all deduplication is that a hash value (typically MD5 and/or SHA-1) is created for each email message or e-file as it’s processed or indexed. These values are stored in a database and then all subsequent data is compared against them. Any messages or files with identical hash values are considered to be duplicates and are discarded, leaving only a single instance of each message or file in the delivered dataset.
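As a rough illustration of the hash comparison behind both system file filtering and deduplication, the sketch below hashes each file and sorts it into one of three groups: known system files, duplicates, and unique files. It is not TOG's tooling. The function names and the `known_hashes` parameter are hypothetical, and `known_hashes` is assumed to be a set of lowercase SHA-1 hex strings that has already been loaded from a reference list such as the NSRL. The sketch hashes raw file bytes only; how an email message is hashed depends on the tool and is not shown.

```python
import hashlib


def file_hash(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cull_by_hash(paths, known_hashes=frozenset()):
    """Split files into unique, duplicate, and known-system-file groups.

    known_hashes is a hypothetical, pre-loaded set of hex digests
    (for example, taken from a known-file reference list).
    """
    seen = {}  # hash value -> first path encountered with that hash
    unique, duplicates, known = [], [], []
    for path in paths:
        value = file_hash(path)
        if value in known_hashes:
            known.append(path)  # matches the reference list
        elif value in seen:
            duplicates.append((path, seen[value]))  # (duplicate, retained copy)
        else:
            seen[value] = path
            unique.append(path)
    return unique, duplicates, known
```

Keeping the `known` and `duplicates` lists rather than simply dropping the files makes it possible to report on what was filtered out and why. Calling `cull_by_hash` once per custodian, or once over all custodians together, gives custodian-level or dataset-wide deduplication respectively.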
Deduplication can be focused on a single custodian’s data or applied across an entire enterprise data set. The method and breadth of deduplication are project-specific and must be considered in the context of the ultimate goal of the project.
Searching
Searching can be accommodated on both email and e-file data. Depending on the tool used, data (metadata and content) may be indexed in advance of searching. Indexing provides much faster searching once complete and greatly enhances the ability to re-search the data with new terms and parameters. In some cases, data is not indexed in advance and the searches are run against the data each time.
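To show why indexing in advance speeds up repeated searches, here is a minimal sketch of an inverted index: each term is mapped once to the documents that contain it, so later queries become set lookups instead of full scans of the data. This is an illustration only, not the indexing engine of any particular tool. The function names, the document IDs and the simple alphanumeric tokenizer are all assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data: the index is built once, then searched repeatedly.
documents = {
    "msg-001": "Quarterly forecast attached",
    "msg-002": "Lunch on Friday?",
    "file-001": "Forecast model, revised quarterly",
}
index = build_index(documents)
print(sorted(search(index, "quarterly forecast")))  # ['file-001', 'msg-001']
print(sorted(search(index, "friday")))              # ['msg-002']
```

Running a new term against `index` does not touch the original documents again, which is what makes re-searching with new terms and parameters cheap once indexing is complete.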
Searching can be accomplished using many different parameters, which may be used alone or in combination with one another:
Metadata & Date Filtering
Other culling options for email include sender/recipient filtering, where the To, From, Cc and Bcc metadata fields can be searched for names of interest, email addresses, or even domain names. TOG can perform sender/recipient filtering in conjunction with keyword searching across the subject, email body and attachments to further narrow the documents identified for review.
Date filtering is another common technique used in conjunction with keyword searching to reduce the volume of data to be reviewed. Date filtering of e-files can be applied to Date Last Modified, and the criteria can be before, after, or between specified dates. Date filtering of emails can be applied to the sent or received date, using the same before, after, or between criteria.
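The sender/recipient and date criteria described above can be combined in a few lines. The following is a sketch under stated assumptions, not a description of any specific tool. The `Email` record and the function names are hypothetical, party matching is a simple case-insensitive substring test (so a name fragment, a full address or a domain such as `@example.com` all work), and the date bounds are treated as inclusive, which a real project would need to confirm.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Email:
    sender: str
    recipients: list  # To, Cc and Bcc addresses combined
    sent: date
    subject: str = ""


def matches_party(email, needles):
    """True if any sender/recipient address contains one of the needles."""
    addresses = [email.sender, *email.recipients]
    return any(n.lower() in a.lower() for a in addresses for n in needles)


def in_date_range(value, start=None, end=None):
    """Inclusive check (an assumption): omit start for 'before', end for 'after'."""
    if start is not None and value < start:
        return False
    if end is not None and value > end:
        return False
    return True


def cull(emails, needles, start=None, end=None):
    """Keep emails that match a party of interest and fall in the date range."""
    return [
        e for e in emails
        if matches_party(e, needles) and in_date_range(e.sent, start, end)
    ]
```

For example, `cull(emails, ["@example.com"], start=date(2020, 1, 1), end=date(2020, 12, 31))` would keep only messages sent during 2020 that involve someone at that domain. The same `in_date_range` check could be applied to an e-file's Date Last Modified.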
