
EDiscovery and Native
Will this be the year EDiscovery finally goes all native?
I’m writing this in 2016, and you may be reading it in 2017 or 2018, but the question will likely remain: is this the year that productions stop being TIFF images plus load files and go native?
Yes, we have moved from all-paper to all-digital productions. Yes, in many litigations, Excel spreadsheets and PowerPoint presentations are produced as native files. Yes, many databases are now produced as exports. Yes, audio and video files are not converted to TIFFs for production. But the majority of productions are still in non-native forms.
Will this year see native productions of emails? Microsoft Word documents? Websites?
First, Why Does It Matter?
Full disclosure: as any true technologist should be, I am 100% in favor of native productions, warts and all.
Forensically speaking, the native data is the original copy, with all of its binary information intact. It contains metadata that is not present in extractions, and many advanced analyses require the full native data. TIFF’ing plus extracted or OCR’d text destroys significant functionality. For specialized files (e.g., SAS statistical data sets, underlying website logs and content, or foreign- or mixed-language documents), TIFF is as unusable as it is for an Excel spreadsheet. For other specific files, such as Outlook email forms, the receiving party has no way of determining the importance of the missing metadata, or of knowing how to request it in the first place.
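To make the metadata point concrete: a modern Word file (.docx) is a ZIP package, and its core properties (author, last modified by, created and modified timestamps, revision) live in an XML part named `docProps/core.xml`. A TIFF image carries none of this itself. The sketch below is a minimal illustration using only the Python standard library; the function name and the handful of fields chosen are my own assumptions, and real forensic tools extract far more (file-system timestamps, custom properties, tracked changes).

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by the Office Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# A few illustrative fields; the part defines others (title, keywords, ...).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def docx_core_properties(path):
    """Return whichever of FIELDS are present in a .docx file.

    Raises KeyError if the package has no docProps/core.xml part.
    """
    with zipfile.ZipFile(path) as package:  # a .docx is a ZIP package
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            props[key] = element.text.strip()
    return props
```

Calling `docx_core_properties("memo.docx")` (a hypothetical file) returns a dictionary such as `{"creator": ..., "last_modified_by": ..., "modified": ...}`, containing only the fields the document actually has.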
There are significant challenges in getting to 100% native. Let’s lay them out.
Defining the Problem
Here are some of the most frequently cited challenges to 100% native productions:
- rendering
- consistent rendering
- redaction
- decompression
- access security
- authentication
Rendering
Even common file types can pose a rendering challenge. For example, a Microsoft Word document might be in a newer or older version of Microsoft Word to which the receiving party does not have access.
Other common but specialized file types, such as desktop publishing documents (Quark, Adobe Illustrator, etc.), may be difficult for a receiving party to open.
If the Receiving Party is using a vendor to host the production, the vendor can provide the Producing Party with a list of the file types it supports natively. The Producing Party can then confer with the vendor about any file types not on that list and decide whether those types should be produced natively or as images.
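A minimal sketch of that triage, assuming the vendor’s list is available as a set of file extensions. The `VENDOR_SUPPORTED` set below is hypothetical, not any real vendor’s list, and matching on extension is a simplification; processing tools usually identify file types by their internal signatures instead.

```python
from pathlib import PurePath

# Hypothetical example of a hosting vendor's natively supported types.
VENDOR_SUPPORTED = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
                    ".msg", ".pdf", ".mp3", ".mp4"}


def triage_by_type(filenames, supported=VENDOR_SUPPORTED):
    """Split filenames into (native, to_confer).

    native    -- extension is on the vendor's list; produce natively.
    to_confer -- not on the list; the parties decide native vs. image.
    """
    native, to_confer = [], []
    for name in filenames:
        ext = PurePath(name).suffix.lower()  # "" if there is no extension
        (native if ext in supported else to_confer).append(name)
    return native, to_confer
```

For example, `triage_by_type(["memo.DOCX", "budget.xlsx", "brochure.qxp", "poster.ai"])` returns `(["memo.DOCX", "budget.xlsx"], ["brochure.qxp", "poster.ai"])`: the Quark and Illustrator files are the ones to raise with the vendor.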
Consistent Rendering
This is a reference issue. For example, if I print in portrait and you print in landscape at a smaller scale, our page counts may differ and the same content may appear on different pages.
Indeed, Bates numbering and TIFF’ing are very good at keeping page numbers consistent, so that citing a Bates number directs everyone to the same place.
But there are two clear solutions: (1) a simple agreement on the scale, orientation and paper size for common documents, and (2) an agreement that any document used as an exhibit, whether at trial or in a deposition, be printed with Bates numbers.
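Once scale, orientation and paper size are agreed, page counts become reproducible, and Bates ranges for exhibits can be assigned deterministically. A minimal sketch, assuming a hypothetical `ABC` prefix and seven-digit zero padding (actual prefixes and padding are set by the parties’ protocol):

```python
def bates_label(prefix, number, width=7):
    """Format one Bates number, e.g. ABC0000001."""
    return f"{prefix}{number:0{width}d}"


def assign_bates(docs, prefix, start=1, width=7):
    """Assign consecutive Bates ranges to (doc_id, page_count) pairs,
    in the order given. Returns {doc_id: (first_label, last_label)}."""
    ranges = {}
    current = start
    for doc_id, pages in docs:
        if pages < 1:
            raise ValueError(f"{doc_id}: page count must be positive")
        first, last = current, current + pages - 1
        ranges[doc_id] = (bates_label(prefix, first, width),
                          bates_label(prefix, last, width))
        current = last + 1  # next document starts on the following number
    return ranges
```

With exhibits of 3 and 12 pages, `assign_bates([("Exhibit 1", 3), ("Exhibit 2", 12)], "ABC")` yields `ABC0000001` to `ABC0000003` and `ABC0000004` to `ABC0000015`. If the rendering settings changed, the page counts and therefore every later range would shift, which is why the agreement in (1) matters.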
Redaction
Native redaction can indeed be challenging, notably because the redaction tools for TIFF are mature and well developed, while native redaction tools are not yet.
There are also concerns that a native redaction will change the metadata associated with a document (for example, the date last saved).
Certain types of documents (e.g., video and audio) are certainly very challenging: imagine needing to redact a privileged background conversation while keeping the main speaker’s presentation.
Even making sure that supervisors can check for improper redactions can be challenging if they cannot quickly peek under the redaction.
For all of the above reasons, I hear many arguments about the increased burden of conducting native redaction.
Honestly, most of the issues are due to the relative immaturity of native redaction tools.
As native redaction of Excel files becomes more widespread, and as native is more frequently used as the production format, the tools will mature.
Ultimately, native redaction will offer significant speed benefits. Heuristics, regular expression patterns and simple find-and-replace operations will be able to flag and auto-redact large quantities of data in bulk, which will more than compensate for the additional learning curve.
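A minimal sketch of the bulk, pattern-based flagging described above, working on plain text only. The two patterns (a U.S. Social Security number format and a simple email pattern) and the `[REDACTED-SSN]`-style marker are illustrative assumptions; a real native redaction tool would also have to rewrite the file format itself (e.g., Excel cells) and handle the metadata concerns raised earlier. The returned log lets a supervisor review what was redacted without reopening the source, one way to address the "peek under the redaction" problem.

```python
import re

# Illustrative patterns only; a real protocol would define its own list.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def auto_redact(text, patterns=PATTERNS, marker="[REDACTED-{label}]"):
    """Replace every pattern match with a labeled marker.

    Returns (redacted_text, log), where log lists (label, original_value)
    pairs in the order they were redacted, for supervisor review.
    """
    log = []
    for label, pattern in patterns.items():
        def record(match, label=label):
            log.append((label, match.group(0)))  # keep the original for QC
            return marker.format(label=label)

        text = pattern.sub(record, text)
    return text, log
```

For example, `auto_redact("Employee SSN 123-45-6789, contact jane.doe@example.com.")` returns `"Employee SSN [REDACTED-SSN], contact [REDACTED-EMAIL]."` along with a two-entry log. Because the log contains the redacted values, it belongs to the producing party’s QC workflow and must not itself be produced.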
Decompression
Zipped and TAR’d archives may be password-protected, and some compression formats require specialized software to decompress.
I suggest treating these files like protected hard drives: decompress them and produce the results. If they contain compressed files of their own (not uncommon), decompress those as well.
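A minimal sketch of recursive decompression using Python’s standard `zipfile` and `tarfile` modules. Assumptions: archives are recognized by extension (Office files such as .docx are ZIPs internally and must not be unpacked), any password is supplied as bytes and shared by nested archives, `dest` starts empty, nesting is capped at a hypothetical `max_depth` of 5 to guard against endlessly nested archives, and each nested archive is unpacked into a sibling `<name>_extracted` folder. Formats such as RAR or 7z, and AES-encrypted ZIPs, would need other tools.

```python
import tarfile
import zipfile
from pathlib import Path

# Recognize archives by extension: .docx/.xlsx etc. are ZIP packages
# internally and must stay native, so zipfile.is_zipfile() is not used.
ZIP_SUFFIXES = (".zip",)
TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def _is_archive(path):
    return path.name.lower().endswith(ZIP_SUFFIXES + TAR_SUFFIXES)


def _extract_one(archive, dest, password=None):
    """Extract a single ZIP or TAR archive into dest."""
    if archive.name.lower().endswith(ZIP_SUFFIXES):
        with zipfile.ZipFile(archive) as zf:
            # pwd (bytes) only covers legacy ZipCrypto encryption.
            zf.extractall(dest, pwd=password)
    else:
        with tarfile.open(archive) as tf:  # auto-detects gzip/bz2/xz
            if hasattr(tarfile, "data_filter"):  # newer Pythons: safe extraction
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)


def extract_recursively(archive, dest, password=None, max_depth=5):
    """Extract archive into dest, then unpack nested archives into sibling
    "<name>_extracted" folders, at most max_depth levels deep.
    Returns the sorted paths of all resulting non-archive files (plus any
    archives left packed because the depth limit was reached)."""
    archive = Path(archive)
    if not _is_archive(archive):
        raise ValueError(f"not a recognized archive: {archive}")
    results = []
    pending = [(archive, Path(dest), 0)]
    while pending:
        path, out_dir, depth = pending.pop()
        out_dir.mkdir(parents=True, exist_ok=True)
        _extract_one(path, out_dir, password)
        for child in sorted(p for p in out_dir.rglob("*") if p.is_file()):
            if _is_archive(child) and depth < max_depth:
                target = child.with_name(child.name + "_extracted")
                pending.append((child, target, depth + 1))
            else:
                results.append(child)
    return sorted(results)
```

Keeping each nested archive’s contents in its own `_extracted` folder preserves where every file came from, which helps when the production log has to show the file’s original container.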
Access Security
Some referenced files may live on other servers behind firewalls. For example, an email may contain a link to data stored on a shared server.
For these references, I suggest that the producing party produce the linked files (natively, of course) and create a soft (symbolic) link so that cross-references using the original path still resolve.
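A minimal sketch of the symbolic-link idea, assuming the production includes a mirror folder (`link_root`, a hypothetical name) in which each original server path is recreated as a symlink to the produced native file. The path handling is simplified (drive-letter colons dropped, `..` rejected), and on Windows creating symlinks typically requires administrator rights or Developer Mode.

```python
import os
import re
from pathlib import Path


def link_original_path(original_path, produced_file, link_root):
    r"""Recreate original_path under link_root as a symlink to produced_file.

    Example (hypothetical paths): \\fileserver\share\finance\Q3.xlsx becomes
    link_root/fileserver/share/finance/Q3.xlsx -> the produced native copy.
    """
    # Split on both Windows and POSIX separators; drop drive-letter colons.
    parts = [p.replace(":", "") for p in re.split(r"[\\/]+", original_path) if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError(f"unsafe or empty path: {original_path!r}")
    link = Path(link_root).joinpath(*parts)
    link.parent.mkdir(parents=True, exist_ok=True)
    # Relative target: keeps resolving if the whole production (mirror folder
    # plus produced files) is copied to another drive together.
    target = os.path.relpath(Path(produced_file).resolve(), link.parent.resolve())
    link.symlink_to(target)
    return link
```

Storing the target as a relative path is a design choice: absolute targets would break as soon as the receiving party copied the production volume to its own drive.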
Authentication
Finally, there is always a concern that, while working with a native file, the receiving party will inadvertently change its metadata and/or content.
With hosted vendors, that’s not really a concern. And because a hash can be checked quickly, documents can be authenticated in real time at depositions and at trial: even for very large documents, generating a hash on a standard computer typically takes no more than a few seconds. I would expect the risk of being caught using an altered document to be significant enough that attorneys would take care to keep documents unaltered. The parties could also agree in advance that if an altered document is used at a deposition, all subsequent answers in that line of questioning are invalid.
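A minimal sketch of real-time authentication: record a hash of each native file at production, then recompute it when the document is used. SHA-256 is my choice here (MD5 and SHA-1 are also common in eDiscovery practice), and the function names are illustrative. Reading in fixed-size chunks keeps memory use flat regardless of file size.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in 1 MiB chunks so that even very
    large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path, hash_at_production):
    """True if the file still matches the hash recorded when it was produced."""
    return file_sha256(path) == hash_at_production.strip().lower()
```

In use, the producing party records `file_sha256(...)` for each produced file in its production log, and anyone at the deposition can call `is_unaltered(...)` on the exhibit against that logged value; changing even a single byte produces a different digest.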
So is this the Year EDiscovery Goes Native?
What do you think? Please share your thoughts.
About Me
Since 2000, I have been helping attorneys navigate rapidly changing technology landscapes in support of litigation, from negotiating protocols for ESI production format to performing forensic gap analysis and analyzing esoteric database and system productions.
If you need assistance with the technological challenges surrounding EDiscovery, please reach out directly to jjaffe@its-your-internet.com.
(c) Copyright 2016. All Rights Reserved. Jonathan Jaffe, Founder, www.its-your-internet.com.