Speeding Document Review With Analytics

Outsourced eDiscovery Partnerships Streamline the Document Review Process

Document review is one of the most expensive, time-consuming and important tasks in law. When faced with large volumes of unstructured data to review in a very tight timeframe, lawyers often lean heavily on outsourced eDiscovery partnerships for the knowledge and manpower needed to streamline the process. Typically, an outsourced eDiscovery partner will first seek to pull relevant documents using keyword searches. Indeed, the vast majority of matters in the analysis phase of eDiscovery are easily handled through intelligent application of keyword searches. However, there are times when busy lawyers must urgently get to the core of a huge data set to understand what needs to be reviewed and start reviewing.


Relativity, a long-time leader in the eDiscovery field, developed Relativity Analytics, which combines visual data analysis and active learning technology to provide structural and conceptual searching that works from ideas and concepts. Rather than matching specific keywords or character strings as traditional searches do, Relativity Analytics identifies critical documents in a case by searching and organizing them using a predetermined index that identifies similar ideas or concepts within a document set. Results depend on how and where similar ideas and concepts intersect.


For reference, Relativity Analytics can be broken down into two subsets: structured analytics and conceptual analytics. Structured analytics operations analyze text to identify the similarities and differences between the documents in a set. After structured analytics has grouped the documents, Relativity can run conceptual analytics to identify conceptual relationships among them. For instance, a project manager or review team can identify which documents contain certain issues of interest, which contain similar concepts, and which contain various permutations of a given term.

Using Structured Analytics

Using structured analytics, project managers can quickly assess and organize a large, unfamiliar set of documents to shorten a review team’s review time, improve coding consistency, optimize batch set creation, and improve Analytics indexes. Common structured analytics tasks include email threading, textual near duplicate identification and language identification:


Email threading - Email threading greatly reduces the time and complexity of reviewing emails by gathering all forwards, replies, and reply-all messages together. Email threading identifies email relationships, and then extracts and normalizes email metadata. Email relationships identified by email threading include:


  • Email threads
  • People involved in an email conversation
  • Email attachments (if the Parent ID is provided along with the attachment item)
  • Duplicate emails
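
To make the grouping idea concrete, here is a minimal Python sketch that gathers replies and forwards into threads by normalizing the subject line. Grouping on the subject alone is a simplification assumed for illustration; the field names `id` and `subject` and both function names are hypothetical, and this is not how Relativity itself identifies threads.

```python
import re
from collections import defaultdict

# Matches one or more leading reply/forward markers such as "RE:", "FW:" or "Fwd:".
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes and lowercase the remaining subject."""
    return _PREFIX.sub("", subject).strip().lower()


def thread_emails(emails):
    """Group email ids by normalized subject.

    `emails` is a list of dicts with hypothetical keys "id" and "subject".
    """
    threads = defaultdict(list)
    for email in emails:
        threads[normalize_subject(email["subject"])].append(email["id"])
    return dict(threads)


emails = [
    {"id": 1, "subject": "Q3 contract"},
    {"id": 2, "subject": "RE: Q3 contract"},
    {"id": 3, "subject": "FW: RE: Q3 Contract"},
    {"id": 4, "subject": "Lunch"},
]
threads = thread_emails(emails)
```

In this toy example the first three messages land in one thread, so a reviewer could read them together rather than encountering them separately.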


Textual near duplicate identification - While textual near duplicate identification is simple to understand, the implementation is very complex and relies on several optimizations so that results can be delivered in a reasonable amount of time. The following is a simplified explanation of this process:


  • The documents are sorted by size, from largest to smallest, and are processed in that order. The central organizing notion, and the most visible optimization, is the principal document: the largest document in a group and the one that all others are compared to when determining whether they are near duplicates. If the current document is a close enough match to a principal document, as defined by the Minimum Similarity Percentage, it is placed in that group. If it matches no existing group, it becomes a new principal document.
  • When the process is complete, only principal documents that have one or more near duplicates are shown in groups. Documents that have the Textual Near Duplicate Group field set to Empty or Numbers Only are also grouped together.
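
The simplified process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity measure is the standard library's `difflib.SequenceMatcher` ratio, standing in for whatever comparison Relativity actually uses, and the `min_similarity` parameter is a hypothetical stand-in for the Minimum Similarity Percentage. None of the optimizations mentioned above are included.

```python
from difflib import SequenceMatcher


def group_near_duplicates(docs, min_similarity=0.9):
    """Group documents around principal documents.

    `docs` maps a document id to its extracted text. Returns a dict that maps
    each principal document id to the ids of its near duplicates.
    """
    # Process documents from largest to smallest.
    ordered = sorted(docs, key=lambda doc_id: len(docs[doc_id]), reverse=True)

    groups = {}  # principal id -> ids of near duplicates
    for doc_id in ordered:
        for principal in groups:
            ratio = SequenceMatcher(None, docs[principal], docs[doc_id]).ratio()
            if ratio >= min_similarity:
                groups[principal].append(doc_id)
                break
        else:
            # No existing principal was close enough: start a new group.
            groups[doc_id] = []

    # Keep only principals that have one or more near duplicates.
    return {p: members for p, members in groups.items() if members}


docs = {
    "a": "The quick brown fox jumps over the lazy dog today.",
    "b": "The quick brown fox jumps over the lazy dog.",
    "c": "Completely unrelated invoice text about shipping costs",
}
groups = group_near_duplicates(docs)
```

Because every document is compared only against principals, and not against every other document, the number of comparisons stays manageable as the set grows.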


Language identification - Examines the extracted text of each document to determine the primary language and up to two secondary languages present. This allows you to see how many languages are present in the collection, and the percentages of each language by document. The project manager can then easily separate documents by language and batch out files to native speakers for review. The operation analyzes each document for the following qualities to determine whether it contains a known language:


  • Character set (e.g., Thai and Greek are particularly distinctive)
  • Letters and the presence or absence of accent marks
  • Spelling of words (e.g., words that end in “-ing” are likely English)
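
As a rough illustration of the first quality, character set, the following Python sketch reports which scripts dominate a piece of text and in what proportion. It relies on the fact that Unicode character names begin with the script name (for example "GREEK SMALL LETTER ALPHA"). The function name `script_profile` is hypothetical, and a script is not a language: real language identification would also need the accent and spelling checks listed above.

```python
import unicodedata
from collections import Counter


def script_profile(text, top_n=3):
    """Return up to `top_n` (script, share) pairs, most common first.

    Only alphabetic characters are counted. The share is the fraction of
    counted characters that belong to each script.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # The first word of the Unicode name is the script, e.g. "THAI".
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1

    total = sum(counts.values())
    return [(script, count / total) for script, count in counts.most_common(top_n)]


greek = script_profile("Καλημέρα κόσμε")
thai = script_profile("สวัสดี")
mixed = script_profile("abc αβ")
```

The default of three results mirrors the idea of one primary and up to two secondary languages per document.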

Using Conceptual Analytics

Using conceptual analytics helps organize and assess the semantic content of large, diverse and/or unknown sets of documents. Unlike structured analytics, which relies on the specific structure of the content, conceptual analytics focuses on related concepts within documents, even if they don’t share the same key terms and phrases. Common features of conceptual analytics are clustering and active learning, which can cut down on review time by more quickly assessing your document set.


Clustering - Analytics uses clustering to create groups of conceptually similar documents. With clusters, project managers can identify conceptual groups in a workspace or subset of documents using an existing Analytics index. Unlike categorization, clustering doesn’t require much user input. Clusters can be created based on selected documents without requiring example documents or category definitions.


  • When documents are submitted for clustering, the Analytics engine determines the positions of the documents in the conceptual index. Depending on the conceptual similarity, the index identifies the most logical groupings of documents and places them into clusters. Once the Analytics engine creates these clusters, it runs a naming algorithm to label each node in the hierarchy appropriately to refer to the conceptual content of the clustered documents.
  • Clustering is useful when working with unfamiliar data sets. However, because clustering is unsupervised, the Analytics engine doesn’t indicate which concepts are of particular interest. A savvy project manager will then use investigative features such as Sampling, Searching or Pivot in order to find the clusters that are of most interest.
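
The two steps described above, grouping by similarity and then naming each group, can be sketched as follows. This is a toy under loud assumptions: plain word counts and cosine similarity stand in for the conceptual index, the clusters are flat rather than hierarchical, and `threshold`, `label_terms` and all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a document as word counts (a stand-in for a conceptual index)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3, label_terms=2):
    """Greedily cluster documents and label each cluster by its top terms."""
    clusters = []  # each entry: {"centroid": Counter, "ids": [doc ids]}
    for doc_id, text in docs.items():
        vec = vectorize(text)
        best = max(clusters, key=lambda c: cosine(c["centroid"], vec), default=None)
        if best is not None and cosine(best["centroid"], vec) >= threshold:
            best["centroid"] += vec  # fold the document into the cluster
            best["ids"].append(doc_id)
        else:
            clusters.append({"centroid": vec, "ids": [doc_id]})

    # "Naming" step: label each cluster with its most frequent terms.
    return {
        " ".join(term for term, _ in c["centroid"].most_common(label_terms)): c["ids"]
        for c in clusters
    }


docs = {
    1: "oil price contract oil",
    2: "oil contract renewal",
    3: "fantasy football league",
    4: "football league picks",
}
clusters = cluster(docs)
```

No example documents or category definitions are supplied here, which is the sense in which clustering is unsupervised. The labels only describe content; deciding which cluster matters is still left to the reviewer.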


Active Learning - Active Learning is an application that runs continuously updated cycles of documents for review, based on review strategy. The advantages of Active Learning include real-time intelligence, efficiency, flexibility and integration with all the power of the Relativity platform.


  • As in traditional Technology Assisted Review (TAR), a reviewer codes documents, creating training examples that are then used to predict the coding of the remaining uncoded documents in the set. For sophisticated tasks such as eDiscovery document review, coding an optimal amount of training data may be expensive and time consuming.
  • In a traditional workflow, training samples are created once and the model is then left to run until the presumed relevant documents are obtained. An active learning workflow instead continuously learns from what reviewers tag as relevant and refines its predictions, so each new batch sent to reviewers is more likely to contain relevant documents. This makes the process more efficient and saves time and cost.
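
The continuous loop described above can be sketched as below. It is a minimal illustration, assuming a toy word-weight scorer in place of a real classifier and a review strategy that always serves the highest-ranked uncoded documents next. The names `train`, `score`, `active_learning_review`, `reviewer`, `seed_ids` and `batch_size` are all hypothetical and do not correspond to any Relativity API.

```python
from collections import Counter


def train(coded):
    """Build word weights from (text, is_relevant) pairs."""
    weights = Counter()
    for text, is_relevant in coded:
        for word in set(text.lower().split()):
            weights[word] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    """Higher scores mean the document looks more like the relevant ones."""
    return sum(weights[word] for word in set(text.lower().split()))


def active_learning_review(docs, reviewer, seed_ids, batch_size=2):
    """Review all documents, retraining after every batch.

    `reviewer` is a callable that returns True for relevant text.
    Returns a dict of id -> coding decision, in the order documents were reviewed.
    """
    coded = {doc_id: reviewer(docs[doc_id]) for doc_id in seed_ids}
    while len(coded) < len(docs):
        # Retrain on everything coded so far.
        weights = train([(docs[i], label) for i, label in coded.items()])
        uncoded = [i for i in docs if i not in coded]
        uncoded.sort(key=lambda i: score(weights, docs[i]), reverse=True)
        # Serve the highest-ranked documents as the next batch.
        for doc_id in uncoded[:batch_size]:
            coded[doc_id] = reviewer(docs[doc_id])
    return coded


docs = {
    0: "merger agreement draft",
    1: "lunch menu friday",
    2: "merger timeline update",
    3: "parking notice",
    4: "signed merger agreement",
}
result = active_learning_review(docs, reviewer=lambda text: "merger" in text, seed_ids=[0])
```

In this toy run the remaining merger documents are served before the unrelated ones, which is the efficiency gain the workflow is after. A real project would typically stop once few relevant documents remain instead of reviewing the entire set.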

At IST Discover-E, we have years of experience helping our clients with their eDiscovery needs, along with full-scale legal support management systems. We are experts in creating and customizing eDiscovery processes that best fit our clients' needs and expectations. Our model is uniquely transparent, easy to understand and effective in helping our clients get the decision they want for their clients.