IST Discover-E White Paper:

Active Learning: the Next Evolution of TAR

At IST Discover-E, we have years of experience helping our clients with their eDiscovery needs, along with full-scale legal support management systems. We are experts in creating and customizing eDiscovery processes that best fit our clients' needs and expectations. Our model is uniquely transparent, easy to understand, and effective in helping our clients get the outcomes they want for their own clients.

Technology-assisted review (TAR) software helps you find your most relevant documents faster by training the system on which documents are important to a specific matter. While lawyers may worry that computer-assisted review of electronically stored information (ESI) will fail to retrieve relevant documents, studies comparing manual review with TAR show that TAR is superior, provided the underlying algorithm has been properly coded and trained.

To mitigate that risk, there has been a lot of industry discussion on the topic of active learning.  While there are many different labels for this technology, there seems to be universal recognition that the active learning workflow can help get the most relevant documents in reviewers’ hands faster.

The active learning workflow in Relativity Assisted Review learns what's important in real time, continuously refining its understanding of what's responsive and getting smarter as the review progresses. Because the system keeps a pulse on coding decisions as they are made, reviewers can find the most important documents faster. Administrators, meanwhile, can monitor the relevance rate and determine when the project is complete, with results validated by transparent, defensible statistics.

How it Works

Like traditional TAR, a reviewer codes documents, creating training examples that are then used to predict the relevance of the remaining uncoded documents in the set. For sophisticated tasks like eDiscovery document review, coding an optimal amount of training data can be expensive and time-consuming.

With an active learning workflow, training samples are not created once and then left to run until presumably relevant documents are obtained. Instead, the system continuously learns from what reviewers tag as relevant and refines its algorithm to serve more accurate document batches to reviewers, making the process more efficient and saving both time and cost.

Active learning applies a queue-based system that generates continuous updates to the model's training samples while providing a steady stream of highly relevant documents for review. Coding decisions are sent to the model, the model updates, and the latest set of prioritized documents is made available for review. This process repeats until the desired outcome is achieved.
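
To make the cycle concrete, here is a minimal sketch in Python, assuming scikit-learn's LinearSVC as a stand-in classifier. The function name review_cycle, the reviewer callback, and the batch_size and rounds parameters are illustrative assumptions, not Relativity's actual API.

```python
# Illustrative sketch of the queue-based active learning cycle described
# above, not Relativity's proprietary implementation.
import numpy as np
from sklearn.svm import LinearSVC

def review_cycle(X, seed_idx, seed_labels, reviewer, batch_size=200, rounds=10):
    """Repeat: train on coded docs, rank the rest, route the top batch to review."""
    coded_idx = list(seed_idx)   # already-coded documents (must span both classes)
    labels = list(seed_labels)   # 1 = relevant, 0 = not relevant
    model = None
    for _ in range(rounds):
        model = LinearSVC().fit(X[coded_idx], labels)             # model updates
        uncoded = np.setdiff1d(np.arange(X.shape[0]), coded_idx)  # not yet reviewed
        if uncoded.size == 0:
            break
        scores = model.decision_function(X[uncoded])              # relevance ranking
        batch = uncoded[np.argsort(scores)[::-1][:batch_size]]    # highest ranked first
        for i in batch:                                           # reviewers code the batch
            coded_idx.append(int(i))
            labels.append(reviewer(int(i)))
    return model
```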

Checks in Place for Outliers

In active learning, a recommendation algorithm uses support vector machine (SVM) classification, the binary machine learning model at the core of Relativity Assisted Review's active learning workflow, to retrieve specific documents for review by people familiar with the data. The reviewed documents are sent back to the classifier, and the recommendation algorithm suggests new documents for review.
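
As a toy illustration of that binary classification step, the sketch below trains scikit-learn's LinearSVC on a handful of reviewer-coded documents over TF-IDF features. The documents, labels, and feature pipeline are invented for illustration and say nothing about Relativity's internal feature engineering.

```python
# Minimal binary SVM relevance classifier over TF-IDF features; the data
# and pipeline are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs   = ["contract breach damages", "lunch menu for friday",
          "settlement agreement draft", "holiday party invite"]
labels = [1, 0, 1, 0]  # reviewer coding: 1 = relevant, 0 = not relevant

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
classifier = LinearSVC().fit(X, labels)  # the binary SVM at the core of the workflow

new_doc = vectorizer.transform(["draft indemnification contract"])
print(classifier.decision_function(new_doc))  # a positive score suggests relevance
```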

After each model build, the documents with the highest confidence of being relevant are selected. Along with the highest-ranked available documents, the active learning queue inserts documents that allow the model to learn from all possible definitions of relevance. To achieve this, a small number of non-highly-ranked documents, or uncertain documents, are mixed in with the highly ranked documents, as sketched below.
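
One plausible way to build such a blended queue is mostly top-ranked documents plus a few whose SVM scores sit near the decision boundary. The build_queue name, the 10 percent uncertain share, and the batch size below are assumptions for illustration, not documented Relativity parameters.

```python
# Assumed blend of confidently relevant and uncertain documents into one queue.
import numpy as np

def build_queue(scores, batch_size=100, uncertain_fraction=0.1):
    """scores: SVM decision-function values for all uncoded documents."""
    n_uncertain = int(batch_size * uncertain_fraction)
    by_doubt = np.argsort(np.abs(scores))     # nearest the decision boundary first
    uncertain = list(by_doubt[:n_uncertain])  # a few docs the model is unsure about
    by_rank = np.argsort(scores)[::-1]        # most confidently relevant first
    top = [i for i in by_rank if i not in set(uncertain)][:batch_size - n_uncertain]
    return uncertain + top                    # blended review queue
```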

This system safeguards against incorrectly coded documents and weak reviewers. Following best practices for review, including keeping a consistent definition of relevance, is still the most important factor. But because active learning continuously compares document tags and refines its algorithm, it is very unlikely for an incorrectly coded document to negatively influence the project outcome.

In-Process Relevance Ratings

Administrators can use the relevance rate report to monitor active learning projects. The project updates relevance rate metrics every 200 documents. The relevance rate measures how effectively the prioritized review queue serves up relevant documents. It is a calculation of precision, defined as the percentage of documents the model categorizes as relevant that reviewers also code as relevant. For example, if the model categorizes 10 documents as relevant and human reviewers code eight of them as relevant, the relevance rate is 80 percent.
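
A minimal sketch of that calculation, using an assumed list of (model decision, reviewer decision) pairs for each document in a review window:

```python
# Relevance rate: of the documents the model categorized as relevant, what
# share did reviewers also code relevant? The pair-per-document layout is
# an assumed representation for illustration.
def relevance_rate(batch):
    """batch: (model_says_relevant, reviewer_says_relevant) pairs."""
    reviewer_calls = [reviewer for model, reviewer in batch if model]
    if not reviewer_calls:
        return 0.0
    return sum(reviewer_calls) / len(reviewer_calls)

# The example from the text: the model categorizes 10 documents as relevant,
# and reviewers agree on 8 of them.
batch = [(True, True)] * 8 + [(True, False)] * 2
print(relevance_rate(batch))  # 0.8
```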

This active learning workflow also allows a project to begin with only a few manually selected documents, or none at all. While it will take longer for the algorithm to surface highly relevant documents when it starts with less training data, the prioritized review queue continuously learns from reviewer coding decisions, causing the relevance rate to rise. However, if reviewers code similar documents inconsistently, the model will struggle to find relevant documents and the relevance rate will stagnate.

Project Validation

While technology-assisted review allows for many methods of validating a project's completion, active learning employs a different approach: elusion. The elusion test estimates the number of relevant documents that the model missed. To produce this estimate, a statistical sample of the documents categorized as non-relevant is coded to find out how many relevant documents appear in it. If the elusion is acceptably low, the administrator can justifiably end the project. If it is too high, the administrator resumes review to improve the model.
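
The sketch below shows one way such an estimate could be computed. The sample size of 400 and the reviewer callback are assumptions for illustration; actual elusion sampling in Relativity uses its own statistical parameters.

```python
# Elusion estimate: sample the documents the model categorized as non-relevant,
# have reviewers code the sample, and project the miss rate onto the full pile.
import random

def estimate_elusion(non_relevant_ids, reviewer, sample_size=400, seed=42):
    random.seed(seed)
    sample = random.sample(non_relevant_ids, min(sample_size, len(non_relevant_ids)))
    relevant_found = sum(reviewer(doc_id) for doc_id in sample)
    elusion = relevant_found / len(sample)           # sampled rate of missed relevance
    estimated_missed = elusion * len(non_relevant_ids)
    return elusion, estimated_missed
```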

IST Discover-E Project Managers are on the cutting edge of Relativity's active learning workflow, helping our customers quickly power through the review of large volumes of data. Research comparing active learning with traditional TAR shows that active learning achieves better efficiency and effectiveness. Comparisons show a review team would have to look at substantially more documents using protocols other than active learning; in one example, 50,000 more documents would need to be manually reviewed. Assuming a review cost of $1 per document, active learning would provide $50,000 in savings.


Borders, Josh. "White Paper: Active Learning in Technology-assisted Review." Relativity, 31 January 2018.
