The Merits of Machine Learning and Natural Language Processing for Bias Evaluation in Systematic Literature Reviews

Systematic literature reviews (SLRs) form part of the bedrock of evidence-based medicine (EBM), which combines objective evaluations of the most current evidence, clinician experience, and patient specifics to determine the most effective medical treatments or interventions.

This evidence, however, is usually buried in unstructured clinical literature. That is a challenge in itself, and it has been compounded by a massive increase in publication volume over the past few years. According to Nature, health and medical submissions to the publisher Elsevier were up a whopping 92 percent year over year between February and May 2020, and volumes were already climbing before 2020.

The unstructured nature of the data, combined with rising volumes, makes traditional bias evaluation of medical literature an ever more time-consuming process. Most bias assessments take around 20 minutes per article, each article is assessed by two independent reviewers, and most reviews encompass hundreds if not thousands of articles.
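To make that workload concrete, the short Python sketch below simply multiplies out the figures quoted above: about 20 minutes per assessment and two reviewers per article. The article counts are hypothetical examples, and the function name `manual_rob_hours` is our own illustration, not part of any tool.

```python
# Back-of-the-envelope estimate of manual risk-of-bias workload.
# Assumptions (from the text above): ~20 minutes per assessment,
# two independent reviewers per article. Article counts are hypothetical.

MINUTES_PER_ASSESSMENT = 20
REVIEWERS_PER_ARTICLE = 2


def manual_rob_hours(n_articles: int,
                     minutes_per_assessment: float = MINUTES_PER_ASSESSMENT,
                     reviewers: int = REVIEWERS_PER_ARTICLE) -> float:
    """Total reviewer-hours spent on risk-of-bias assessment alone."""
    total_minutes = n_articles * minutes_per_assessment * reviewers
    return total_minutes / 60


# Print the estimate for a few hypothetical review sizes.
for n in (100, 500, 1000):
    print(f"{n:>5} articles -> {manual_rob_hours(n):.1f} reviewer-hours")
```

At 1,000 articles, for example, bias assessment alone comes to roughly 667 reviewer-hours, before any other SLR step is counted.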

Over the past few years, researchers have deployed machine learning (ML) and natural language processing (NLP) models to identify randomized controlled trials (RCTs) and assess bias in medical literature. The result has been improved evaluation efficiency, a massive bonus for researchers already stretched thin by other time-consuming SLR steps.

ML and NLP improve detection of bias in clinical evidence

Several recent studies have shown how ML and NLP models can be indispensable tools for human reviewers assessing bias across hundreds of complex, densely written, unstructured scientific articles:

  • 1. Soboczenski et al. (2019) used an open-source ML system (RobotReviewer, or RR for short) to semi-automate bias assessments and measure the time saved compared with a manual approach. Reviewers applied the Cochrane Risk of Bias tool to four randomly selected RCT articles, judging the potential for bias in each article and highlighting the text that supported each judgment. For two of these articles, RR suggested bias assessments and supporting text highlights. Not only was this semi-automated approach faster than the manual method, reviewers also accepted more than 90 percent of RR’s Risk of Bias (RoB) judgments and supporting text highlights. The system’s suggestions had a recall of 0.90 and a precision of 0.87, leading the researchers to conclude that semi-automation using ML “can improve the efficiency of evidence synthesis.”
  • 2. Wang et al. (2021) performed a similar assessment, applying several model types to nearly 8,000 full-text documents: baselines (such as support vector machines and random forests), neural models (such as hierarchical neural networks), and BERT models (including approaches with sentence extraction). Judged by F1 score, the neural and BERT models significantly outperformed regular expressions. Each of the two model types performed best on different tasks (neural models, for example, were most effective at identifying conflicts of interest).
  • 3. A few years before these studies, Millard et al. (2015) used text mining and supervised learning to automate risk-of-bias assessments in clinical documents. The researchers identified relevant sentences and used them to train two models: the first predicted article and sentence relevance, and the second used logistic regression to assign a risk-of-bias value to each article. This approach successfully ranked articles by risk of bias, and around one-third of ML-assisted article assessments required only one reviewer instead of the usual two.

That said, humans aren’t the only entity capable of bias. ML and NLP models can also contain inherent biases, and an entire subfield, dubbed “model fairness,” has sprung up to address this related issue.
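The precision and recall figures reported by Soboczenski et al. and the F1 scores used by Wang et al. are closely related: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal Python sketch below shows these standard definitions. The helper names are hypothetical, and treating an accepted text highlight as a "true positive" is an illustrative assumption, not either study’s exact evaluation protocol.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def scores_from_counts(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall, and F1 from true positives, false positives, and false negatives.

    Illustrative assumption: for suggested highlights, a true positive is a
    suggestion the reviewer accepted, a false positive one they rejected, and
    a false negative relevant text the system failed to suggest.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Scores(precision, recall, f1_from_precision_recall(precision, recall))


# Combine the precision (0.87) and recall (0.90) reported for RR's suggestions.
print(round(f1_from_precision_recall(0.87, 0.90), 3))  # 0.885

# Hypothetical counts, for illustration only.
print(scores_from_counts(tp=90, fp=10, fn=20))
```

Plugging in RR’s reported precision and recall gives an F1 of roughly 0.885; this is simple arithmetic on the published figures, not a number reported by the study itself.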

CapeStart’s machine learning models and bias detection

CapeStart’s integrated, experienced team of machine learning engineers, data scientists, and medical subject matter experts helps organizations produce cutting-edge SLRs in half the time typically allotted, so all that hard work isn’t out of date by the time you publish your systematic review. We can deploy custom or pre-trained ML and NLP models to help conduct complex SLRs faster and more accurately. Our advanced neural, BERT, and other ML models enable research organizations to glide through mountains of scientific data, semi-automating the SLR process, including risk-of-bias evaluation.

To learn more about how CapeStart’s machine learning capabilities can improve the SLR process at your organization, contact us today and get started on your free trial.
