Using NLP to Improve PICO Element Identification and Extraction for SLRs and Evidence-based Medicine.

Although the term “evidence-based medicine” (EBM) first appeared in print in the early 1990s, the history of this now-popular approach to clinical practice goes back much further. In the mid-18th century, James Lind, a Scottish naval physician, experimented with citrus-based scurvy treatments on several comparable groups of sick sailors. And there is evidence of what can be loosely called EBM stretching all the way back to the ancient Greeks.

The modern approach to EBM is still relatively new by comparison. It bases medical decisions on the evidence summarized in systematic literature reviews (SLRs), which are themselves based on analysis of randomized controlled trials (RCTs) of treatments for specific medical conditions. But it’s rapidly evolving, especially as medical professionals confront data volumes that are growing exponentially: Health-care data accounts for around 30 percent of the world’s data volume, according to RBC Capital Markets, and by 2025 will grow at an annual rate of 36 percent.

This explosion in health-care data has in part led to the large-scale adoption of the PICO model for developing specific clinical questions from RCTs. PICO is a mnemonic that stands for:

  1. Population/problem: Addresses the characteristics of populations involved and the specific characteristics of the disease or disorder
  2. Intervention: Addresses the primary intervention (including treatments, procedures, or diagnostic tests) along with any risk factors
  3. Comparison: Compares the efficacy of any new interventions with the primary intervention 
  4. Outcome: Measures the results of the intervention, including improvements or side effects

PICO helps evidence-based practitioners develop precise clinical questions and searchable keywords to answer those questions and, given the data volumes described above, is a critical tool. But it’s also often extremely time-consuming and requires a high level of technical skill and medical domain knowledge. As we pointed out in a previous blog, SLRs require large teams of experts and take an average of more than 1,000 hours to complete – and, in many cases, developing a specific research question can consume a significant chunk of those hours. Most PICO searches involve several steps, including question formation, keyword identification, search strategy development, search execution, and literature review. And that’s before considering the growing amount of health-care data we mentioned earlier: According to Nye et al., around 100 RCT manuscripts were published every day in 2015, and that number has almost certainly grown since then.
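To make the question formation and keyword identification steps more concrete, here is a minimal sketch in Python of how a PICO question might be turned into a boolean search string. The PicoQuestion class, the helper functions, and the example terms are all hypothetical and chosen only for illustration; real search strategies are usually tuned to a specific database's syntax and controlled vocabulary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PicoQuestion:
    """Hypothetical container for the PICO elements of one clinical question."""
    population: List[str]
    intervention: List[str]
    comparison: List[str] = field(default_factory=list)
    outcome: List[str] = field(default_factory=list)


def or_group(terms):
    """Join the synonyms for one PICO element with OR, quoting multi-word phrases."""
    quoted = ['"{}"'.format(t) if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_search_string(question):
    """Combine the non-empty PICO elements with AND into one boolean query."""
    groups = [
        question.population,
        question.intervention,
        question.comparison,
        question.outcome,
    ]
    return " AND ".join(or_group(g) for g in groups if g)


# Illustrative terms only (loosely inspired by the scurvy example above).
question = PicoQuestion(
    population=["scurvy", "vitamin C deficiency"],
    intervention=["citrus", "lemon juice"],
    outcome=["recovery"],
)
search_string = build_search_string(question)
print(search_string)
```

Empty elements (here, the comparison) are simply skipped, which mirrors how many practical searches leave out the comparator to avoid narrowing the results too early.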

Machine learning (ML) and natural language processing (NLP) can help facilitate the automatic identification of PICO elements from this vast sea of information. This helps evidence-based practitioners develop precise research questions faster and more accurately, speeding up the entire SLR (and EBM) process. 

The automation of PICO identification and extraction

Because the sheer volume of primary evidence available is becoming too challenging to wade through manually, researchers are experimenting with automating these tasks based on NLP techniques. Indeed, the rapidly growing amount of health-care data means it’s “practically impossible for physicians to know which is the best medical intervention for a given patient group and condition.” Adding to the challenge is that RCT results are typically published as unstructured free text rather than as structured data that can be easily analyzed or queried with SQL and other standard techniques.
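As a rough illustration of why structure matters, the sketch below assumes the PICO elements have already been extracted into rows and shows how simple a reviewer's question becomes at that point. It uses Python's built-in sqlite3 module; the table name, column names, and the three example rows are invented placeholders, not data from any real trial.

```python
import sqlite3

# Hypothetical rows that a PICO extraction step might produce from trial abstracts.
extracted = [
    ("trial-001", "adults with type 2 diabetes", "metformin", "placebo", "HbA1c reduction"),
    ("trial-002", "adults with hypertension", "lisinopril", "amlodipine", "systolic blood pressure"),
    ("trial-003", "children with asthma", "inhaled budesonide", "placebo", "exacerbation rate"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pico (trial_id TEXT, population TEXT, intervention TEXT, "
    "comparison TEXT, outcome TEXT)"
)
conn.executemany("INSERT INTO pico VALUES (?, ?, ?, ?, ?)", extracted)

# Once the elements sit in columns, "which trials compared against placebo?"
# is a one-line query instead of a manual read-through.
placebo_trials = [
    row[0]
    for row in conn.execute(
        "SELECT trial_id FROM pico WHERE comparison = ? ORDER BY trial_id",
        ("placebo",),
    )
]
print(placebo_trials)
conn.close()
```

The hard part, of course, is filling that table from free text in the first place, which is where the NLP methods discussed here come in.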

It’s precisely this last challenge that makes NLP such a suitable tool for automating PICO element identification and extraction. “Methods to extract PICO elements for subsequent inspection could facilitate inclusion assessments for systematic reviews by allowing reviewers to rapidly judge relevance with respect to each PICO element,” write Wallace et al. “Furthermore, automated PICO identification could expedite data extraction for systematic reviews, in which reviewers manually extract structured data to be reported and synthesized.”

But according to Kang et al., previous attempts using support vector machine (SVM) and conditional random field (CRF) models have stalled, partly due to “the lack of publicly available, annotated corpora” for training, along with a dearth of available tools to perform named entity recognition (NER) and information retrieval (IR).

That’s changing, however, with recent rapid advancements in ML and NLP, including the advent of deep learning and neural networks and the growing number of publicly available annotated corpora for training and evaluation. Kang et al. cite the biLSTM-CRF model as particularly effective at NER for PICO-related applications, adding that the emergence of transfer learning in NLP is helping address the “high demand of large data for training neural networks.”
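NER models of this kind typically label each token in a sentence, often using a BIO scheme (B for the beginning of an entity, I for inside, O for outside). As a minimal sketch of the post-processing step, the Python below groups token-level tags into PICO spans. The tag names (POP, INT, CMP, OUT) and the hand-written tags are assumptions for illustration only; they are not the output of any model cited here, and annotated corpora may use different label sets.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts (an I- tag with a different label is
            # treated as a new start); close any open span first.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the currently open span.
            current_tokens.append(token)
        else:
            # "O" tag: the token is outside any entity, so close the open span.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hand-written stand-in for what a sequence-labeling model might emit.
tokens = ["Adults", "with", "asthma", "received", "budesonide", "or",
          "placebo", "to", "reduce", "exacerbations"]
tags = ["B-POP", "I-POP", "I-POP", "O", "B-INT", "O",
        "B-CMP", "O", "O", "B-OUT"]

spans = decode_bio(tokens, tags)
for label, text in spans:
    print(label, "->", text)
```

In a biLSTM-CRF setup, the CRF layer's job is largely to make tag sequences like the one above consistent (for example, discouraging an I- tag directly after an O), so a decoding step like this has clean input to work with.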

Examples of NLP-automated PICO identification and extraction

Among these advancements, the release by Nye et al. of EBM-NLP in early 2018 – a corpus of 5,000 richly annotated medical article abstracts describing clinical RCTs – has been crucial. EBM-NLP is especially suitable for PICO element extraction because it takes into account trial population characteristics, interventions, comparators, and outcomes, making access to relevant data much less of an issue than in the past (although issues around the availability of data to train and evaluate NLP models for PICO identification and extraction persist). Despite these lingering issues, a number of researchers have made significant progress over the past several years:

  • Bui et al. (2016) developed an NLP-based system that automatically summarizes full-text scientific articles and analyzes them for PICO values and SLR-related data elements. The model showed better recall (91.2% vs. 83.8%) and a higher density of relevant sentences (59% precision vs. 39%) when compared to human-written summaries.
  • Wallace et al. (2016) proposed a method of speeding up evidence synthesis of full-text articles for PICO using supervised distant supervision (SDS). This approach “learns to automatically extract sentences pertaining to PICO elements from full-text articles describing RCTs” using a large semi-structured corpus (the Cochrane Database of Systematic Reviews, or CDSR).
  • Kang et al. (2019) created an open-source PICO statement extraction tool to process RCTs using NER for PICO elements, Unified Medical Language System (UMLS) encoding, and XML outputs. Although trained on only a small dataset, it achieved “better performance than conventional machine learning models trained on a larger corpus,” demonstrating that it’s possible to develop NLP models for PICO applications without needing large amounts of training data.
  • Brockmeier et al. (2019) trained a NER model using Nye et al.’s publicly available corpus, implementing the model as a recurrent neural network (RNN) and applying it to medical abstracts to identify and extract PICO elements. “The occurrences of words tagged in the context of specific PICO contexts are used as additional features for a relevancy classification model,” the authors explained. “Simulations of the machine learning-assisted screening are used to evaluate the work saved by the relevancy model with and without the PICO features… Inclusion of PICO features improves the performance metric on 15 of the 20 collections, with substantial gains on certain systematic reviews.”
  • Jin et al. (2020) proposed a new deep learning model to recognize PICO elements based on a bi-LSTM with a conditional random field architecture, but with an additional bi-LSTM layer “so that the contextual information from surrounding sentences can be gathered to help infer the interpretation of the current one.” Instead of using large corpora, the researchers also proposed using adversarial training and unsupervised pre-training to prime the model. In testing on benchmark datasets, the model outperformed previous bests by between 5.5 percent and 7.9 percent. The authors have also made their code available.
  • Marshall et al. (2020) developed Trialstreamer, a system to automatically find and categorize RCTs. It’s grown into a publicly available annotated database of more than 700,000 RCTs derived from PubMed and the World Health Organization International Clinical Trials Registry Platform. The system extracts free-text descriptions of PICO elements, mapping them to the standardized Medical Subject Headings (MeSH) thesaurus. In the first five months of 2020, the researchers write, the system was able to categorize and index an average of 142 RCTs per day.
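Several of the results above are reported in terms of recall and precision. For readers less familiar with these metrics, here is a short Python sketch of how they could be computed for a sentence-extraction task: precision is the share of extracted sentences that are actually relevant, and recall is the share of relevant sentences that were extracted. The function name and the sentence IDs are made up for illustration and are unrelated to the figures in the studies cited.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted items vs. the relevant ones."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sentence IDs: what a system extracted vs. what reviewers marked relevant.
predicted_ids = [1, 2, 3, 5, 8]
relevant_ids = [2, 3, 5, 7]

precision, recall = precision_recall(predicted_ids, relevant_ids)
print("precision={:.2f} recall={:.2f}".format(precision, recall))
```

In screening work, recall usually matters most because a missed relevant study can bias a review, while higher precision is what translates into reading time saved.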

The promise of NLP for PICO identification and extraction 

PICO is a crucial tool for evidence-based practitioners looking to evaluate the relevancy of RCTs and formulate specific, answerable research questions (and related keywords), but it can be time-consuming and prone to human error, and it requires a great deal of process and medical expertise. It has also become increasingly difficult to manually search for and identify PICO elements, considering the fast-growing amount of relevant health-care data being created.

The promise of NLP for PICO identification and extraction is that these practitioners can achieve results that are as good or better when scanning the literature for PICO elements, with far less manual labor. CapeStart’s machine learning engineers, data scientists, and subject matter experts can help with your next systematic literature review and PICO process through a range of health care-focused NLP and data annotation solutions, from pre-trained NLP models and model development to pre-annotated datasets for model training.
