Reinforcement Learning in Health Care: Why It’s Important and How It Can Help.

At a TED Talk back in 2010, game designer and author Jane McGonigal argued that video games would help change the world for the better. While she may not have been referring to health and wellness specifically, recent developments in reinforcement learning (RL) for health care have rapidly turned parts of McGonigal’s vision into reality.

In many ways, RL isn’t much different from other types of machine learning (ML), including deep learning or classic ML techniques. RL is simply a narrower subset of ML – “the cherry on the cake” of artificial intelligence (AI), according to Facebook VP and Chief AI Scientist Yann LeCun. The main difference is that instead of merely inspecting data, RL agents learn by interacting with their environments and earning rewards or penalties based on their actions. 

It’s a gamified approach to machine learning built around two main characters: the agent and its environment. The agent interacts with this environment, which can change either through the agent’s actions or on its own. The goal of every RL agent is to maximize the rewards it accumulates through a series of decisions and actions, which in practice means steering its environment toward better states. It’s all accomplished via trial and error with no prompting or outside interference by the programmer (other than initially defining the algorithm’s reward function) – similar to someone playing a game.

RL algorithms optimize long-term rewards, learning the best response sequence over time as agents receive feedback about the state of their environments. Given enough computing resources, RL agents can harness hundreds or even thousands of parallel “gameplays” to gather experience faster than humanly possible.
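To make the agent-environment loop concrete, here is a minimal sketch of tabular Q-learning, one common RL algorithm, on a made-up five-state "corridor" game. The environment, the reward values, and every name in the code (`step`, `train`, `greedy_policy`) are assumptions chosen for illustration. Nothing here is a health-care system or a specific library's API.

```python
import random

# Hypothetical toy environment: a corridor of 5 states (0..4).
# Action 1 moves right, action 0 moves left. Reaching state 4 ends the
# episode with a reward of +1; every other step carries a small penalty.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = next_state == GOAL
    reward = 1.0 if done else -0.01
    return next_state, reward, done

def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore occasionally, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Target = immediate reward + discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Best known action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]

q_table = train()
policy = greedy_policy(q_table)
```

After training, `policy` holds the agent's preferred action in each state. The programmer never tells the agent to move right; the agent works that out from the rewards alone, which is the "playing a game" dynamic described above.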

Why reinforcement learning for health care?

For the above reasons, RL is well suited for systems with inherent time delays, including those of autonomous vehicles, robotics, video games, financial and business management, and – yes – health care. “RL tackles sequential decision-making problems with sampled, evaluative and delayed feedback simultaneously,” according to researchers Yu et al., making its unique form of progressive decision-making an excellent candidate for health-care applications. RL is also flexible enough to consider the delayed effects of treatments and doesn’t need as much contextual data to make relatively informed decisions.

“RL is able to find optimal policies using only previous experiences, without requiring any prior knowledge about the mathematical model of the biological systems,” say the researchers. “This makes RL more appealing than many existing control-based approaches in health-care domains since it could be usually difficult or even impossible to build an accurate model for the complex human body system and the responses to administered treatments, due to nonlinear, varying and delayed interaction between treatments and human bodies.”


Reinforcement learning in healthcare: Applications

While several health-care domains have begun experimenting with RL to some degree, the approach has seen its most notable successes in implementing dynamic treatment regimes (DTRs) for patients with long-term illnesses or conditions. It has also shown promise in automated medical diagnosis, health-care resource scheduling and allocation, drug discovery and development, and health management.

Dynamic treatment regimes (DTRs)

RL’s most common real-world health-care application is the creation and ongoing adjustment of DTRs for patients with longer-term conditions. DTRs are sequences of rules governing health-care decisions – including treatment type, drug dosages, and appointment timing – tailored to an individual patient based on their medical history and condition over time. Clinical observations and patient assessments provide the input data, and the algorithm outputs the treatment options expected to move the patient toward the most desirable state. RL is used to automate decision-making within these ongoing treatment regimes. It has already helped design DTRs for chronic diseases including cancer and HIV, and could also improve critical care using the rich data collected in intensive care units (ICUs).
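One way to picture a DTR is as a list of decision rules, one per treatment stage, each mapping the patient's history so far to the next treatment. The sketch below is purely illustrative: the symptom score, the threshold of 7.0, the treatment labels, and all class and function names are invented for this example and carry no clinical meaning. In an RL setting the rules would be learned from data rather than hand-written as they are here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PatientHistory:
    """Everything observed so far: readings and past treatments."""
    observations: List[float] = field(default_factory=list)  # hypothetical symptom score per visit
    treatments: List[str] = field(default_factory=list)

# A decision rule maps the history available at one stage to a treatment.
DecisionRule = Callable[[PatientHistory], str]

def stage_one(history: PatientHistory) -> str:
    # Hypothetical threshold, for illustration only.
    return "high_dose" if history.observations[-1] >= 7.0 else "low_dose"

def stage_two(history: PatientHistory) -> str:
    # Adapt to the response: keep the treatment if the score improved, otherwise switch.
    improved = history.observations[-1] < history.observations[-2]
    return history.treatments[-1] if improved else "switch_therapy"

def apply_regime(rules: List[DecisionRule], scores: List[float]) -> List[str]:
    """Walk through the stages, feeding each rule the history up to that point."""
    history = PatientHistory()
    for rule, score in zip(rules, scores):
        history.observations.append(score)
        history.treatments.append(rule(history))
    return history.treatments

regime = [stage_one, stage_two]
responder = apply_regime(regime, [8.0, 5.0])
non_responder = apply_regime(regime, [4.0, 6.0])
```

Here the two hypothetical patients end up on different treatment sequences because the second-stage rule reacts to how each one responded to the first stage. That dependence on the evolving history is what makes the regime "dynamic".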

Automated medical diagnosis

Medical diagnosis is essentially an exercise in mapping patient information (such as history and current symptoms) to the correct disease profile. While this may sound relatively simple, in clinical practice it can be an incredibly complex task and an enormous burden (in both time and cognitive energy) for busy clinicians.

We already outlined the costs of mistaken diagnoses in a previous blog post: along with being linked to around 10 percent of U.S. patient deaths, misdiagnoses have resulted in nearly $40B in compensation paid to patients over the past 25 years. That’s why ML algorithms that improve diagnosis are so vital to the health-care industry and its patients. RL techniques hold particular promise because most ML diagnosis solutions require large amounts of annotated data for training purposes. RL agents, by contrast, require smaller amounts of labeled data.

Health-care resource scheduling and allocation

The health-care system is more or less the same as any other service business, with patients being the customers and health-care resources being the service providers. Because of RL’s well-documented suitability to business process management (BPM), it can help hospitals and clinics manage day-to-day operations by analyzing and devising optimal resource allocation and human resource scheduling based on seasonal trends, current staffing and inpatient levels, and other data points. 

Drug discovery, design, and development

Traditional drug discovery has many flaws, the most damaging being that its human-driven, trial-and-error process is extremely slow and expensive. This is the case even when using modern techniques such as computer modeling and simulation (M&S) to analyze the behavior of molecules and atoms. Despite all that time and money, success rates are still relatively low, with slightly less than 10 percent of compounds that enter Phase I trials ultimately reaching approval. For these reasons, RL methods are increasingly being applied to de novo drug design to automate and improve drug design hypotheses and compound selection.

Some drug developers are experimenting with advanced machine learning techniques combined with quantum computing (as explained in this CapeStart blog post), whose computing power can help researchers compare larger-scale molecules than is currently possible using classical computers.

Health management

RL has also been used to devise adaptive and personalized interventions for ongoing health management, including exercise and weight management regimes for obese or diabetic patients. AI has already proven to be a valuable tool for encouraging patient engagement and adherence to health management programs.

The challenges of reinforcement learning and healthcare  

Despite its early successes, RL in health care still faces several significant yet surmountable challenges before it sees large-scale clinical implementation. Transferring an RL agent from a training or simulated environment to the real thing can be difficult, for one thing, and because the only feedback the agent can comprehend is rewards and penalties, updating or tweaking the algorithm can prove problematic.

There are also instances of RL agents learning how to “game” their systems – including this OpenAI video where the agent figured out how to collect rewards without finishing the race.

But there are other challenges, including:

Data scarcity

Even though RL agents are at their best learning on the job (so to speak), deep learning researcher Isaac Godfried points out that in most cases, using real patients to train RL algorithms isn’t the most ethical approach. That means they must train in simulated environments on historical observational data of specific treatments, which is often difficult to obtain for various reasons, including HIPAA compliance and protected health information (PHI) considerations.

Partial observability

While RL agents can often account for the full state of simulated environments, the human body is much more complex and fast-moving than even the most detailed of simulations or historical data sets. In clinical settings, agents see only partial observations of the patient, such as blood pressure, temperature, and other readings, so they often don’t have a full understanding of the state of their environment.

Reward formulation and configuration

Although RL agents for health care are designed with long-term success in mind, that’s easier said than done when devising a reward function that balances long-term benefits with short-term (but sometimes deceptive) improvements. “For instance, periodic improvements in blood pressure may not cause improvements in outcome in the case of sepsis,” writes Godfried. “In contrast, having just one reward given at the end (survival or death) means a very long sequence without any intermediate feedback for the agent.”
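The trade-off can be seen with a little arithmetic. The sketch below scores two made-up 24-step trajectories under two reward designs: a terminal-only reward (+1 for survival, -1 for death) and the same reward plus a small bonus for every step on which a reading improved. The trajectories, the bonus of 0.1, the discount factor of 0.99, and all function names are assumptions picked for illustration, not values from any clinical study.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, with each later step discounted by gamma."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

def rewards_for(improved_flags, survived, step_bonus):
    """Per-step bonus for each improved reading, then a terminal +1 or -1."""
    rewards = [step_bonus if flag else 0.0 for flag in improved_flags]
    rewards.append(1.0 if survived else -1.0)
    return rewards

# Two hypothetical trajectories (invented, not patient data).
improved_then_died = ([True] * 20 + [False] * 4, False)
flat_then_survived = ([False] * 24, True)

# Terminal-only reward: correct ranking, but 24 steps with no feedback at all.
sparse_died = discounted_return(rewards_for(*improved_then_died, step_bonus=0.0))
sparse_survived = discounted_return(rewards_for(*flat_then_survived, step_bonus=0.0))

# Terminal reward plus intermediate bonus: frequent feedback, but the ranking can flip.
shaped_died = discounted_return(rewards_for(*improved_then_died, step_bonus=0.1))
shaped_survived = discounted_return(rewards_for(*flat_then_survived, step_bonus=0.1))
```

With the terminal-only design, the surviving trajectory scores higher, as it should. With the bonus added, the trajectory that collected many short-term improvements outscores the one that ended in survival, which is exactly the kind of misleading signal Godfried warns about.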

Although many challenges remain before the use of reinforcement learning in health care is anywhere close to perfect, there has been recent progress on several fronts – especially in the more efficient and precise development of DTRs for chronic and other conditions. As reward functions become better refined and more data is made available to build environments for RL agents, no doubt we’ll see further improvements – and, ultimately, better health outcomes for patients and more efficient operations for health providers.
