Revolutionizing Healthcare With Multimodal AI
Humans rely on various data sources to make decisions, including the information we receive through sight, taste, touch, hearing, and smell. By combining the data we receive through these inputs, we can make complex decisions.
But imagine trying to make sense of our environment using just one of those data sources.
That’s the case with most AI and machine learning (ML) systems, which rely on a single model trained on a single type of data to generate predictions and insights. Multimodal AI – which uses various data types and models for improved accuracy – is different.
Let’s dive into what multimodal AI is, how it differs from other types of AI, and its implications in the healthcare industry.
What is Multimodal AI?
There are two types of multimodal AI: Multimodal learning and combining models. Here’s a breakdown of each:
- Multimodal learning combines information from more than one data source or type – such as text, images, audio, and video – to provide a richer, more comprehensive, and more effective learning model. Multimodal learning has applications in healthcare, autonomous cars, speech recognition, emotion recognition, and other areas.
- Combining models involves bringing together more than one ML model to improve overall model performance. All ML models have strengths and weaknesses that can be overcome by combining models, thus improving accuracy.
In this post, we’ll focus primarily on multimodal learning, which fuses disparate data and uses a separate unimodal neural network for each input type, such as convolutional neural networks for images and recurrent neural networks for text.
These models extract features from the data using unimodal encoders to process each data type individually, then use a fusion network (using techniques such as cross-modal interactions or concatenation) to integrate these features into a unified representation. A classifier then uses the unified representation to make task-specific predictions, classifications, or decisions.
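The encode, fuse, and classify pipeline described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the class name `ConcatFusionModel`, the helper functions, and the layer sizes are hypothetical, the weights are random and untrained, and simple dense layers stand in for the convolutional and recurrent encoders a real system would use. Fusion here is plain concatenation.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(rng, n_in, n_out):
    """Untrained layer with small random weights (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class ConcatFusionModel:
    """Hypothetical two-modality model: unimodal encoders -> concatenation -> classifier."""

    def __init__(self, image_dim, text_dim, hidden_dim, n_classes, seed=0):
        rng = random.Random(seed)
        # Stand-ins for a CNN (images) and an RNN (text).
        self.image_encoder = random_layer(rng, image_dim, hidden_dim)
        self.text_encoder = random_layer(rng, text_dim, hidden_dim)
        # The classifier sees both encodings side by side.
        self.classifier = random_layer(rng, 2 * hidden_dim, n_classes)

    def encode(self, image_features, text_features):
        img = relu(linear(image_features, *self.image_encoder))
        txt = relu(linear(text_features, *self.text_encoder))
        return img + txt  # fusion by concatenation into one unified representation

    def predict_proba(self, image_features, text_features):
        fused = self.encode(image_features, text_features)
        return softmax(linear(fused, *self.classifier))

model = ConcatFusionModel(image_dim=4, text_dim=3, hidden_dim=5, n_classes=2)
probs = model.predict_proba([0.1, 0.4, 0.2, 0.9], [1.0, 0.0, 0.5])
```

In a trained system, the encoders and classifier would be learned jointly from labeled examples; the sketch only shows how data flows through the three stages.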
Why is Multimodal Learning Important?
The main arguments for multimodal AI in healthcare are the difficulty of handling heterogeneous data, which has long been an obstacle in AI and ML applications, and the desire to create more accurate models.
At the same time, multimodal learning has become possible thanks to the increasing availability of disparate biomedical data such as electronic health records (EHRs), medical images, and data from large biobanks (such as the U.K. Biobank, the U.S. Million Veteran Program, Biobank Japan, and the China Kadoorie Biobank).
Multimodal AI has several distinct benefits, including improved accuracy, better problem-solving capabilities, and increased ability to handle more complex tasks. Multimodal models are also more versatile, since they can handle a wider variety of data, and more robust, since they aren’t reliant on just one data type (as is the case in unimodal models).
What Are Some Multimodal Techniques?
Both multimodal learning and combining models encompass several different techniques.
Multimodal learning techniques include:
- Fusion-based approach: Encodes various data types into a common representation
- Alignment-based approach: Aligns the data types to enable a direct comparison
- Late fusion: Combines the predictions of models separately trained on each data type
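As a rough illustration of late fusion, the sketch below combines class probabilities produced by separately trained per-modality models using a weighted average. The function name `late_fusion`, the modality names, and the probability values are all hypothetical, and a weighted average is just one common way to combine predictions.

```python
def late_fusion(per_modality_probs, weights=None):
    """Weighted average of class probabilities from separately trained models.

    per_modality_probs maps a modality name to that model's class probabilities.
    weights optionally maps the same names to non-negative weights (default: equal).
    """
    names = list(per_modality_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_classes = len(per_modality_probs[names[0]])
    fused = [0.0] * n_classes
    for name in names:
        for i, p in enumerate(per_modality_probs[name]):
            fused[i] += weights[name] * p / total
    return fused

# Made-up outputs from an imaging model and a clinical-notes model.
fused = late_fusion({"imaging": [0.8, 0.2], "notes": [0.6, 0.4]})
```

Because each model is trained on its own, a missing modality can simply be left out of the dictionary at prediction time, which is one reason late fusion is attractive when some data types are not always available.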
Combining model techniques include:
- Ensemble models: Combine the outputs of multiple base models into one overall model
- Stacking: Uses the outputs of multiple models as inputs to another model
- Bagging: Averages the predictions of several base models, each trained on a different bootstrap sample of the training data
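To make bagging concrete, here is a minimal sketch that fits several simple base models on bootstrap samples (drawn with replacement from the training data) and averages their predictions. The helper names `fit_line`, `bagging_fit`, and `bagging_predict` are hypothetical, and the base model is a one-variable least-squares line chosen only to keep the example short.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # degenerate sample: fall back to predicting the mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x

def bagging_fit(xs, ys, n_models=10, seed=0):
    """Train each base model on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all base models."""
    preds = [slope * x + intercept for slope, intercept in models]
    return sum(preds) / len(preds)

xs = list(range(20))
ys = [2 * x + 1 for x in xs]  # noise-free toy data
models = bagging_fit(xs, ys)
prediction = bagging_predict(models, 10)
```

Stacking follows the same pattern up to the last step: instead of averaging, the base models' predictions would be passed as input features to another model that learns how to combine them.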
How Are Multimodal Models Used?
Multimodal AI has already shown itself capable in a number of applications, including:
- Internet of things (IoT) and smart cities
- Image, speech, and pattern recognition
- Cybersecurity
- Natural language processing (NLP)
- E-commerce
- Robotics
- Education
- Sustainable agriculture
But one of the most exciting fields for multimodal AI is healthcare, where it can integrate data from multiple sources to support a more accurate diagnosis. Take medical imaging, for example, where a multimodal AI system can integrate data from various image types (MRI, CT, PET) to improve the accuracy of diagnosis and proposed treatment.
A March 2023 study indicated that there are approximately 130 applications of multimodal AI in healthcare, with the most prevalent areas being cancer and neurology. The technology has shown promise in several areas of healthcare, including:
- Cardiovascular
- Gastrointestinal
- Pediatric
- Respiratory
- Musculoskeletal
- Urogenital
- Psychiatric
- Ocular
- Endocrine
- Nephrology
- Autoimmune
- Infectious diseases
Other potential applications of multimodal AI in healthcare include the development of personalized “omics” for precision health, digital clinical trials, remote monitoring, and pandemic surveillance and outbreak detection.
Multimodal models have shown particular promise in diagnosing and treating cardiovascular diseases. In one study, researchers developed a multimodal data fusion AI model using a convolutional neural network that could predict hypertension with an accuracy of around 94 percent.
Another study used a multimodal data fusion model to predict hospital readmission rates of those who had suffered heart failure, achieving an accuracy of more than 75 percent.
And a third group of researchers developed a multimodal large-scale model framework called Stone Needle, which integrates an array of data sources such as text, video, audio, and images and can be tailored to perform specific healthcare tasks.
“The fusion of different modalities and the ability to process complex medical information in Stone Needle benefits accurate diagnosis, treatment recommendations, and patient care,” the authors write, adding that the model consistently outperformed other methods such as GPT-4, LLaMA-7B, Visual ChatGPT, and LLaVA. “By effectively integrating multiple modalities and specifically addressing the needs of healthcare applications, Stone Needle can provide healthcare professionals with valuable insights and improve patient care.”