In this article we’ll examine how AI and deep learning technologies can be used to predict outcomes for hospitalised patients.
The journal npj Digital Medicine published an interesting study, “Scalable and accurate deep learning with electronic health records”, by Alvin Rajkomar MD, Research Scientist, and Eyal Oren PhD, Product Manager, both at Google, together with many other researchers from Stanford Medicine and the University of Chicago Medicine.
The research aim was to use deep learning models to make a broad set of predictions relevant to hospitalized patients using de-identified electronic health records.
Electronic health records (EHRs) are tremendously complicated. Even a temperature measurement has a different meaning depending on whether it is taken under the tongue, through the eardrum, or on the forehead. Moreover, each health system customizes its EHR system, so the data collected at one hospital look different from the data on a similar patient receiving similar care at another hospital. Before they could even apply machine learning, the researchers needed a consistent way to represent patient records, which they built on top of the open Fast Healthcare Interoperability Resources (FHIR) standard.
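To make the idea of a consistent representation concrete, here is a minimal sketch that flattens a FHIR Observation resource (as JSON) into a uniform timeline event. The field names `resourceType`, `code.coding`, `valueQuantity`, `bodySite` and `effectiveDateTime` come from the FHIR Observation resource, and 8310-5 is the LOINC code for body temperature. The `Event` schema and the `observation_to_event` helper are hypothetical and not the researchers’ pipeline. Note that the measurement site is kept as-is rather than harmonized, so an oral reading and an eardrum reading remain distinct inputs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One point on a patient timeline (hypothetical schema, for illustration)."""
    time: datetime
    kind: str               # FHIR resource type, e.g. "Observation"
    code: str               # "system|code" taken from the first coding
    value: Optional[float]
    unit: Optional[str]
    site: Optional[str]     # kept verbatim: oral vs. tympanic stays distinct


def observation_to_event(resource: dict) -> Event:
    """Flatten a FHIR Observation into an Event without harmonizing its meaning."""
    coding = resource["code"]["coding"][0]
    quantity = resource.get("valueQuantity", {})
    return Event(
        time=datetime.fromisoformat(resource["effectiveDateTime"]),
        kind=resource["resourceType"],
        code=f'{coding["system"]}|{coding["code"]}',
        value=quantity.get("value"),
        unit=quantity.get("unit"),
        site=resource.get("bodySite", {}).get("text"),
    )


# Two temperature readings for the same (made-up) patient, taken at different sites.
oral = {
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8310-5",
                         "display": "Body temperature"}]},
    "valueQuantity": {"value": 38.4, "unit": "°C"},
    "bodySite": {"text": "oral"},
    "effectiveDateTime": "2018-03-01T08:30:00+00:00",
}
ear = dict(oral, valueQuantity={"value": 38.9, "unit": "°C"},
           bodySite={"text": "tympanic"},
           effectiveDateTime="2018-03-01T12:00:00+00:00")

for event in sorted(map(observation_to_event, [ear, oral]), key=lambda e: e.time):
    print(event.time.isoformat(), event.code, event.value, event.unit, event.site)
```

Keeping the raw code and site, instead of mapping everything to one “temperature” variable, mirrors the article’s point that the model, not a human, decides which details matter.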
Once the records were in a consistent format, the researchers did not have to manually select or harmonize the variables to use. Instead, for each prediction, a deep learning model reads all the data points from earliest to most recent and then learns which data help predict the outcome. Since the volume of data is enormous (more than 46 billion data points from over 216,000 patients), the researchers had to develop new types of deep learning modeling approaches based on recurrent neural networks (RNNs) and feedforward networks.
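The phrase “reads all the data points from earliest to most recent” describes how a recurrent network consumes a sequence. The sketch below is a plain Elman RNN written in standard-library Python, not the more elaborate models from the paper. The token names, the dimension and the helpers `make_params` and `predict` are hypothetical, and the weights are random and untrained, so the output number only illustrates the data flow and carries no clinical meaning.

```python
import math
import random
from datetime import datetime
from typing import Dict, List, Tuple

Vector = List[float]
Params = Tuple[Dict[str, Vector], List[Vector], List[Vector], Vector]


def make_params(vocab: List[str], dim: int, seed: int = 0) -> Params:
    """Random, untrained weights: enough to show the data flow, not to predict."""
    rng = random.Random(seed)

    def vec() -> Vector:
        return [rng.gauss(0.0, 0.5) for _ in range(dim)]

    embeddings = {token: vec() for token in vocab}  # one vector per event type
    w_in = [vec() for _ in range(dim)]              # input-to-hidden weights
    w_rec = [vec() for _ in range(dim)]             # hidden-to-hidden weights
    w_out = vec()                                   # hidden-to-output weights
    return embeddings, w_in, w_rec, w_out


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def predict(events: List[Tuple[datetime, str]], params: Params) -> float:
    """Read the timeline from earliest to most recent, then output a probability."""
    embeddings, w_in, w_rec, w_out = params
    h = [0.0] * len(w_out)               # hidden state summarises the chart so far
    for _, token in sorted(events):      # chronological order
        x = embeddings[token]            # every token must be in the vocabulary
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
    logit = sum(w * hi for w, hi in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))


# A tiny, made-up chart; the input order does not matter because predict() sorts by time.
events = [
    (datetime(2018, 3, 1, 9, 0), "med:ceftriaxone"),
    (datetime(2018, 2, 10, 14, 0), "encounter:outpatient"),
    (datetime(2018, 3, 1, 8, 30), "vital:temp:oral:high"),
]
params = make_params(sorted({token for _, token in events}), dim=4)
print(round(predict(events, params), 3))
```

The design choice worth noting is that the hidden state `h` is the only memory the model carries forward, which is why the chronological order of the events matters.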

Data in a patient’s record are represented as a timeline. For illustrative purposes, various types of clinical data (e.g. encounters, lab tests) are displayed by row. Each piece of data, indicated as a little grey dot, is stored in FHIR, an open data standard that can be used by any healthcare institution. A deep learning model analysed a patient’s chart by reading the timeline from left to right, from the beginning of the chart to the current hospitalisation, and used these data to make different types of predictions.
After designing a scalable system to make predictions, the researchers evaluated how accurate those predictions were. The most common way to assess accuracy is a measure called the area under the receiver operating characteristic curve (AUROC), which measures how well a model distinguishes between a patient who will have a particular future outcome and one who will not. Equivalently, it is the probability that a randomly chosen patient who has the outcome receives a higher risk score than a randomly chosen patient who does not. In this metric, 1.00 is perfect and 0.50 is no better than random chance, so higher numbers mean the model is more accurate.
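The AUROC can be computed directly from its probabilistic interpretation: compare every patient with the outcome against every patient without it, count how often the first gets the higher score, and count ties as half. The sketch below does exactly that in standard-library Python; the function name `auroc` and the sample scores are illustrative only. It is quadratic in the number of patients, so a real evaluation on hundreds of thousands of records would use a sort-based implementation instead.

```python
from itertools import product
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))   # perfect ranking
print(auroc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # constant scores: random chance
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly
```

The three calls print 1.0, 0.5 and 0.75, matching the endpoints described in the text (1.00 is perfect, 0.50 is chance).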
By this measure, the models reported in the paper scored 0.86 in predicting whether patients would have a long hospital stay (traditional logistic regression scored 0.76), 0.95 in predicting inpatient mortality (traditional methods scored 0.86), and 0.77 in predicting unexpected readmissions after discharge (traditional methods scored 0.70). These gains were statistically significant.
The researchers also used these models to identify the conditions for which patients were being treated. For example, if a doctor prescribed ceftriaxone and doxycycline for a patient with a high fever and a cough, the model could infer that the patient was being treated for pneumonia.
An important focus of the researchers’ work was the interpretability of the deep learning models. An “attention map” of each prediction shows the data points the model weighted most heavily when making that prediction. The researchers present an example as a proof of concept and see this as an important part of what makes predictions useful to clinicians.

A deep learning model was used to render a prediction 24 hours after a patient was admitted to the hospital. The timeline (top of figure) contains months of historical data, and the most recent data are shown enlarged in the middle. The model “attended” to the information highlighted in red in the patient’s chart to “explain” its prediction. In this case study, the model highlighted pieces of information that make sense clinically.
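An attention map is, at its core, a set of weights that sum to one across the events in the chart; the highest-weighted events are the ones highlighted. The sketch below uses plain dot-product attention with a softmax, which is a generic technique and not necessarily the attention mechanism used in the paper. The event labels, the two-dimensional key vectors, the query vector and the `attention_map` helper are all made up for illustration; in a trained model the keys and query would be learned.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(events: List[str], keys: Dict[str, List[float]],
                  query: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
    """Score each event against the query, normalise, and return the top_k events."""
    scores = [sum(q * k for q, k in zip(query, keys[e])) for e in events]
    weights = softmax(scores)
    ranked = sorted(zip(events, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical chart events with made-up key vectors.
events = ["note: shortness of breath", "lab: lactate high",
          "med: acetaminophen", "vital: temperature normal"]
keys = {
    "note: shortness of breath": [1.0, 1.0],
    "lab: lactate high": [2.0, 1.0],
    "med: acetaminophen": [0.1, 0.0],
    "vital: temperature normal": [0.0, 0.2],
}
query = [1.0, 0.5]

for event, weight in attention_map(events, keys, query):
    print(f"{weight:.2f}  {event}")
```

Rendering these weights as colour intensity along the timeline gives the kind of highlighted chart described in the figure caption.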
The researchers noted that these results are early and based on retrospective data only. Indeed, the paper represents just the beginning of the work needed to test the hypothesis that machine learning can be used to make healthcare better.
Google patented the techniques and models the researchers used in the study. This example shows why implementing an EHR on FHIR can be a great opportunity for health systems and a way to substantially evolve their current architecture.
To be continued