
Artificial Intelligence and Deep Learning Lab

The Machine Learning (ML) group in the Steele Institute for Health Innovation is an interdisciplinary team that unites clinicians with engineers and computer scientists, going beyond Big Data to solve some of healthcare’s most pressing issues and improve outcomes for our patients.

Who we are

Here at Geisinger, we’ve introduced artificial intelligence into our regular clinical workflow. Often referred to as machine learning, this technology processes data without predetermined rules.

Our Artificial Intelligence and Deep Learning Lab focuses on the smart use of machine learning technology and Big Data to help medical providers deliver better and faster care, especially in areas where time is critical.

What we do

Our ML team leverages Geisinger’s robust, 20+ year history of electronic health record (EHR), imaging and genomic databases to develop high-impact, clinically actionable predictive technology. This use of intelligent computer assistance is necessary to sustain and improve medical care, and Geisinger is proud to be at the forefront of the development and clinical application of these emerging technologies.

Our Artificial Intelligence (AI) work fits into seven categories:

  • Disease management
  • Financial health
  • Imaging sciences
  • New markets
  • Patient experience
  • Patient safety
  • Population health

Interested in working with us? Email us at, and let’s talk.

What's the difference between AI and predictive analytics?

Predictive analytics (PA) and machine learning (ML) have similar goals. Machine learning is a branch of predictive analytics; the two differ in their methods rather than their aims. Both PA and ML attempt to increase value by unlocking patterns hidden in vast amounts of an organization’s data.

Both PA and ML have been described in the past as “Artificial Intelligence,” commonly called AI. Each methodology uses a blend of statistics and rule deduction to perform tasks that the user believes require intelligence.

Obtaining performance

Predictive analytics can generally accomplish its results with less data. This is because human experts develop rule sets based on their expertise in the subject, and validate the results based on their confidence that the rules from the present will persist for a reasonable time into the future. If this assumption fails, the humans reassess the rules and essentially hard-wire the computer a new ‘brain’ with new rules.

Machine learning is a technique where algorithms are given data and asked to process it without predetermined rules.  ML accomplishes its goals by simply being trained on additional data, without any review of its internal logic.  The computer learns to ‘see better’ with more experience.  The ‘brain’ remains the same.

ML algorithms use what they learn from their mistakes to improve future performance without the need to be reprogrammed on a periodic basis. Data feeds ML; the results are most accurate when the machine has access to massive amounts of it to refine its algorithm. 
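To make the contrast concrete, here is a minimal Python sketch. The readings, labels, threshold value and function names are all hypothetical and invented for illustration; they do not describe any Geisinger model. The first function is a hand-written rule that a human must edit to change. The second picks its cutoff from labeled examples, so supplying different or additional data changes its behavior without anyone rewriting the code.

```python
def rule_based_flag(reading, threshold=100.0):
    """Hand-written rule: a human expert chose the threshold."""
    return reading >= threshold


def learn_threshold(readings, labels):
    """Pick the cutoff that misclassifies the fewest training examples."""
    best_cutoff, best_errors = None, None
    for cutoff in sorted(set(readings)):
        # Count examples where "reading >= cutoff" disagrees with the label.
        errors = sum((r >= cutoff) != y for r, y in zip(readings, labels))
        if best_errors is None or errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


# Hypothetical training data: a reading and whether the case was truly high risk.
readings = [62.0, 71.0, 80.0, 88.0, 93.0, 104.0]
labels = [False, False, False, True, True, True]

learned_cutoff = learn_threshold(readings, labels)

# The fixed rule misses a reading of 93.0; the learned cutoff catches it.
missed_by_rule = not rule_based_flag(93.0)
caught_by_learned = 93.0 >= learned_cutoff
```

Real ML models learn far richer structure than a single cutoff, but the division of labor is the same: the code stays fixed and the data determines the behavior.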

Predictive analytics

Predictive analytics attempts to forecast the most likely scenarios by comparing current conditions to historical data and placing the results in a modern context. In PA, humans assess the mistakes of the machine and reprogram the machine to perform better.  It’s often used in sales lead scoring, where leads are assigned priority based on the past value of similar customers.
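The lead-scoring idea can be sketched in a few lines of Python. The segments, dollar values and lead names below are hypothetical, and averaging by segment is only one simple way to define "similar customers"; it is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean


def segment_values(past_customers):
    """Average historical value for each customer segment."""
    by_segment = defaultdict(list)
    for segment, value in past_customers:
        by_segment[segment].append(value)
    return {segment: mean(values) for segment, values in by_segment.items()}


def rank_leads(leads, values):
    """Order leads so those resembling high-value past customers come first."""
    # Leads from a segment with no history score 0.0 and sort last.
    return sorted(leads, key=lambda lead: values.get(lead[1], 0.0), reverse=True)


# Hypothetical history: (segment, past value in dollars).
past_customers = [
    ("clinic", 1200.0),
    ("clinic", 800.0),
    ("hospital", 5000.0),
    ("hospital", 7000.0),
]
# Hypothetical open leads: (name, segment).
leads = [("Lead A", "clinic"), ("Lead B", "hospital"), ("Lead C", "unknown")]

values = segment_values(past_customers)
ranked = rank_leads(leads, values)
```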

Our work

Stroke prediction

Geisinger’s internal innovation data science team is collaborating with our Neurology Department to build a predictive model that identifies patients at high risk of stroke. The patients identified are preemptively managed to prevent stroke occurrence or recurrence. Initial tests are promising, and the model can identify high-risk stroke patients with 80 percent accuracy.

Early detection of sepsis

The Epic predictive model, implemented as a pilot at Geisinger Medical Center, helps identify patients with sepsis. Early results show a 30 percent increase in the number of patients identified as having sepsis. Though the specificity of the model is not very high, its ability to identify additional sepsis patients who can benefit from early intervention has prompted the expansion of the pilot to three more hospitals in the health system.

Our approach blends data from heterogeneous sources into a longitudinal clinical record, which is used to build a predictive model that estimates the likelihood of progression from sepsis to septic shock. Because this progression occurs within a matter of hours, the success of this clinical decision system is defined by how early a patient is identified as high risk at the time of admission. The model provides guidance to medical teams for monitoring patients and taking possible preventive measures in treating patients who have a high probability of mortality. In a clinical setting, it would also help to allocate appropriate clinical resources, both personnel and infrastructure, to patients with a higher likelihood of transitioning to septic shock.

Using CMS reporting measures as the source of truth for labeled data, our approach blends disparate data sources, including labs, medications, vitals, chronic condition diagnoses, past medical history and patient notes, to identify conditions such as systemic inflammatory response syndrome and organ dysfunction. This produces a chronological order of medical events and helps to sort sepsis patients into possible candidates for septic shock. Out of eight different classification models tested on the data for identifying patients with septic shock, Random Forest performed the best with an area under the curve (AUC) of 94.83 percent.
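For readers unfamiliar with the metric, the following Python sketch shows how an AUC can be computed and used to compare candidate classifiers. The labels, scores and model names are hypothetical placeholders, not the data or the eight models from this project. The function uses the standard pairwise interpretation of AUC: the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def best_model(labels, scores_by_model):
    """Score every candidate and return the name with the highest AUC."""
    aucs = {name: roc_auc(labels, scores) for name, scores in scores_by_model.items()}
    winner = max(aucs, key=aucs.get)
    return winner, aucs


# Hypothetical outcomes (1 = progressed to septic shock) and model risk scores.
labels = [1, 0, 1, 0, 1, 0]
scores_by_model = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}

winner, aucs = best_model(labels, scores_by_model)
```

An AUC of 1.0 means every positive case outranks every negative one, and 0.5 is no better than chance. The pairwise loop is quadratic, so a production evaluation would normally use a library routine instead.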

Health plan high-dollar value claims

This joint project between Geisinger Health Plan and our internal data science team aims for earlier identification of member patients likely to submit high-dollar claims (>$750K) using machine learning software. This will help the health plan submit these claims to reinsurance as well as accurately allocate capital funds to the health system. Previously, the processes were more manual, leading to errors, delays and misses in claim submission for reinsurance.
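As a rough illustration of the final step, the Python sketch below flags members whose predicted claim total exceeds the $750K threshold. It assumes a model has already produced a predicted cost per member; the member IDs, dollar figures and function name are hypothetical.

```python
HIGH_DOLLAR_THRESHOLD = 750_000  # dollars; the >$750K cutoff described above


def flag_for_reinsurance(predicted_costs, threshold=HIGH_DOLLAR_THRESHOLD):
    """Return member IDs predicted to exceed the threshold, highest cost first."""
    flagged = [(member, cost) for member, cost in predicted_costs.items() if cost > threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [member for member, _ in flagged]


# Hypothetical model output: member ID -> predicted annual claim total in dollars.
predicted_costs = {
    "M001": 120_000,
    "M002": 910_000,
    "M003": 750_000,   # exactly at the cutoff, so not flagged
    "M004": 1_400_000,
}

flagged_members = flag_for_reinsurance(predicted_costs)
```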

Patient notes index

The team offers keyword search across all unstructured Epic HNO and RAD patient notes. For instance, the keyword “coumadin” would list all patient notes that include “coumadin.” The result set can be further pared down using factors such as date/time filters, note type and facility location.
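The behavior described above can be sketched in Python as a keyword match followed by optional filters. The `Note` fields, sample notes and function name are hypothetical, and the linear scan is an assumption made for brevity; a real index over a large note collection would rely on a search engine instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Note:
    mrn: str          # medical record number
    note_type: str    # e.g. "HNO" or "RAD"
    facility: str
    written_on: date
    text: str


def search_notes(notes, keyword, start=None, end=None, note_type=None, facility=None):
    """Return notes containing the keyword that also pass every supplied filter."""
    needle = keyword.lower()
    hits = []
    for note in notes:
        if needle not in note.text.lower():
            continue
        if start is not None and note.written_on < start:
            continue
        if end is not None and note.written_on > end:
            continue
        if note_type is not None and note.note_type != note_type:
            continue
        if facility is not None and note.facility != facility:
            continue
        hits.append(note)
    return hits


# Hypothetical notes, invented for illustration.
notes = [
    Note("MRN-1", "HNO", "Facility A", date(2019, 3, 4), "Patient started on Coumadin."),
    Note("MRN-2", "RAD", "Facility B", date(2019, 5, 20), "Chest film reviewed; on coumadin."),
    Note("MRN-3", "HNO", "Facility A", date(2019, 6, 1), "Follow-up scheduled."),
]

all_hits = search_notes(notes, "coumadin")
filtered = search_notes(
    notes, "coumadin",
    start=date(2019, 1, 1), end=date(2019, 4, 30), facility="Facility A",
)
mrns = [note.mrn for note in filtered]
```

Collecting the `mrn` field from the filtered hits yields a list of medical record numbers for a given keyword, date range and facility.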

This feature has been used extensively by our physicians, researchers and even analysts from different service lines for a myriad of use cases.

In one example, a department moved to a new building. After the move, however, staff members discovered they had lost the records of patients who should have been rebooked for a follow-up appointment. A fellow physician used the tool to search for specific keywords and date ranges, generating a list of medical record numbers (MRNs) of patients at a particular facility who were not booked for a follow-up appointment.

Natural language processing of radiology and pathology data

One of the biggest hurdles in healthcare today is how to convert big data into smart data. The solution lies in the effective use of smaller, smarter, parallel computers that can handle very large and diverse data sources, and in formatting those data sources to produce intuitive and streamlined information. This will enable healthcare providers to better focus on quality patient care.

Natural language processing (NLP) of electronic medical records (EMR) helps Geisinger accomplish this conversion by analyzing and extracting meaning from narrative text and unstructured data sources. Our Unified Data Architecture (UDA) team achieves this by using state-of-the-art, open-source NLP engines, such as Apache cTAKES, OpenNLP and StanfordNLP, in a Hadoop and Spark environment. These engines extract useful information that includes medical concepts, terms, semantic groups and the contexts of polarity, history and the subject of the information.

We have annotated the narrative notes from our radiology and pathology departments using these engines and loaded the results into Hive tables and search engines based on Apache Solr and Elastic. We also annotate Epic data on a use-case basis and generate similar dashboards for our providers, so they can use the extracted information to avoid care gaps and improve subsequent patient care.
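To illustrate what "concept plus polarity" means in practice, here is a deliberately tiny Python sketch. It does not use cTAKES, OpenNLP or StanfordNLP and does not reflect how those engines work internally. The concept list, negation cues and window size are hypothetical choices, and the rule (a negation cue within a few preceding words marks the concept as negated) is a simplified stand-in for real negation detection.

```python
import re

CONCEPTS = {"pneumonia", "nodule", "fracture"}            # hypothetical term list
NEGATION_CUES = {"no", "without", "denies", "negative"}   # hypothetical cue list
WINDOW = 3  # number of preceding words scanned for a negation cue


def annotate(sentence):
    """Return each known concept in the sentence with an affirmed/negated polarity."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    annotations = []
    for i, token in enumerate(tokens):
        if token in CONCEPTS:
            preceding = tokens[max(0, i - WINDOW):i]
            negated = any(word in NEGATION_CUES for word in preceding)
            annotations.append({
                "concept": token,
                "polarity": "negated" if negated else "affirmed",
            })
    return annotations


report_a = annotate("No evidence of pneumonia.")
report_b = annotate("There is a fracture but no pneumonia.")
```

Polarity matters because a keyword match alone would treat "no evidence of pneumonia" as a pneumonia finding.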
