Driving Research Data and Analytics: the Phenotype Core
An essential component of the MyCode® Community Health Initiative is the Phenotype Core (PC), which is part of the Biostatistics Core. Traditionally, phenotypes are observable characteristics or traits. However, the term has taken on a broader meaning: it now describes a combination of disease attributes as they relate to clinically meaningful outcomes (e.g., symptoms, exacerbations, therapy response, and disease progression rates).
This definition allows for classification of patients into distinct groups for both clinical and research purposes. Among its many duties, the PC has two main functions:
- Develop phenotype algorithms
- Model electronic health record (EHR) data
One example of a phenotype algorithm created by the Core is metabolic health and obesity. To create this algorithm, the PC staff looked beyond basic ICD-9 codes for obesity or metabolic diseases, searching patient records for sustained periods of obese and lean measurements. According to H. Lester Kirchner, PhD, senior investigator and director, Biostatistics Core, studying this phenotype “allows us to understand the genetics that protect individuals with prolonged obesity from development of obesity-related co-morbidities.”
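The idea of searching records for sustained periods of obese measurements can be illustrated with a minimal sketch. It is not the PC's actual algorithm: the BMI cutoff of 30 is the common adult definition of obesity, while the one-year minimum span, the function names, and the sample records are all hypothetical choices made for illustration.

```python
from datetime import date

# Assumed cutoff: BMI >= 30 is the common adult definition of obesity.
OBESE_BMI = 30.0
# Hypothetical minimum span (in days) for a period to count as "sustained".
MIN_DAYS = 365


def is_obese(bmi):
    """Return True when a single BMI value meets the assumed obesity cutoff."""
    return bmi >= OBESE_BMI


def sustained_periods(measurements, is_in_state, min_days=MIN_DAYS):
    """Find runs of consecutive measurements that all satisfy is_in_state.

    measurements is an iterable of (date, bmi) pairs. A run is reported as
    (first_date, last_date) only if it spans at least min_days.
    """
    periods = []
    run_start = run_end = None
    for day, bmi in sorted(measurements):
        if is_in_state(bmi):
            if run_start is None:
                run_start = day
            run_end = day
        else:
            # The run is broken; keep it only if it lasted long enough.
            if run_start is not None and (run_end - run_start).days >= min_days:
                periods.append((run_start, run_end))
            run_start = run_end = None
    # Close out a run that extends to the last measurement.
    if run_start is not None and (run_end - run_start).days >= min_days:
        periods.append((run_start, run_end))
    return periods


# Made-up measurements for one hypothetical patient.
records = [
    (date(2010, 1, 15), 31.2),
    (date(2010, 9, 1), 32.0),
    (date(2011, 6, 20), 33.1),
    (date(2012, 2, 10), 27.5),
    (date(2012, 8, 5), 30.4),
]

obese_periods = sustained_periods(records, is_obese)
print(obese_periods)
```

With these sample records, only the first run of three obese measurements qualifies; the single obese value in 2012 spans zero days and is ignored. Passing a different predicate (for example, one for lean measurements under whatever cutoff is chosen) reuses the same run-detection logic.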
The phenotype algorithms and data model created by the PC are not only essential to the success of the project, but also benefit all Geisinger research. Often, experts base phenotypes on simpler algorithms that rely primarily on ICD-9 codes.
“Other institutions don’t have the depth and breadth of our data to create detailed phenotypes,” explains Joe Leader, associate director, Biostatistics Core.
In addition to data in the Clinical Decision Intelligence System (CDIS), the PC extracts disparate clinical and departmental data for use in its activities. It also uses metrics such as frequency of visits, laboratory results, procedures, and diagnoses to enrich phenotype algorithms. Once a phenotype is created, it is validated through chart review.
Another important responsibility for the PC is to ensure that Geisinger data follows standard clinical nomenclature when available (e.g., LOINC, SNOMED CT, RxNorm) so that data can be shared broadly with collaborators. For example, Geisinger’s EHR uses Medispan, a proprietary classification system, for medication orders. To enable cross-institution sharing of the data, the PC is mapping all of the historical Medispan information to RxNorm, a publicly available data standard that provides links to drug vocabularies commonly used in pharmacy management and drug interaction software.
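A mapping effort of this kind can be pictured as a crosswalk lookup, sketched below under stated assumptions. The crosswalk table, the field names, and every code value are made-up placeholders, not real Medispan or RxNorm identifiers, and the sketch does not describe how the PC actually performs the mapping.

```python
# Hypothetical crosswalk: proprietary medication code -> standard code.
# Both sides are placeholders, not real Medispan or RxNorm values.
CROSSWALK = {
    "LOCAL-0001": "STD-A",
    "LOCAL-0002": "STD-B",
}


def map_orders(orders, crosswalk):
    """Attach a standard code to each medication order where one is known.

    Returns (mapped_orders, unmapped_codes). Codes missing from the
    crosswalk are collected so they can be reviewed rather than dropped
    silently.
    """
    mapped = []
    unmapped = set()
    for order in orders:
        standard = crosswalk.get(order["med_code"])
        if standard is None:
            unmapped.add(order["med_code"])
        else:
            # Copy the order and add the standard code alongside the original.
            mapped.append({**order, "standard_code": standard})
    return mapped, sorted(unmapped)


# Made-up historical orders for illustration.
orders = [
    {"order_id": 1, "med_code": "LOCAL-0001"},
    {"order_id": 2, "med_code": "LOCAL-9999"},
    {"order_id": 3, "med_code": "LOCAL-0002"},
]

mapped_orders, unmapped_codes = map_orders(orders, CROSSWALK)
```

Keeping the original code next to the standard one preserves provenance, and the list of unmapped codes shows where the crosswalk still has gaps.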
Headed by Kirchner and Leader, the team includes six programmers, a project manager, an ETL developer/Database Administrator, and a biostatistician. “So far we’ve been fortunate to hire folks who are experienced both in the healthcare setting and as programmers,” notes Leader. “We train our team in multiple disciplines and they routinely work with multiple programming languages and databases and they must understand clinical care and workflow processes.”