Investigators identify features to better define long COVID

Using machine learning, they find patterns in electronic medical record data to determine which people are likely to have the disease.

Investigators have used machine learning techniques to identify characteristics of people with long COVID and those who are likely to have it.

The investigators, supported by the National Institutes of Health (NIH), analyzed a collection of electronic health records (EHRs) available for COVID-19 research to better identify people who have long COVID.

Investigators used EHR data from the National COVID Cohort Collaborative (N3C), a centralized national public database managed by the NIH’s National Center for Advancing Translational Sciences, to identify more than 100,000 probable cases of long COVID as of October 2021; as of May 2022, the count exceeded 200,000.

“It made sense to leverage modern data analysis tools and a unique big data resource like N3C, where many characteristics of long COVID can be mapped,” said Emily Pfaff, PhD, a clinical informaticist at the University of North Carolina at Chapel Hill, in a statement.

The N3C data contains information representing more than 13 million individuals and nearly 5 million positive COVID-19 cases nationwide. The database supports rapid research on emerging questions about COVID-19 health outcomes, risk factors, therapies, and vaccines.

In the study, published in The Lancet Digital Health, investigators examined patient demographics, diagnoses, health care utilization, and medication data in the medical records of 97,995 people in the N3C database who had contracted COVID-19.

They combined this information with data from nearly 600 patients seen at 3 long COVID clinics to build 3 machine learning models that identify people with the condition.

Investigators used machine learning to “train” computational models to sift through large amounts of data and glean new insights into long COVID. The models identified patterns that could help investigators understand patient characteristics and identify individuals with long COVID.

The models focused on identifying individuals who may have long COVID in 3 groups within the N3C database: all individuals with COVID-19, patients hospitalized with COVID-19, and those who had COVID-19 but were not hospitalized.

The models accurately identified individuals at risk for long COVID, as judged by comparing them with patients seen at the long COVID clinics.

The machine learning systems flagged about 100,000 people in the N3C database whose profiles closely matched those of people with long COVID, investigators said.

The models looked for common characteristics, such as doctor visits, new medications, and new symptoms, in people with a positive COVID-19 diagnosis who were at least 90 days past their acute infection.

In addition, the models identified people as having long COVID if they had attended a long COVID clinic, or if they had long COVID symptoms and likely had the condition but had not been diagnosed.

The research is part of a larger NIH initiative, Researching COVID to Enhance Recovery (RECOVER), which aims to improve understanding of the long-term effects of COVID-19, known as post-acute sequelae of SARS-CoV-2 infection (PASC).

The program aims to accurately identify individuals with long COVID, develop prevention and treatment approaches, and answer questions about the condition’s impact through clinical trials, observational studies, and more.


Scientists identify features to better define long COVID. EurekAlert. Press release. May 16, 2022. Accessed May 18, 2022.
