Six years ago, the National Institutes of Health placed its biggest ever bet on precision medicine, launching a study to enroll over 1 million participants in an ambitious data-gathering gambit unmatched in its scope and diversity. Since then, Americans from all walks of life have been showing up and handing over their blood, spit, and pee to the project, dubbed “All of Us.” From those samples, scientists have recovered a trove of new genetic information — more than 275 million never-before-seen DNA variants.
The data, reported Monday in Nature, aim to address a longstanding lack of diversity in genomic datasets that has led to a narrow understanding of the biology of disease and undermined the promise of precision medicine.
Although people of European descent account for less than one-quarter of the world’s population, their DNA disproportionately drives genetics research. Between 2005 and 2018, the majority of genome-wide association studies were conducted with data from people living in just three countries — the United Kingdom, the United States, and Iceland.
“The paradox of precision medicine is that you have to have a ton of different kinds of people to figure out one person really well,” said Josh Denny, CEO of the All of Us research program. “There’s still so much we don’t understand about the human genome, especially about rare variation. Huge projects like ours are really helping to accelerate that understanding.”
All of Us has recruited more than 750,000 volunteers to provide survey responses about their health, medical records, and, if they’re willing, biological samples for molecular and genetic testing. Genetic data from some participants have been available to researchers since 2020, but the new release this week includes the whole genome sequences of nearly 250,000 participants, half of whom are of non-European ancestry.
Because of its size and emphasis on diversity, All of Us is unlike any other project. The Million Veterans Program, run by the Department of Veterans Affairs, is similarly far-reaching but limited to former service members. So far, 77% of participants in that initiative are of European descent. The UK Biobank, which has been fueling much of the genetics research of the past two decades, is 88% white.
The other thing that’s novel about All of Us is the low barrier to entry for researchers to access data. Historically, if scientists wanted to work with data from large-scale, NIH-supported projects, like the Framingham Heart Study, they had to submit proposals and get approved as an individual investigator, which could take months.
“All of Us flipped the script on that,” said Alexander Bick, a geneticist at Vanderbilt University and lead author on the latest analysis. The program adopted a “data passport” model, which works by credentialing institutions, and then researchers within those institutions have free rein to work with the data once they’ve completed a training on how to use it responsibly.
“What’s just so unbelievably radical is that when we looked at the first couple thousand researchers using the platform, we saw that the process takes, on average, 29 hours. That just really lowers the barrier for trying new things.”
The number of researchers working with All of Us data has increased significantly with every release of new genomic sequences. There are now nearly 7,000 scientists from more than 530 institutions — including 85 historically Black colleges and universities and Hispanic-serving institutions — who have signed up to analyze the available data. And because researchers are required to declare how they’re using the data each time they start a new analysis, participants and other scientists can see what sorts of questions people are investigating.
“The number one theme we’re seeing so far is health equity,” Denny said.
That trend is highlighted in four other papers published Monday in the journals Nature, Nature Medicine, and Nature Communications.
In one study, a research team led by Baylor College of Medicine took a closer look at the ACMG 59, a list of genetic variants that carry increased risk of disease, to see how frequently they showed up in people with different genetic ancestries across the All of Us cohort. They expected that all the additional diversity would reveal more connections between DNA and disease, and that they’d find higher frequencies in the genetic data from people of African, Asian, and South American ancestry, said Eric Venner, a clinical geneticist who led the work. “Instead, it played out the opposite.”
Their analysis found the most disease-causing variants in people with European ancestry. But that doesn’t mean white people are more susceptible to genetic disease. It just means current genetic tests are better at picking up those diseases in white people. “What this is showing us is how much knowledge we’re missing for some of these groups,” said Venner. “They probably are reservoirs of variants that cause disease but we just haven’t seen enough of them yet to be able to interpret them.”
In another study, researchers in the Type 2 Diabetes Global Genomics Initiative combined All of Us data with information about how different tissues express genes to pull out eight distinct physiological aspects of the disease that could inform how different people might progress or how they should be treated. “Type 2 diabetes isn’t one disease, there are different risks, different biological dysfunctions that all manifest in one clinical readout,” said Ben Voight of the University of Pennsylvania, an author on the paper. “What we’re trying to do is dissect that out into different subtypes.”
In a separate project, researchers used All of Us data to create polygenic risk scores for common diseases like diabetes that are more accurate for groups from different ethnic backgrounds. These types of tests work by adding up all the tiny effects of hundreds, sometimes thousands of genetic variants that contribute to someone’s risk of developing a disease. But because those variants are pulled from genetic databases that are overwhelmingly white, they are not very useful for people of non-European ancestry and can cause false results that misrepresent a person’s risk of disease.
“If All of Us didn’t exist, they would have had risk scores that weren’t relevant to a good percentage of their population,” Denny said.
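As a rough illustration of the "adding up" that a polygenic risk score involves, here is a minimal Python sketch. The variant names, effect sizes, and function name are hypothetical and are not drawn from All of Us or the published studies; real scores rely on effect sizes estimated from large association studies, which is why the makeup of those studies matters.

```python
def polygenic_risk_score(effect_sizes, allele_counts):
    """Sum each variant's effect size times the number of risk alleles carried (0, 1, or 2)."""
    score = 0.0
    for variant, weight in effect_sizes.items():
        # Variants missing from a person's data contribute nothing to the sum.
        score += weight * allele_counts.get(variant, 0)
    return score


# Hypothetical effect sizes, standing in for values estimated in a reference dataset.
effect_sizes = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": -0.125}

# Hypothetical person: two risk alleles at variant_a, one at variant_b, none at variant_c.
allele_counts = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(effect_sizes, allele_counts)
print(score)  # prints 1.25
```

If the effect sizes were estimated mostly in people of one ancestry, the same sum can misstate risk for someone whose relevant variants were never measured or carry different weights.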
Despite the progress and the return to pre-Covid recruiting levels this year, it’s unlikely that All of Us will hit its initial target of sequencing the DNA of 1 million people by 2026, Denny said. So far, more than 750,000 people have enrolled in the study, which offers a sort of choose-your-own-adventure of participation, including providing access to electronic health records, sharing Fitbit data, and filling out lengthy questionnaires. Some 540,000 of those have also given biological samples, from which DNA is being extracted and sequenced on an ongoing basis. The program expects to release another set of whole genomes to researchers sometime next year.
For Denny, even getting this far is something he never imagined was possible. More than a decade ago, when he joined the NIH working group that would lay out the vision for All of Us, the feedback was that it sounded great, but it could never work in the United States with its fragmented health care system.
“I was told, ‘you’ll never be able to get medical centers to send you health records, you’ll never get a million genomes,’” he said. They’re not there yet, but Denny is hopeful the progress they’ve made will convince Congress to continue funding it beyond 2026. “I do think this is a big moment for the program, and things are only going to accelerate from here.”