New DNA Search Engine Brings Order to Biology’s Big Data

Although MetaGraph is tagged as ‘Google for DNA’, Chikhi likens the tool to a search engine for YouTube, because the tasks are more computationally demanding. In the same way that YouTube searches can retrieve every video that features, say, red balloons even when those key words don’t appear in the title, tags or description, MetaGraph can uncover genetic patterns hidden deep within expansive sequencing data sets without needing those patterns to be explicitly annotated in advance.

“It enables things that cannot be done in any other way,” Chikhi says.

The motivation behind MetaGraph was to address an accessibility problem in sequencing data sets. The size of these repositories has risen at a blistering pace in the past few decades, but this growth has presented challenges for the scientists using the data they contain. Raw sequencing reads are fragmented, noisy and too numerous to search directly. “The volume of the data, paradoxically, is the main inhibitor of us actually using the data,” says Artem Babaian, a computational biologist at the University of Toronto in Canada.

They tackled the problem using mathematical ‘graphs’ that link overlapping DNA fragments together, much as sentences that share the same words line up in a book index.
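As a rough illustration of how linking overlapping fragments can make raw reads searchable, here is a minimal Python sketch. It assumes the fragments are fixed-length substrings (k-mers) and that each one is tagged with the samples it came from. The function names, sample labels and toy reads are all hypothetical, and this is not MetaGraph's actual implementation, which relies on heavily compressed data structures.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(samples, k):
    """Map each k-mer to the set of sample labels whose reads contain it."""
    index = defaultdict(set)
    for label, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(label)
    return index

def successors(index, kmer):
    """Graph edges: indexed k-mers that overlap this one by k-1 letters."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in index]

def search(index, query, k):
    """Return labels of samples that contain every k-mer of the query."""
    hits = None
    for kmer in kmers(query, k):
        labels = index.get(kmer, set())
        hits = set(labels) if hits is None else hits & labels
    return hits or set()

# Toy data (hypothetical): two samples, each with two short reads.
samples = {
    "sample_A": ["ACGTAC", "GTACGG"],
    "sample_B": ["TTACGT", "CGTTTA"],
}
index = build_index(samples, k=4)

# The query appears in no single read, but its k-mers chain across
# the two overlapping reads of sample_A.
matches = search(index, "CGTACG", k=4)
print(matches)
```

In this toy example the query is found even though it spans two separate reads, which is the point of linking fragments by their overlaps rather than scanning each read on its own.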

The researchers integrated data from seven publicly funded data repositories, creating 18.8 million unique DNA and RNA sequence sets and 210 billion amino-acid sequence sets across all clades of life, including viruses, bacteria, fungi, plants and animals such as humans. They also developed a search engine for these sequences, in which users enter text prompts to search these integrated archives of raw data.

“It is a totally new way to interact with this body of data,” says Kahles. “It’s compressed, but accessible on the fly.”

To demonstrate the utility of MetaGraph, the study authors used it to scan 241,384 human gut microbiome samples for genetic indicators of antibiotic resistance around the world, building on work that used an earlier version of the tool to track drug-resistance genes in bacterial strains that live in subway systems across major urban centres. The authors say they performed the analysis in about an hour on a high-powered computer.

MetaGraph is not the only massive-scale sequence search tool now on offer.

They and others have also used an earlier, narrower search tool tailored to viral-DNA repositories to reveal reams of previously undocumented viruses and viral contaminants in engineered T-cell therapies for treating cancer.

“These are resources to drive scientific progress across the world,” says Babaian. “They are opening up a completely new field of petabase-scale genomics” — and the most impactful applications are yet to come.

This article is reproduced with permission and was first published on October 8, 2025.

Elie Dolgin is a science journalist in Somerville, Mass.

First published in 1869, Nature is the world’s leading multidisciplinary science journal. Nature publishes the finest peer-reviewed research that drives ground-breaking discovery, and is read by thought-leaders and decision-makers around the world.
