AI just read a million DNA bases
A quiet limitation in genomics has always been context. Many models can score short DNA motifs, but real biology often depends on long-range interactions, like enhancers looping to promoters, or structural features spanning huge stretches of a chromosome. In the last couple of weeks, a Nature paper put long context at the center of the story by introducing Evo 2, a DNA foundation model built to model genomes across all domains of life.
Evo 2 is notable for two reasons that are easy to explain without math. First, it was trained on a massive and diverse snapshot of genomes, spanning bacteria, plants, animals, and more, so the model learns patterns that generalize across evolution rather than memorizing one corner of biology. Second, it can handle very long sequences in one go, up to about one million DNA bases of context, which is the scale where many regulatory and structural signals start to make sense.
What does that buy you in practice? The authors report that Evo 2 can predict the functional impact of genetic variation, including clinically significant variants in genes such as BRCA1, without the task-specific fine-tuning that earlier models often required. That matters because variant interpretation is one of the real bottlenecks in modern medicine. If a model can generalize from raw DNA to meaningful functional predictions, it can help triage which mutations are likely worth expensive follow-up experiments.
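The zero-shot idea behind this kind of variant scoring is simple enough to sketch without the real model: ask how likely a sequence model finds the reference genome versus the mutated one, and treat a sharp drop in likelihood as a hint that the variant disrupts learned patterns. The sketch below uses a toy k-mer frequency model as a stand-in for a DNA foundation model; it is my illustration of the general technique, not Evo 2's architecture or API, and all function names here are hypothetical.

```python
# Illustrative sketch of zero-shot variant effect scoring: compare a
# model's log-likelihood for the mutated sequence against the reference.
# A trivial 3-mer frequency model stands in for a real DNA language model.
import math
from collections import Counter

def train_kmer_model(sequences, k=3):
    """Count k-mer frequencies over a small training corpus of DNA strings."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return {"counts": counts, "total": sum(counts.values()), "k": k}

def log_likelihood(model, seq):
    """Sum of smoothed log-probabilities of every k-mer in the sequence."""
    k, counts, total = model["k"], model["counts"], model["total"]
    vocab = 4 ** k  # Laplace smoothing over all possible k-mers
    ll = 0.0
    for i in range(len(seq) - k + 1):
        p = (counts[seq[i:i + k]] + 1) / (total + vocab)
        ll += math.log(p)
    return ll

def variant_score(model, ref_seq, pos, alt_base):
    """Delta log-likelihood of substituting `alt_base` at position `pos`.
    Strongly negative scores suggest the variant breaks familiar patterns."""
    mut_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return log_likelihood(model, mut_seq) - log_likelihood(model, ref_seq)

# Toy corpus rich in the "ATG" motif; disrupting one should score negatively.
corpus = ["ATGATGATGATG", "CCATGATGGG", "ATGCCATGATG"]
model = train_kmer_model(corpus)
print(variant_score(model, "CCATGATGCC", 4, "T"))  # mutate the G in an ATG
```

A real foundation model replaces the k-mer counts with learned, long-context likelihoods, but the triage logic is the same: score many candidate variants cheaply, then send only the most disruptive-looking ones to the lab.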
The other headline is design. The paper and accompanying coverage emphasize that the model is not only a reader of DNA: it can also generate genome-scale sequences, proposing long stretches of genetic code that look biologically plausible at scales closer to real organisms. That is exciting, but it also raises the bar for responsibility, because the same capability that helps design useful biological systems increases the need for careful screening, governance, and lab validation.
The bigger picture is that biology is starting to get its own foundation model stack, similar to what happened in language and vision, but with a different unit of meaning. In text, a token is a word piece. In genomics, the unit is a base, and the grammar spans enormous distances. Evo 2 is a sign that the field is beginning to treat that scale as the default rather than the edge case, and that is exactly what you need if you want AI to move from pattern spotting to true genome-level reasoning.
Sources:
https://www.nature.com/articles/s41586-026-10176-5
https://pubmed.ncbi.nlm.nih.gov/41781614/
https://arcinstitute.org/tools/evo
https://www.drugdiscoverynews.com/ai-model-trained-on-100-000-species-learns-to-read-and-design-genetic-code-17054
https://arstechnica.com/science/2026/03/large-genome-model-open-source-ai-trained-on-trillions-of-bases/