New Microbial Genomes Help Scientists Fill Gaps in the Tree of Life

A new batch of genomes expands the catalog of microbe-derived proteins and enzymes that could transform medicine, energy production, and various other fields.

Microbes are the workhorses of the natural world. In the soil, they convert the essential elements of life into nutrients that can be absorbed by plants. They break down dead organic matter to release carbon and other critical elements back into the earth and air. And without the billions of microbes in our gut, humans wouldn’t even be able to digest food and convert it into usable energy.

Yet despite the critical importance of microbes to life on Earth, and their increasing usefulness in energy production, agriculture, and biotechnology, we still know very little about how they do what they do. That’s because microbes are the most abundant and diverse life forms on the planet, with an estimated billion or more species, only a few thousand of which have been named and identified.

Now a groundbreaking project from the United States Department of Energy (DOE) is attempting to shine light on the unexplored branches of the tree of life by sequencing large numbers of unknown microbial genomes. The group published a study in Nature Biotechnology this week in which they analyzed 1,003 new genomes that were sequenced from bacterial and archaeal organisms.

This latest batch of microbial genomes not only confirms the tremendous genetic diversity of microbes, but adds to the growing catalog of microbe-produced proteins and enzymes that could one day transform medicine, energy production, genetic engineering, and various other fields.

Nikos Kyrpides leads the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative at the DOE’s Joint Genome Institute. He explained that the first 20 years of microbial genome sequencing focused on well-known microorganisms like viruses and pathogenic bacteria. In fact, by 2015, 43 percent of all sequenced bacterial genomes were strains from the same 10 pathogenic species.

But that narrow focus ignored large swaths of the phylogenetic family tree, leaving entire branches without a single representative genome. Armed with radically faster and more powerful sequencing technology, Kyrpides’s group set out to catalog genomes representing the full diversity of microbial life on Earth. In 2009, they published an analysis of the first batch of 56 microbial genomes, in which they identified sequences of microbial DNA that pumped out entirely new proteins and enzymes.

“We saw there’s an enormous amount of discovery that can be done through the study of microbes for which we don’t know anything about,” said Kyrpides, who quickly proposed funding for a much larger sequencing effort.

This latest batch of more than 1,000 genomes included 845 “singletons” — the only sequenced representative of their species. Analysis of the genomes also revealed a 10 percent increase in novel protein families.

Jonathan Eisen, an evolutionary biologist at the University of California, Davis, helped launch the microbial genome encyclopedia project at the DOE. He said that the value of this open genomic reference library is twofold: first, it provides researchers worldwide with a more accurate catalog of the diversity of life; and second, it identifies new proteins and enzymes that can be used for a variety of purposes, from developing new cures for chronic diseases to efficiently generating natural gas from biomass.

Eisen noted that data from the first 56 genomes analyzed in 2009 led to the discovery of new forms of cellulase, the enzyme that breaks down plant material for biofuel production. Researchers also scanned the growing genomic encyclopedia to find novel variants of the Cas9 protein that may improve upon the popular CRISPR gene-editing technology, said Kyrpides.

In its mission to fill the microbial gaps in the tree of life, the DOE team searched high and low for microbes that fell outside of the spotlight. The latest batch of 1,000 bacteria and archaea — primitive single-celled organisms without a nucleus or membrane-bound organelles — were sampled from extreme environments like oil springs, industrial waste sites, and the funkier corners of the human body.

The effort to sequence unknown microbes has already paid off in some appropriately unexpected ways. Eisen points to a 2015 paper that revealed some key differences between the gut microbes of modern Westerners and those living in the digestive tracts of a hunter-gatherer tribe in Peru. One microbe in particular, Treponema, was present in large numbers in the hunter-gatherers but almost non-existent in folks from Oklahoma. The researchers were able to match the mysterious gut microbe’s genome with its closest relative, a Treponema species found in pigs, because it was already in the DOE encyclopedia.

What’s important to Eisen is that without the Genomic Encyclopedia of Bacteria and Archaea initiative, there would have been no reference point for Treponema on the phylogenetic family tree. It demonstrates the value of plucking samples from every inch of the tree of life rather than focusing only on sources and systems that we deem most useful.

“Here’s this ostensibly really important member of the human microbiome, at least in these hunter-gatherer populations, that was completely missed by the Human Microbiome Project,” Eisen said, referring to the National Institutes of Health project to sequence the most important “good” and “bad” microbes in the human gut.

Kyrpides recognizes that such a large-scale genome sequencing effort would have been prohibitively expensive and painfully slow even five years ago. But profound improvements in sequencing technology have opened the doors to unfettered exploration of microbial diversity. The key next-generation sequencing platforms used by the DOE group were Illumina and PacBio.

Technological improvements are also revolutionizing the application of this new genomic data, Eisen said. If a bioenergy or biomedicine company wants to experiment with a new protein or enzyme found in the encyclopedia, it no longer has to culture the particular microbe that produces the enzyme or extract and clone its DNA. That’s what the genome encyclopedia is for.

“When you know the sequence that produces the protein or enzyme, you can actually order it from a company,” said Eisen. “You can type in the sequence of the gene you’re interested in.”

Using synthetic biology, Eisen explained, it’s possible to make that string of DNA from scratch and plug it into a lab-friendly microbe like yeast or E. coli to start pumping out “gobs” of the target enzyme “for whatever purpose you want.”
