The first human genome was mapped in 2001 as part of the Human Genome Project, but researchers knew it was neither complete nor completely accurate. Now, scientists have produced the most completely sequenced human genome to date, filling in gaps and correcting mistakes in the previous version.
The sequence is the most complete reference genome for any mammal so far. The findings from six new papers describing the genome, which were published in Science, should lead to a deeper understanding of human evolution and potentially reveal new targets for addressing a host of diseases.
A more precise human genome
“The Human Genome Project relied on DNA obtained through blood draws; that was the technology at the time,” says Adam Phillippy, head of genome informatics at the National Institutes of Health’s National Human Genome Research Institute (NHGRI) and senior author of one of the new papers. “The techniques at the time introduced errors and gaps that have persisted all of these years. It’s nice now to fill in those gaps and correct those mistakes.”
“We always knew there were parts missing, but I don’t think any of us appreciated how extensive they were, or how interesting,” says Michael Schatz, professor of computer science and biology at Johns Hopkins University and another senior author of the same paper.
The work is the result of the Telomere to Telomere consortium, which is supported by NHGRI and involves genetic and computational biology experts from dozens of institutes around the world. The group focused on filling in the 8% of the human genome that remained a genetic black hole after the first draft sequence. Since that draft was released, geneticists have been trying to add those missing portions bit by bit. The latest group of studies identifies about an entire chromosome’s worth of new sequences, representing 200 million more base pairs (the letters making up the genome) and 1,956 new genes.
“Since the Human Genome Project [in 2001], we have declared victory a few times over the last two decades,” says Evan Eichler, professor of genome sciences at the University of Washington and another senior author of one of the papers. Eichler, who was also involved in the mapping of that original sequence, says the emphasis of what has been sequenced this time around is different. “While the original goal of the Human Genome Project was to order and orientate every base pair, that couldn’t be achieved because the technology wasn’t sufficiently advanced enough. So we finished the parts that we could finish.”
The promise of the new findings
The newly sequenced regions include previously inaccessible sections such as the centromeres, the tightly wound central portions of chromosomes that keep the long double strands of DNA organized as the strands unwind, bit by bit, to copy themselves and separate into two cells as a single cell divides. These regions are critical for normal human development and also play a role in brain growth and neurodegenerative diseases. “It’s been one of the great mysteries of biology that all eukaryotes—all plants, animals, people, trees, flowers and higher organisms—have centromeres. It’s a really fundamental part of how DNA replicates and how chromosomes organize and how cells divide. But it’s been a great paradox, because while its function has been around for billions of years, it was almost impossible to study because we didn’t have a centromere sequence to look at,” says Schatz. “Now we finally do.”
Scientists were also able to sequence the long stretches of DNA that contain repeated sequences, which genetic experts originally thought were similar to copying errors and dismissed as so-called “junk DNA.” These repeated sequences, however, may play roles in certain human diseases. “Just because a sequence is repetitive doesn’t mean it’s junk,” says Eichler. He points out that critical genes are embedded in these repeated regions: genes that contribute to machinery that creates proteins, genes that dictate how cells divide and split their DNA evenly into their two daughter cells, and human-specific genes that might distinguish the human species from our closest evolutionary relatives, the other primates. In one of the papers, for example, researchers found that other primates have different numbers of copies of these repeated regions than humans, and that the copies appear in different parts of the genome.
“These are some of the most important functions that are essential to live, and for making us human,” says Eichler. “Clearly, if you get rid of these genes, you don’t live. That’s not junk to me.”
Deciphering what these repeated sections mean, if anything, and how the sequences of previously unsequenced regions like the centromeres will translate to new therapies or better understanding of human disease, is just starting, says Deanna Church, a vice president at the genome engineering company Inscripta, who wrote a commentary accompanying the scientific articles. Having the full sequence of a human genome is different from decoding it; she notes that currently, among people with suspected genetic disorders whose genomes are sequenced, about half have conditions that can be traced to specific changes in their DNA. That means much of what the human genome does remains a mystery.
There’s still room for improvement. The new sequence comes from essentially half a human; that is, half of the genetic content normally found in a person’s DNA. Each person has two sets of chromosomes, a maternal and a paternal one. Each set contains slightly different versions of genes, essentially giving us two genomes. Assembling those two genomes is not a trivial task, and those challenges hampered the original Human Genome Project and led to its missing parts. The sequencing technology at the time could not easily separate the maternal and paternal copies of DNA, so if the scientists attempted to match up certain sections thinking they were working with the maternal chromosome, for example, they might run into areas where the sections failed to match because they were actually working with the paternal chromosome. “It’s similar to having two puzzles in the same box,” says Phillippy. “You have to sort out what the differences are and reconstruct both.”
For this new sequence, the scientists took advantage of a fertilization error in which the resulting embryo contains only paternal chromosomes. The abnormal growth was removed and, in the early 2000s, propagated in the lab as a cell line that remained viable despite its abnormal chromosomal content. That made it easier for the teams to assemble the genome because they were essentially working with only a single genetic puzzle to solve.
Ultimately, however, researchers will need a more complete human genome with the complete sequences of both maternal and paternal chromosomes. That’s coming soon. Phillippy and others are working with trios of DNA samples from volunteers and their mothers and fathers so that the scientists can separate the maternal DNA from the paternal sequences and essentially assemble two genomes separately. The teams expect to have the so-called diploid human genome sequence completed by the end of the year.
Already, says Winston Timp, associate professor of biomedical engineering at Johns Hopkins and a co-author on one of the papers, “the new genome assembly is paying dividends because it provides a more accurate map to understand what data we had from before meant.” That includes finding new variants that might distinguish healthy people from those affected by disease, for example, as well as variants that might put people at higher risk of developing certain diseases.
“We’ve discovered millions of genetic variants that were previously not known across samples of thousands of individuals whose genomes have already been sequenced,” says Rajiv McCoy, assistant professor of biology at Johns Hopkins and another co-author. “We will have to wait until future work to learn more about their associations with disease, but a big focus of work now will be on trying to discover new genetic variations that were previously uncharacterized.”
Even with the more complete version of the human genome, scientists likely won’t be clamoring to replace the old version, despite its gaps and errors. That’s because decades of work on human genetics have made that older version far more thoroughly annotated than the new one, similar to the difference between your favorite copy of a book, with your handwritten notes and highlighting in the margins, and a fresh copy from the bookstore. “A genome is only as good as its annotation,” says Eichler. “All the clinical and research labs have built decades worth of data based on the old, gap-filled genome. To redo all of that work for any individual lab would be horrific.” He predicts that many labs will gradually switch over to the new genome, first comparing smaller datasets in a test run to see how much richer and more comprehensive the information generated from the newer genome is. As with the original human genome, the new one is posted on a public database for any scientist to use. “For now, both genomes will be kept up so there will be no replacement,” he says.
In coming years, researchers will also generate more complete genomes, using both maternal and paternal DNA, to help scientists identify the best targets for new therapies and improve understanding of human development and evolution. The more genomes they have, the more potentially important patterns will stand out that could lead to a new understanding of human diseases and new treatments for them. Ultimately, the goal is for every person to be able to have their complete genome sequenced as part of their medical record, which would allow doctors to compare those sequences to reference ones and determine which variations might be contributing to specific diseases.
“This is presenting the world with a whole additional chromosome that we have never seen before,” says Karen Miga, assistant professor of biomolecular engineering at the University of California, Santa Cruz, and a senior author of one of the papers. “We have new landscapes, new sequences and the opportunity and promise of new discoveries.”
The excitement in the genomic and medical community is palpable. “Hallelujah, we finally finished one human genome, but the best is yet to come,” Eichler said during a briefing. “No one should see this as the end, but the beginning of a transformation not only in genomic research but in clinical medicine as well.”