According to a paper titled ‘The complete sequence of a human genome’, posted on May 26, 2021 on bioRxiv (a free online archive, pronounced ‘bio-archive’), researchers from the Telomere-to-Telomere (T2T) Consortium have sequenced the first truly complete human reference genome. This development is expected to usher in a new era of genomics, as it is now possible to analyse the entire human genetic code.

The human reference genome is a DNA blueprint that helps scientists answer fundamental questions in biology and disease, and identify species at risk of extinction. It preserves genetic information about life on Earth.

This new and complete genome sequence is considered the largest improvement to the human reference genome since its initial release 20 years ago, when the first drafts of a human genome were published in the journals Science and Nature in 2001. The research paper reports that the researchers filled in the missing sections and assembled a full genome sequence of 3.055 billion base pairs. The final validation of the sequence was done with software developed by Chirag Jain, assistant professor at the Indian Institute of Science.

This development will help decode newer regions of human DNA and improve understanding of a number of disorders affecting people. It should lead to better genetic screening, enabling quick and specific diagnostic tests for various maladies.

Twenty years ago, in 2001, Celera Genomics and the International Human Genome Sequencing Consortium published the first drafts of the human genome, which revolutionised the field of genomics. However, there were some gaps. According to Nature, the sequencing was not truly complete, and about 15 per cent was missing due to technological limitations. In 2013, further research explored the left-out areas, but eight per cent of the genome still could not be sequenced. Since then, geneticists have been using this 2013 reference genome for their studies. It was like a book with some missing pages.

The new reference sequence consists of gapless assemblies (an assembly is a computational representation of a genome sequence) for all 22 autosomes (which look the same in males and females) plus Chromosome X. It corrects errors and introduces nearly 200 million base pairs (bp) of novel sequence containing 2,226 gene copies, of which 115 are predicted to be protein coding and are therefore important for understanding diseases.

Newly sequenced regions include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes. Satellite arrays are known to vary extensively in the human population, so sequencing them will aid medical genomics and give a better understanding of the inherited variation that underlies human physiology, evolution, and disease. Similarly, a better understanding of the acrocentric chromosomes, which are linked to disorders like Down syndrome, will be useful.

Constructing the genome involved many newly designed computer algorithms and software for processing sequencing data and turning it into a complete human genome. The validation software takes genome sequencing data as input and maps it to the genome assembly. The mapping method had to take into account a large number of repetitive segments. The presence of repeats in a genome makes mapping challenging because there are many possible alignment candidates for a sequence, and the correct one is rarely obvious. Once the data was correctly aligned, differences found between the genome and the sequencing data exposed a few mistakes, which T2T corrected before the final genome release.
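The difficulty with repeats can be seen in a toy example. The sketch below is only an illustration, not the software used by T2T: it uses exact string matching in place of a real alignment algorithm, and the reference string, the reads, and the function name find_candidate_positions are all made up for the purpose. A read that lies entirely inside a repeat has several equally good candidate positions, while a read that also covers some unique sequence next to the repeat has only one.

```python
def find_candidate_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further along so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: a unique flank, three copies of a repeat unit, another flank.
REPEAT_UNIT = "GATTACA"
reference = "CCGTAAGC" + REPEAT_UNIT * 3 + "TTGGCACG"

# A read that falls entirely inside the repeat can be placed at several positions.
repeat_read = "GATTACA"
repeat_hits = find_candidate_positions(reference, repeat_read)

# A read that spans the unique flank and the first repeat copy can be placed only once.
spanning_read = "AAGCGATTACA"
spanning_hits = find_candidate_positions(reference, spanning_read)

print("repeat-only read candidates:", repeat_hits)
print("flank-spanning read candidates:", spanning_hits)
```

Longer reads are more likely to reach unique sequence on either side of a repeat, which is one reason ambiguity shrinks as read length grows. Real mappers must also tolerate sequencing errors, which this sketch ignores.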

T2T-CHM13 represents only one person’s genome, so the T2T Consortium has teamed up with the Human Pangenome Reference Consortium (HPRC) to sequence over 300 genomes from people across the world. Nor is the new sequence the last word on the human genome: T2T had trouble resolving a few regions on some chromosomes and estimates that about 0.3 per cent of the genome might contain errors. According to the T2T researchers, another limitation of CHM13 is its lack of a Y chromosome. They are in the process of sequencing and assembling the Y chromosome so that a T2T reference sequence for all human chromosomes can be finished.

According to Paul Flicek, head of genes, genomes, and variation services at EMBL-EBI, who was not part of this work, “The entirety of genomics as a field is a constant cycle between pushing the technological envelope and using these technologies in new and exciting ways.”

The newly sequenced regions, such as the centromeric satellite arrays and the short arms of the five acrocentric chromosomes, had been hard to sequence because of their repetitive sequences. Centromeres are the points where the arms of a chromosome intersect, giving chromosomes their characteristic X or Y shape. Sequencing these regions is a revolutionary step: the complex regions of the genome are now unlocked. According to another researcher, Sarah McClelland, these areas are essential for correct cell division. Not having data about the sequences of these critical regions was ‘a major roadblock for further research’ into how centromeres are involved in cell division, including when cell division goes wrong in cancer. This new, gap-free reference has therefore opened up the field.

About the Human Genome

The human genome is the complete set of human DNA. The strands of DNA are written in a four-letter language: four chemical units, or bases, serve as the alphabet. The letters combine specifically with letters in the opposite strand to form words (base pairs, or bp), which encode information. These words are stored in chromosomes in human cells.
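The specific pairing of letters can be written down as a small lookup table. The sketch below is a minimal illustration that assumes the standard pairing rules, in which adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G). The names COMPLEMENT, complement_strand, and base_pairs are made up for this example.

```python
# Standard DNA pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the bases on the opposite strand, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)


def base_pairs(strand: str) -> list[tuple[str, str]]:
    """Return the (base, partner) pairs formed along the given strand."""
    return [(base, COMPLEMENT[base]) for base in strand]


example = "ATGC"
print(complement_strand(example))
print(base_pairs(example))
```

Because each letter has exactly one partner, knowing one strand is enough to reconstruct the other, which is why genome length is counted in base pairs rather than in single letters.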

According to scientists, if the human genome were a history book, it would have around three billion words (bp) across 23 chapters (chromosomes). It would record the human journey through time, provide a detailed blueprint for building every human cell, and help healthcare providers treat, prevent, and cure diseases.

© Spectrum Books Pvt Ltd.

 
