A few months after the COVID-19 pandemic began, in early 2020, scientists sequenced the genome of the virus, SARS-CoV-2, but many of its protein-coding genes remained unknown. Now, a comparative genomics study has produced the most accurate and complete genetic map of the virus to date.
Conducted by researchers from the Massachusetts Institute of Technology (MIT) and published this Tuesday in the journal Nature Communications, the study confirmed several protein-coding genes and found that others, which had been proposed as genes, do not encode any proteins.
“We were able to use this powerful comparative genomics approach of evolutionary signatures to discover the true functional protein-coding content of this hugely important genome,” says Manolis Kellis, senior author of the study, professor of computer science at MIT and member of the Broad Institute of MIT and Harvard.
In a second part of the study, the research team also analyzed about 2,000 mutations that have arisen in SARS-CoV-2 since the beginning of the pandemic, which allowed them to assess how important these mutations are and whether they might help the virus evade the immune system or become more infectious.
It was already known that the SARS-CoV-2 genome, with almost 30,000 RNA bases, contains several regions that encode proteins, along with others that were suspected of doing so but had not been definitively classified.
To determine which parts of the SARS-CoV-2 genome actually contain genes, the researchers turned to comparative genomics. They compared SARS-CoV-2, which belongs to a subgenus of viruses called Sarbecovirus whose members mostly infect bats, with SARS-CoV (which caused the SARS outbreak of 2003) and 42 strains of bat sarbecovirus.
In this way, they confirmed six protein-coding genes in the SARS-CoV-2 genome, in addition to the five that are well established in all coronaviruses.
They also determined that the region encoding a gene called ORF3a also encodes an additional gene, ORF3c. Its RNA bases overlap with those of ORF3a but are read in a different reading frame. Such overlapping genes are rare in large genomes but common in many viruses. In the case of SARS-CoV-2, the function of ORF3c is not yet known.
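The idea of two genes sharing the same bases in different reading frames can be illustrated with a minimal Python sketch. It is not the method used in the study: the toy sequence below is invented for illustration (it is not taken from SARS-CoV-2), it uses the DNA alphabet (T in place of U), and the function names `find_orfs` and `overlaps` are hypothetical helpers.

```python
# Toy illustration of overlapping genes in different reading frames.
# TOY_SEQ is an invented sequence, NOT part of the SARS-CoV-2 genome.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
TOY_SEQ = "ATGCATGGCCTGATTTAA"


def find_orfs(seq, frame):
    """Return (start, end) spans of ATG...stop stretches in one forward frame.

    `frame` is the offset (0, 1 or 2) of the first codon. `end` is exclusive
    and includes the stop codon.
    """
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i  # open a new reading frame at this start codon
        elif start is not None and codon in STOP_CODONS:
            orfs.append((start, i + 3))  # close it at the first stop codon
            start = None
    return orfs


def overlaps(a, b):
    """True if two half-open (start, end) spans share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


if __name__ == "__main__":
    for frame in range(3):
        print(frame, find_orfs(TOY_SEQ, frame))
```

In this toy sequence, frame 0 reads the bases as ATG CAT GGC CTG ATT TAA, while frame 1 reads some of the same bases as ATG GCC TGA, so the shorter stretch lies entirely inside the longer one but would encode a different protein.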
The researchers also showed that five other regions that had been proposed as possible genes do not encode functional proteins, and ruled out the possibility that further genes remain to be discovered.
In addition, the authors found that many earlier papers used not only incorrect gene sets but also, at times, conflicting gene names. In a companion paper recently published in the journal Virology, they presented recommendations for naming SARS-CoV-2 genes.
In the study, the researchers also analyzed more than 1,800 mutations that have arisen in SARS-CoV-2 and found that, in most cases, genes that evolved rapidly before the pandemic have continued to do so, while those that tended to evolve slowly have maintained that trend.
They also analyzed mutations that have emerged in variants of concern, such as those first identified in the United Kingdom, Brazil and South Africa, and noted that many of the mutations that make these variants more dangerous are in the spike protein and help the virus spread faster and evade the immune system.
However, each of these variants has “more than 20 other mutations, and it is important to know which of them can do something and which cannot,” warns Irwin Jungreis, lead author of the study and researcher at MIT.
According to the authors, these data could help other scientists focus their attention on the mutations that appear most likely to have significant effects on the infectivity of the virus.