Viruses are infectious agents that require living cells of a host to reproduce. When they infect a cell, they force its reproductive mechanisms to synthesize the genetic information of the virus itself. In the case of SARS-CoV-2 (the coronavirus responsible for COVID-19), the instructions necessary for the reproduction process are contained in its core in the form of ribonucleic acid (RNA). While human DNA has a double helix structure, RNA is made up of a single strand, which encodes information using four components: adenine, guanine, cytosine, and uracil.
Mutations appear when errors occur in the replication process, that is, when the order of these four bases changes. Although these alterations in the RNA strand were once believed to be completely random, previous research found that some errors were more frequent than others. More specifically, some host-specific enzymes (organic substances that catalyze chemical reactions) tended to convert the cytosine in the virus’s RNA to uracil.
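As a rough illustration of what "some errors are more frequent than others" means in practice, the following minimal Python sketch tallies the point substitutions between a reference RNA sequence and a variant. The two sequences and the function name `count_substitutions` are invented for this example and are not taken from the study; the sketch also assumes the sequences are already aligned and of equal length.

```python
from collections import Counter


def count_substitutions(reference, variant):
    """Tally point substitutions between two aligned RNA strings.

    Returns a Counter keyed by (reference_base, variant_base).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return Counter(
        (ref, var) for ref, var in zip(reference, variant) if ref != var
    )


# Toy sequences, invented for illustration only (not real viral data).
reference = "AUGCCGUACGCA"
variant = "AUGUCGUAUGCA"

subs = count_substitutions(reference, variant)
for (ref, var), n in subs.most_common():
    print(f"{ref} -> {var}: {n}")
```

In this toy pair, both differences are cytosine-to-uracil changes, the kind of bias the earlier research reported.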
In this context, the Chemoinformatics and Nutrition research group at the Rovira i Virgili University (URV) in Tarragona, led by researchers Gerard Pujadas and Santi Garcia, has designed a machine learning system based on an artificial neural network that is capable of predicting the virus mutations that arise when its genetic information comes into contact with certain host enzymes.
Once the evolution of the virus had been analyzed taking its mutations into account, URV doctoral student Bryan Saldivar “trained” an artificial neural network with data from more than 800,000 virus genomes so that it learned to predict which recurring mutations would occur in the virus in the future. An artificial neural network is a machine learning computational system (a form of artificial intelligence) that connects multiple nodes, called artificial neurons, which, when trained to perform a particular task, work together to process large volumes of data. These systems learn by themselves and can adjust themselves to achieve a result specified by the researchers.
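To make the idea of "training" more concrete, here is a minimal sketch of a single artificial neuron, the building block of such networks, learning from examples by gradient descent. It is not the network used by the URV team, whose architecture is not described here. The one-hot encoding, the toy labels (cytosine marked as "prone to change"), and the function names are all assumptions made for illustration.

```python
import math

BASES = "ACGU"


def one_hot(base):
    """Encode a base as four numbers, e.g. 'C' -> [0.0, 1.0, 0.0, 0.0]."""
    return [1.0 if base == b else 0.0 for b in BASES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Output of one neuron: weighted sum of inputs passed through a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


def train(samples, epochs=2000, learning_rate=0.5):
    """Adjust weights and bias one sample at a time to reduce prediction error."""
    weights = [0.0] * len(BASES)
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target
            weights = [
                w - learning_rate * error * x for w, x in zip(weights, inputs)
            ]
            bias -= learning_rate * error
    return weights, bias


# Toy labels, invented for this sketch: 1.0 marks cytosine as "prone to change".
samples = [(one_hot(b), 1.0 if b == "C" else 0.0) for b in BASES]
weights, bias = train(samples)

scores = {b: predict(weights, bias, one_hot(b)) for b in BASES}
for base, score in scores.items():
    print(f"{base}: {score:.3f}")
```

After training, the neuron assigns a high score to cytosine and low scores to the other bases. A real network chains many such neurons together and learns from far richer inputs than a single base.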
Image captured by an electron microscope and processed showing a human cell infected with particles of the SARS-CoV-2 coronavirus (in yellow). (Photo: NIAID/NIH)
Typically, the procedure consists of using one part of the genome to build the network and reserving another, sufficiently large part to test it and correct its operation if necessary. In this case, the team reserved four genes, one of which contains the information for the protein that allows the virus to enter cells to infect them, in order to focus the study in this direction.
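The hold-out procedure described above can be sketched as follows. The text only indicates that one of the four reserved genes encodes the protein the virus uses to enter cells (the spike gene, S, in SARS-CoV-2), so the other three held-out genes chosen here are arbitrary placeholders. The sequences are invented strings and `split_by_gene` is a hypothetical helper.

```python
def split_by_gene(genes, held_out):
    """Split a {gene_name: sequence} mapping into training and test parts."""
    unknown = set(held_out) - set(genes)
    if unknown:
        raise KeyError(f"unknown gene names: {sorted(unknown)}")
    train = {name: seq for name, seq in genes.items() if name not in held_out}
    test = {name: seq for name, seq in genes.items() if name in held_out}
    return train, test


# Placeholder sequences, invented for illustration (not real gene data).
genes = {
    "ORF1ab": "AUGGAGAGCCUU",
    "S": "AUGUUUGUUUUU",
    "ORF3a": "AUGGAUUUGUUU",
    "E": "AUGUACUCAUUC",
    "M": "AUGGCAGAUUCC",
    "N": "AUGUCUGAUAAU",
}

# Assumption: S is held out as in the article; E, M and N are arbitrary picks.
held_out = {"S", "E", "M", "N"}

train_genes, test_genes = split_by_gene(genes, held_out)
print("train on:", sorted(train_genes))
print("test on: ", sorted(test_genes))
```

Keeping the test genes entirely out of training is what lets the held-out part act as an honest check on the network's predictions.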
This system, which had never before been applied to predicting virus mutations, has allowed researchers to anticipate recurring changes in the virus that are catalyzed by the human body’s own enzymes. The system also identifies those parts of the virus that cannot change because, if they did, the infectious agent would be unable to reproduce.
All this information would allow researchers, on the one hand, to get a head start on drug design and, on the other, to make drugs more effective at eliminating the virus by using the detected weaknesses to hinder its reproduction. “This research provides relevant information for the scientific community, and it remains here so that it can be consulted,” explains Santi Garcia. He also believes that the methodology can be replicated in future pandemics, especially if they are caused by a coronavirus or a new variant of SARS-CoV-2.
Saldivar’s team sets out the technical details of the machine learning system and the predictions it has made in the International Journal of Molecular Sciences, under the title “Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks”. (Source: URV)