Artificial intelligence and supercomputers to design proteins

Harnessing the power of artificial intelligence and several of the world’s fastest supercomputers, a research team has developed an innovative system to accelerate the design of new proteins.

The team was led from Argonne National Laboratory in the United States and used five supercomputers: Argonne's own Aurora; Frontier at Oak Ridge National Laboratory, also in the United States; Alps at the Swiss National Supercomputing Center; Leonardo at the CINECA center in Italy; and the PDX machine at NVIDIA. Sustained performance of 1 exaflop was achieved on each supercomputer, with a maximum of 5.57 exaflops on Aurora.

The system developed by the team, which includes Arvind Ramanathan, Gautham Dharuman and others, is called MProt-DPO. The “DPO” in the name stands for “Direct Preference Optimization,” an algorithm that helps AI models improve by learning from preferred and non-preferred outcomes. By adapting DPO to protein design, the team enabled their system to learn from experimental results and simulations as they were produced.
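For readers who want to see the idea concretely, here is a minimal sketch of the standard DPO loss for a single preferred/non-preferred pair, in the form originally published for language models. The function name, the `beta` value and the example log-probabilities are illustrative assumptions, not details of MProt-DPO itself.

```python
import math

def dpo_loss(policy_logp_preferred, policy_logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the log-probability that a model assigns to a whole
    sequence. "policy" is the model being trained; "ref" is a frozen copy.
    beta=0.1 is an assumed value, not one taken from the article.
    """
    # How much more the policy favors each sequence than the reference does
    preferred_margin = policy_logp_preferred - ref_logp_preferred
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (preferred_margin - rejected_margin)
    # -log(sigmoid(logits)), written to avoid overflow for large |logits|
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Made-up numbers: the policy has shifted toward the preferred sequence
# and away from the non-preferred one, so the loss drops below log(2).
loss_untrained = dpo_loss(-10.0, -10.0, -10.0, -10.0)
loss_improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(loss_untrained, loss_improved)
```

The loss falls as the trained model assigns relatively more probability to the preferred sequence than the reference model does, which is what pushes it toward the better designs.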

One of the main innovations of MProt-DPO is its ability to integrate different types of data. It combines traditional protein sequence data with experimental results, molecular simulations, and even written descriptions that detail the properties of each protein. This approach has the potential to accelerate protein discovery for a wide range of applications.

Suppose, for example, that you want to create a new vaccine or design an enzyme that breaks down plastics so they can be recycled in an environmentally friendly way. In these and other cases, the new system based on artificial intelligence and supercomputers can help researchers focus on promising proteins from countless possibilities, including candidates that may not exist in nature.

Relating the amino acid sequence of a protein to its structure and function is a long-standing research challenge. Each unique arrangement of amino acids (the building blocks of proteins) can lead to different properties and behaviors. The sheer number of possible variations makes it impractical to test them all by experiment alone.

To put it in perspective, modifying just three positions in a sequence, each of which can hold any of the 20 amino acids, creates 20 × 20 × 20 = 8,000 possible combinations. But most proteins are much more complex, with some research targets containing hundreds or thousands of amino acids.

“For example, if we vary 77 amino acids within a 300-amino acid protein, we have a design space of a googol (a one followed by a hundred zeros) unique possibilities,” explains Dharuman. This, he argues, is why supercomputers and large artificial intelligence systems are needed.
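The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that every varied position can hold any of the 20 standard amino acids; the function and variable names are illustrative only.

```python
import math

NUM_AMINO_ACIDS = 20  # size of the standard amino acid alphabet

def design_space_size(variable_positions, alphabet_size=NUM_AMINO_ACIDS):
    """Number of distinct sequences when each varied position
    can hold any amino acid in the alphabet."""
    return alphabet_size ** variable_positions

small = design_space_size(3)    # three varied positions
large = design_space_size(77)   # 77 varied positions

print(small)                                 # 8000
print(f"about 10^{math.log10(large):.1f}")   # about 10^100.2
```

With 77 varied positions the count is slightly above a googol (10^100), which matches the order of magnitude in the quote.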

With models of several billion parameters and an even larger amount of data, the need for supercomputers is clear. This is all the more true because the work includes large-scale simulations to verify the stability and catalytic activity of the generated protein sequences.

A sector of the Aurora supercomputer, which occupies a room as large as an industrial warehouse. (Photo: Argonne National Laboratory)

The way artificial intelligence works in MProt-DPO is not very different from the way it works in the popular ChatGPT. In the case of ChatGPT, human users provide feedback on whether a response is helpful or not. That information is fed back to the training algorithm to help the model learn users' preferences. MProt-DPO works in a similar way, but the text interaction with humans is replaced by experimental and simulation data that help the system determine which protein designs are most successful.
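One simple way to picture how measurements could stand in for human feedback is to turn scored sequences into preferred/non-preferred pairs. The sketch below is a hypothetical illustration: the function name, the margin, and the sequences and scores are all made up, and the article does not say how MProt-DPO actually builds its pairs.

```python
from itertools import combinations

def build_preference_pairs(scored_sequences, min_margin=0.0):
    """Turn (sequence, score) measurements into (preferred, rejected) pairs.

    A pair is kept only when the two scores differ by more than
    `min_margin`, so near-ties do not produce a training signal.
    """
    pairs = []
    for (seq_a, score_a), (seq_b, score_b) in combinations(scored_sequences, 2):
        if abs(score_a - score_b) <= min_margin:
            continue  # too close to call
        if score_a > score_b:
            pairs.append((seq_a, seq_b))
        else:
            pairs.append((seq_b, seq_a))
    return pairs

# Hypothetical sequences with made-up fitness scores (higher is better)
measurements = [("MKTAYIAK", 0.91), ("MKTAYIAR", 0.40), ("MKTGYIAK", 0.88)]
pairs = build_preference_pairs(measurements, min_margin=0.05)
for preferred, rejected in pairs:
    print(f"{preferred} preferred over {rejected}")
```

Pairs of this kind play the role that "helpful" versus "not helpful" ratings play when a chatbot is trained on human feedback.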

The team tested MProt-DPO in two tasks to demonstrate its ability to address complex protein design challenges. First, they focused on the yeast HIS7 protein, using experimental data to improve the performance of several mutations. For the second task, they worked with malate dehydrogenase, an enzyme that plays a key role in how cells produce energy. Using simulation data, they optimized the enzyme design to improve its catalytic efficiency.

The team is now collaborating with biologists at Argonne National Laboratory to validate the AI-generated designs in the laboratory. The tests carried out so far show that the designs have the expected characteristics. (Source: NCYT by Amazings)
