AI trained on the theory of evolution develops proteins that advance drug research and scientific discoveries

Using the MutRank framework trained with EvoRank, Danny Diaz and Professor Andrew Ellington's team developed an improved version of a protein that is critical for the bioproduction of mRNA therapeutics and vaccines. In this example, the model recommends leaving the blue parts as they are in the natural version of the protein and seriously considering mutating the red parts. Image credit: Danny Diaz/University of Texas at Austin

A new artificial intelligence model developed by researchers at the University of Texas at Austin is paving the way for more effective and less toxic treatments and new prevention strategies in medicine. The AI model supports the development of protein-based therapies and vaccines by leveraging the underlying logic of nature's evolutionary processes.

The AI advancement, called EvoRank, offers a new and tangible example of how AI can help fundamentally transform biomedical research and biotechnology in general. Scientists described the work at the International Conference on Learning Representations (ICLR 2024) and published a corresponding paper in Nature Communications on using a broader AI framework to identify useful mutations in proteins.

A major obstacle in developing better protein-based biotechnologies is the shortage of experimental data on proteins. Without enough of it, AI models cannot be adequately trained to understand how specific proteins work, and thus how to engineer them for specific purposes.

The key insight of EvoRank is to harness the natural variations of millions of proteins that have evolved over time and to unravel the underlying dynamics required for practical solutions to biotechnological challenges.

“Nature has spent three billion years evolving proteins, mutating or swapping amino acids and retaining those that benefit living things,” said Daniel Diaz, a computer science researcher and co-leader of the Deep Proteins Group, an interdisciplinary team of computer science and chemistry experts at UT.

“EvoRank is learning to categorize the evolution we see around us, essentially teasing out the principles that govern protein evolution, and using those principles to guide the development of new protein-based applications, including for drug and vaccine development and a wide range of biomanufacturing purposes.”

UT is home to one of the country's leading AI research programs and the Institute for Foundations of Machine Learning (IFML), led by computer science professor Adam Klivans, who also co-leads Deep Proteins.

In a project by Deep Proteins and vaccine maker Jason McLellan, a UT professor of molecular biosciences, in collaboration with the La Jolla Institute for Immunology, AI is being used in protein engineering research to develop vaccines against herpes viruses.

“Engineering proteins with capabilities that natural proteins do not have is a recurring grand challenge in the life sciences,” Klivans said. “It also happens to be the kind of task that generative AI models are made for, as they can synthesize large databases of known biochemistry and then generate new designs.”

Unlike Google DeepMind's AlphaFold, which uses artificial intelligence to predict the shape and structure of proteins based on their amino acid sequence, the Deep Proteins group's AI systems suggest how proteins can best be modified to achieve specific functions. This could, for example, simplify the development of a protein for new biotechnologies.

McLellan's lab is already synthesizing different versions of viral proteins based on AI-generated designs and then testing their stability and other properties.

“The models have suggested substitutions that we would never have thought of,” McLellan said. “They work, but they are things that we would not have predicted. So they are actually finding new space for stabilization.”

Protein therapeutics often have fewer side effects and can be safer and more effective than alternatives. The global industry, now valued at $400 billion, is expected to grow by more than 50 percent over the next decade. Yet developing a protein-based drug is slow, expensive and risky.

Taking a drug from development through clinical trials takes more than ten years and is estimated to cost a billion dollars or more. Even then, the chances of a company's new drug receiving FDA approval are only about 1 in 10.

In addition, in order to be useful for therapy, proteins often have to be genetically modified, for example to ensure their stability or to achieve a yield required for drug development. Until now, such genetic engineering decisions have been dictated by laborious trial and error in the laboratory.

If EvoRank – and the associated Stability Oracle framework developed by UT on which it is based – were adopted commercially, the industry would have the opportunity to save time and costs in drug development and would have a roadmap to move to better designs faster.

The researchers who developed EvoRank used existing databases of naturally occurring protein sequences. Essentially, they lined up and compared different versions of the same protein found in different organisms, from starfish to oak trees to humans.

At any given position in the protein, there can be one of several different amino acids that have proven useful over the course of evolution. For example, nature chooses the amino acid tyrosine 36% of the time, histidine 29% of the time, lysine 14% of the time – and, more importantly, never leucine.
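The percentages above are the kind of tally one can compute from a single column of aligned sequences. The following is a minimal Python sketch of that counting step, not the researchers' actual pipeline. The function name `column_frequencies`, the gap symbol and the toy alignment are assumptions made up for illustration.

```python
from collections import Counter


def column_frequencies(aligned_sequences, position, gap="-"):
    """Return the fraction of each amino acid seen at one alignment column.

    Assumes all sequences are already aligned to the same length.
    Gap characters are ignored when computing fractions.
    """
    residues = [seq[position] for seq in aligned_sequences if seq[position] != gap]
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


# Toy alignment invented for this example (one-letter amino acid codes).
toy_alignment = ["MKYA", "MKHA", "MRYA", "MKKA", "MKY-"]

# Column 2 holds tyrosine (Y) three times, histidine (H) once, lysine (K) once.
freqs = column_frequencies(toy_alignment, position=2)
for amino_acid, fraction in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{amino_acid}: {fraction:.0%}")
```

An amino acid that never appears in the column, such as leucine (L) in this toy data, is simply absent from the result.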

Mining this trove of existing data reveals a fundamental logic of protein evolution. Researchers can eliminate options that the evolutionary record suggests would lead to the loss of protein functionality.
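As a rough illustration of that elimination step, the Python sketch below drops candidate substitutions that were never observed at a position and orders the rest by how often nature used them. It is a simplification under stated assumptions, not the EvoRank model. The function name `plausible_substitutions` is hypothetical, and the frequencies are the three figures quoted in the example above, so the remaining share of the column is unspecified.

```python
def plausible_substitutions(observed_freqs, candidates, min_freq=0.0):
    """Keep candidates observed more often than min_freq, most frequent first.

    observed_freqs maps one-letter amino acid codes to observed fractions.
    Any amino acid missing from the mapping is treated as never observed,
    which is an assumption of this sketch.
    """
    kept = [aa for aa in candidates if observed_freqs.get(aa, 0.0) > min_freq]
    return sorted(kept, key=lambda aa: observed_freqs[aa], reverse=True)


# Figures from the article's example: tyrosine, histidine, lysine.
# Leucine (L) is never observed, so it has no entry.
observed = {"Y": 0.36, "H": 0.29, "K": 0.14}

candidates = ["L", "K", "Y", "H"]
ranked = plausible_substitutions(observed, candidates)
print(ranked)
```

Raising `min_freq` would make the filter stricter. A learned model can go further than this lookup by generalizing across positions and proteins rather than reading off one column.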

The team uses all of this to train the new machine learning algorithm. Through continuous feedback, the model learns which amino acid nature has chosen in the past when evolving proteins, and it bases its understanding on what is and is not plausible in nature.

Next, Diaz plans to develop a “multi-column” version of EvoRank that can assess how multiple mutations simultaneously affect a protein's structure and stability. He also wants to develop new tools to predict how a protein's structure relates to its function.

Further information:
Daniel J. Diaz et al., Stability Oracle: a structure-based graph transformer framework for identifying stabilizing mutations, Nature Communications (2024). DOI: 10.1038/s41467-024-49780-2

Provided by the University of Texas at Austin

Citation: AI trained on evolution’s playbook develops proteins that drive drug and scientific discoveries (September 26, 2024), accessed September 26, 2024 from
