Scientists have developed an AI system capable of generating artificial enzymes from scratch. In laboratory tests, some of these enzymes worked as well as those found in nature, even though their artificially generated amino acid sequences diverged significantly from any known natural protein.
The experiment shows that natural language processing, although it was developed to read and write language text, can learn at least some of the fundamental principles of biology. Salesforce Research developed the AI program, called ProGen, which uses next-token prediction to assemble amino acid sequences into artificial proteins.
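The article describes ProGen's generation step only as next-token prediction. As a minimal sketch of that idea, the Python below builds an amino acid sequence one residue at a time, sampling each new residue from a probability distribution conditioned on the residues generated so far. The function `toy_next_token_probs` is a hypothetical stand-in for a trained model, not ProGen's actual network, and starting every sequence with methionine (M) is an assumption made for illustration.

```python
import random

# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_next_token_probs(prefix: str) -> dict[str, float]:
    """Hypothetical stand-in for a trained protein language model.

    A real model would compute these probabilities from the prefix with a
    neural network. This toy version just favors repeating the previous
    residue so the example runs without any trained weights.
    """
    weights = {aa: 1.0 for aa in AMINO_ACIDS}
    if prefix:
        weights[prefix[-1]] += 4.0  # arbitrary toy bias, not learned
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def generate(length: int, seed: int = 0) -> str:
    """Grow a sequence one residue at a time (next-token prediction)."""
    rng = random.Random(seed)
    seq = "M"  # assumption: begin with methionine, the usual start residue
    while len(seq) < length:
        probs = toy_next_token_probs(seq)
        residues = list(probs)
        next_residue = rng.choices(residues, weights=[probs[r] for r in residues], k=1)[0]
        seq += next_residue
    return seq


print(generate(30))
```

Swapping the toy function for a real model's output distribution is the only change needed to turn this loop into genuine autoregressive sampling; the loop itself stays the same.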
The researchers said the new technology could become more powerful than directed evolution, the Nobel Prize-winning protein design technology, and could energize the 50-year-old field of protein engineering by speeding the development of new proteins that can be used for almost anything, from therapeutics to degrading plastic.
“The artificial designs perform much better than designs that were inspired by the evolutionary process,” said James Fraser, PhD, professor of bioengineering and therapeutic sciences at the UCSF School of Pharmacy and a co-author of the work, which was published Jan. 26 in Nature Biotechnology.
“The language model is learning aspects of evolution, but it's different from the normal evolutionary process,” Fraser said. “We now have the ability to tune the generation of these properties for specific effects,” he said, such as an enzyme that is incredibly thermostable, prefers acidic environments, or won't interact with other proteins.
The scientists fed the amino acid sequences of 280 million different proteins into the machine learning model and let it digest the information for a couple of weeks. They then fine-tuned the model by priming it with 56,000 sequences from five lysozyme families, along with some contextual information about these proteins.
The model quickly generated a million sequences, and the research team selected 100 to test, based on how closely they resembled the sequences of natural proteins and how naturalistic the AI proteins' underlying amino acid “grammar” and “semantics” were.
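The article does not say exactly how the candidates were scored. One plausible approach, shown here purely as an assumption, is to rank each generated sequence by its average per-residue log-likelihood under the model (how "natural" the model finds it) and keep the top k. The scoring function `toy_logprob` is hypothetical and uses the same toy bias as a stand-in for a trained model.

```python
import heapq
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_logprob(prefix: str, residue: str) -> float:
    """Hypothetical model score: log P(residue | prefix).

    Toy rule: repeating the previous residue is 5x as likely as any other
    residue. A real pipeline would query a trained model instead.
    """
    weights = {aa: 1.0 for aa in AMINO_ACIDS}
    if prefix:
        weights[prefix[-1]] += 4.0
    return math.log(weights[residue] / sum(weights.values()))


def mean_loglik(seq: str) -> float:
    """Average log-likelihood per residue, skipping the fixed first residue."""
    total = sum(toy_logprob(seq[:i], seq[i]) for i in range(1, len(seq)))
    return total / max(len(seq) - 1, 1)


def select_top(candidates: list[str], k: int) -> list[str]:
    """Keep the k candidates the model scores as most likely, best first."""
    return heapq.nlargest(k, candidates, key=mean_loglik)


candidates = ["MAAAAK", "MKWQYC", "MLLLLL", "MDEFGH"]
print(select_top(candidates, 2))  # ['MLLLLL', 'MAAAAK']
```

`heapq.nlargest` avoids sorting the whole pool, which matters when narrowing a million candidates down to 100. Normalizing by length keeps longer sequences from being penalized simply for having more residues.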
From this first batch of 100 proteins, which were screened in vitro by Tierra Biosciences, the team made five artificial proteins to test in cells and compared their activity to an enzyme found in the whites of chicken eggs, known as hen egg white lysozyme (HEWL). Similar lysozymes are found in human tears, saliva and milk, where they defend against bacteria and fungi.
Two of the artificial enzymes were able to break down the cell walls of bacteria with activity comparable to HEWL, yet their sequences were only about 18% identical to one another. The two sequences were roughly 90% and 70% identical, respectively, to any known protein.
A single mutation in a natural protein can make it stop working, but in a second round of screening the researchers found that the AI-generated enzymes showed activity even when as little as 31.4% of their sequence resembled any known natural protein.
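Figures such as 18%, 70% and 31.4% are sequence-identity percentages. The article does not state which convention the study used, so the sketch below shows one common one, as an assumption: given two sequences that have already been aligned (with `-` marking gaps), count the fraction of gap-free columns where the residues match. Producing the alignment itself (for example with a Needleman-Wunsch aligner) is a separate step not shown here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Both strings must be the same length; '-' marks a gap. Identity is
    counted only over columns where neither sequence has a gap. Other
    conventions exist, such as dividing by the shorter sequence's length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# 8 gap-free columns, 6 of which match.
print(percent_identity("KVFER-CEL", "KVYERACDL"))  # 75.0
```

Because the denominator depends on the chosen convention, identity values reported by different tools for the same pair of proteins can differ by a few percentage points.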
Simply by studying the raw sequence data, the AI was even able to learn how the enzymes should be shaped. When the researchers examined the artificial proteins with X-ray crystallography, their atomic structures looked just as they should, although the sequences were unlike anything seen before.
Salesforce Research developed ProGen in 2020, based on a type of natural language processing model that its researchers originally developed to generate English-language text.
From their previous work, they knew that the AI system could teach itself grammar and the meaning of words, along with other underlying rules that make writing well composed.
“When you train sequence-based models with lots of data, they are really powerful in learning structure and rules,” said Nikhil Naik, PhD, Director of AI Research at Salesforce Research. “They learn what words can co-occur, and also compositionality.”
The design possibilities with proteins were virtually limitless. Lysozymes are small as proteins go, with up to about 300 amino acids. But with 20 possible amino acids at each position, there are an enormous number (20^300) of possible combinations. That is more than the number of all the humans who have ever lived, multiplied by the number of grains of sand on Earth, multiplied by the number of atoms in the universe.
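The size of that design space is easy to check with exact integer arithmetic. The sketch below computes 20^300 and compares it with the product in the comparison above. The three real-world figures are assumed round order-of-magnitude estimates (about 10^11 humans ever born, about 10^19 grains of sand, about 10^80 atoms in the observable universe), not numbers taken from the study.

```python
import math

# 20 choices at each of 300 positions; Python integers are exact.
design_space = 20 ** 300

print(len(str(design_space)))  # 391 decimal digits
print(f"about 10^{300 * math.log10(20):.1f}")  # about 10^390.3

# Assumed order-of-magnitude estimates, for comparison only.
HUMANS_EVER_LIVED = 10 ** 11
GRAINS_OF_SAND = 10 ** 19
ATOMS_IN_UNIVERSE = 10 ** 80

product = HUMANS_EVER_LIVED * GRAINS_OF_SAND * ATOMS_IN_UNIVERSE  # 10^110
print(design_space > product)  # True
```

Even allowing generous error bars on the three estimates, the product lands around 10^110, some 280 orders of magnitude short of 20^300.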
Given the vast number of possibilities, it's remarkable that the model can generate working enzymes at all.
“The ability to generate functional proteins from scratch out of the box demonstrates we are entering into a new era of protein design,” said Ali Madani, PhD, founder of Profluent Bio, a former research scientist at Salesforce Research, and the paper's first author. “This is a versatile new tool available to protein engineers, and we look forward to seeing the therapeutic applications,” he said.