The  iluli by Mike Lamb logo. Click to return to the homepage

Predicting Proteins with AI

Proteins probably don’t weigh too heavily on your mind day to day. That’s ironic, given that every single cell in your brain, and the rest of your body, is built from a myriad of proteins, which are themselves made of building blocks known as amino acids.

A whopping 20,000 or so proteins are encoded by the human genome. The information needed to create these proteins is contained in our DNA, which is stored in the nuclei of our cells. Now, scientists have developed an intelligent method for working out, from these instructions, the shapes the resulting proteins take.

We’re just beginning to scratch the surface of our understanding of these essential structures. Given the limits on what our human minds can fathom about them, how exactly can we apply this new, AI-generated information? And what’s the link between understanding these structures and the development of new drugs and treatments for diseases?

A cartoon depiction of a magazine and newspaper, emblazoned with the headlines 'DNA is Here' and 'Human DNA Mapped', next to the Jurassic Park logo and a fly trapped in amber.

In 2021, the BBC reported how a better understanding of our genomes could spark a ‘medical revolution.’ Timely, given the pandemic in our midst. The article explains how the London-based company DeepMind (a subsidiary of Google’s parent company Alphabet focusing on artificial intelligence) had successfully created an AI software programme called AlphaFold. AlphaFold can predict the shape into which a protein will fold, based on the amino acid sequence spelled out by its genetic code.

The article explains:

“The 350,000 protein structures predicted by AlphaFold include not only the 20,000 contained in the human proteome, but also those of so-called model organisms used in scientific research, such as E. coli, yeast, the fruit fly and the mouse.

“AlphaFold was able to make a confident prediction of the structural positions for 58% of the amino acids in the human proteome... The positions of 35.7% were predicted with a very high degree of confidence — double the number confirmed by experiments.”
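To make those percentages a little more concrete, here is a minimal Python sketch of how such shares could be worked out. It assumes AlphaFold's per-residue confidence score (known as pLDDT, on a 0 to 100 scale) and the commonly cited bands of above 70 for "confident" and above 90 for "very high". The function name and the ten scores below are invented purely for illustration and are not real predictions.

```python
def confidence_shares(plddt_scores):
    """Return the share of residues in the 'confident' and 'very high' bands.

    Assumes pLDDT-style scores from 0 to 100, with > 70 counted as
    confident (or better) and > 90 counted as very high confidence.
    """
    total = len(plddt_scores)
    if total == 0:
        raise ValueError("need at least one score")
    confident = sum(1 for score in plddt_scores if score > 70)
    very_high = sum(1 for score in plddt_scores if score > 90)
    return {
        "confident_or_better": confident / total,
        "very_high": very_high / total,
    }


# Hypothetical scores for a ten-residue stretch (made-up numbers).
example_scores = [95.2, 91.0, 88.4, 72.5, 65.0, 48.3, 93.7, 55.1, 81.9, 30.2]
print(confidence_shares(example_scores))
# {'confident_or_better': 0.6, 'very_high': 0.3}
```

Applied across every amino acid in the human proteome rather than ten made-up values, the same kind of tally is what sits behind figures like the ones quoted above.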

It’s an exciting development, but why is it important that we can predict and understand the 3D structure of our proteins?

Well, proteins are made up of long chains of smaller “building blocks” called amino acids. The unique 3D shape created by the folding of these amino acid chains determines just how the protein will function in the human body.
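For the curious, the step from genetic code to amino acid chain can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and a made-up 12-letter DNA snippet; the `translate` function is a hypothetical helper, not part of AlphaFold. Predicting how the resulting chain folds is the hard part, and that is the job AlphaFold tackles.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# Each letter is a one-letter amino acid code; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read DNA three letters at a time and return the amino acid chain."""
    dna = dna.upper()
    chain = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":  # a stop codon ends the protein
            break
        chain.append(amino_acid)
    return "".join(chain)


# Made-up snippet: ATG (M), GCC (A), AAA (K), then the stop codon TAA.
print(translate("ATGGCCAAATAA"))  # MAK
```

The output is just a string of letters. Nothing in it says what shape the chain will twist into, which is why structure prediction was such a long-standing problem.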

Understanding how proteins function is essential, as many challenging disorders such as Alzheimer’s, Parkinson’s, and heart disease can involve proteins that misfold or malfunction, in some cases because of an inherited genetic variant. By understanding how a faulty protein contributes to a disease, we can derive pointers about the molecular basis of the disease and, in turn, develop targeted treatments that can neutralise or block harmful proteins.

Many eminent researchers were quick to praise the achievement and its potential. In the leading journal Science, computational protein scientist Janet Thornton said:

“What the DeepMind team has managed to achieve is fantastic and will change the future of structural biology and protein research.”

Science's report also quotes the Nobel Prize-winning structural biologist Venki Ramakrishnan, who described the work as “a stunning advance on the protein folding problem.”

Of course, one of the main advantages of AI-driven investigation over manual work is the speed at which the analysis can take place. This is all the more important with globalization and deforestation (among other factors) keeping the door open for future pandemics. As we witnessed with COVID-19, speed is of the essence when it comes to treatment development. Lockdowns, travel bans, and masks have limits in their ability to stop virus spread, and need to be combined with effective vaccination to prevent hospitals becoming overwhelmed.

Commenting on the predictions from AlphaFold, Prof. John McGeehan, a structural biologist at the University of Portsmouth, told BBC News:

"It's just the speed — the fact that it was taking us six months per structure and now it takes a couple of minutes. We couldn't really have predicted that would happen so fast.”

And the possibilities for this tech aren’t limited to the medical sphere. The article continues:

“Those applications we can envisage now include developing new drugs and treatments for disease, designing future crops that can resist climate change, and enzymes that can break down the plastic that pervades the environment.

“Prof. McGeehan's group is already using AlphaFold's data to help develop faster enzymes for degrading plastic. He said the program had provided predictions for proteins of interest whose structures could not be determined experimentally — helping accelerate their project by ‘multiple years.’"

There’s no denying the potential here is monumental. Once the 3D structure of the protein has been identified, how can this information be used to create drugs that work on new or established illnesses?

Reflecting on the significance of the latest version of AlphaFold, Professor Paul Workman, Chief Executive and President of The Institute of Cancer Research (and a drug discovery scientist), explains:

“Most of the new drugs that we discover today for cancer and other diseases exert their effects by targeting particular proteins in the body. Ideally, we wish to design small-molecule drugs to bind very precisely to a tiny region of the overall target protein so as to alter its function. Medicinal chemists will always prefer to have an accurate 3D protein structure so they can apply structure-based drug design — thereby helping them to achieve as ‘perfect fit’ as possible alongside multiple other optimised properties.

“And the 3D protein structure is important even before the drug discovery phase begins — by helping the drug discovery team to assess the druggability of the target protein — the extent to which it has pockets or grooves that allow small-molecule compounds to bind.

“This allows researchers to understand which targets are relatively straightforward to drug and which ones will represent a major challenge. Such information can be extremely useful in prioritising which protein targets to pursue and the approach to be taken.”

A cartoon image of a scientist peering into a microscope, looking at a strand of DNA.

The pandemic afforded us the opportunity to appreciate the advances in science that we’ve made and how these can be applied to modern medicine. The speed of the rollout of the COVID-19 vaccine was thanks in large part to our modern ability to read and share the genetic sequence of the virus itself.

AlphaFold was part of this process — as reported by Fortune back in 2020:

“The potential to use AlphaFold 2, DeepMind’s still-under-wraps new AI system, to gain insights into the virus was clear. But the system, which was being prepared for its debut in a global competition on protein structure prediction that was still months away, had not yet been fully tested.

“The DNA sequence of SARS-CoV-2, the virus that causes COVID-19, had been published by Chinese researchers as early as January 11 (2020). This allowed researchers to begin to scrutinize proteins associated with the virus as possible targets for vaccines and treatments. Most of these efforts focused on the coronavirus’s distinctive spike protein, an obvious candidate for medicines since its structure and function were similar to that of other coronaviruses and well understood.

“In the future, the breakthrough is likely to speed the development of new medicines for everything from malaria to cancer. But AlphaFold 2 is already having an impact on the fight against today’s most pressing global health scourge: the COVID-19 pandemic.”

As well as saving time, AI can process and compare massive amounts of information, identifying patterns that might otherwise go undiscovered.

As this Labiotech article points out:

“AI also gives researchers the power to analyze disparate datasets. For example, it can combine vast libraries of chemical compounds, biomedical data from the literature, and patient health data into knowledge graphs. This data model creates new connections and insights into previously unrelated information, which researchers can use to make predictions, model novel pathways and disease states, and test their findings.”
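A knowledge graph can sound abstract, so here is a toy Python sketch of the idea. It stores facts as simple (subject, relation, object) links and then follows two links in a row to surface a connection that no single dataset contains. Everything in it is hypothetical: the class, the relation names and the placeholder entries such as `compound_A` are invented for illustration, and real systems are vastly larger and more sophisticated.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny store of (subject, relation, object) facts."""

    def __init__(self):
        # Maps (subject, relation) to the set of objects linked to it.
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # Return a copy so callers cannot alter the stored facts.
        return set(self._edges.get((subject, relation), set()))


def candidate_diseases(graph, compound):
    """Follow compound -> protein -> disease links across datasets."""
    diseases = set()
    for protein in graph.objects(compound, "binds"):
        diseases |= graph.objects(protein, "implicated_in")
    return diseases


graph = KnowledgeGraph()
# Placeholder fact standing in for a chemical compound library.
graph.add("compound_A", "binds", "protein_X")
graph.add("compound_B", "binds", "protein_Z")
# Placeholder fact standing in for the biomedical literature.
graph.add("protein_X", "implicated_in", "disease_Y")

print(sorted(candidate_diseases(graph, "compound_A")))  # ['disease_Y']
```

Neither placeholder dataset mentions `compound_A` and `disease_Y` together. The link only appears once the two are joined, which is the kind of new connection the quote describes.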

Of course, as with anything, this tech has its limits. For scientists to trust a decision, they need to know why that decision was made. As George Paliouras, Senior Researcher at the Institute of Informatics and Telecommunications at NCSR Demokritos in Greece, who studies these sorts of AI issues, explains:

“For example, AI might reveal a patient prone to Alzheimer’s, but why is this? What data has the system used to base this decision on? This is very, very important for all health applications of AI. So right now, a hot field of AI is known as ‘explainable AI’, or AI with decisions understandable by humans.”

We’ve only begun to comprehend what the development of AlphaFold and other similar platforms means for our future. 

While many questions remain, it looks like the understanding of how proteins work will be a cornerstone of how we tackle the biggest issues we face as a planet — from establishing why disease happens and how to treat it, to creating enzymes that can negate the toxic threat plastic waste poses to the environment. 

It’s a hugely positive example of how AI can improve our quality of life, and one I can happily get on board with.

