I found it interesting that AlphaFold can't reliably predict the structures for mutations that disrupt structure. The explanation makes a lot of sense though.
It is sometimes important to remind oneself that the selection of protein structures that exist in nature and that we have determined experimentally is biased. Nature doesn't like proteins that misfold because they can easily cause trouble. And proteins with less defined structures are generally harder to solve with the usual methods like X-ray crystallography. The list of protein structures we know isn't a representative sample of all possible protein structures; it's mostly structures that are useful in nature and that we can determine with the methods we have available.
>proteins with less defined structures are generally harder to solve with the usual methods like X-ray crystallography
What do you mean by 'proteins with less defined structures'? I'm not familiar with what this phrase could mean, could you please expand on this concept?
Less defined means flexible in this case. So either parts that are completely random on their own, or parts that can adopt multiple different structures.
There are also intrinsically disordered proteins that have no defined structure when they are on their own, that's essentially like a piece of string that is almost completely flexible. Those proteins can still adopt a specific well-defined structure if they bind to something else.
So does flexible mean that there may be different amino acids in a portion of the peptide? From my understanding, when flexibility is discussed in terms of proteins we're talking about rigid vs flexible side chains which can move or rotate along specific bonds
So for the intrinsically disordered ones, are you mainly talking about the secondary or tertiary structures? My assumption based on your statement is that we're keeping the same primary structure (order of amino acids) but they don't have many (if any at all) intermolecular interactions? Would it be safe to assume that you're referring to shorter polypeptide rather than large proteins?
In disordered proteins, there is no permanent tertiary structure. They may have some secondary structure, but the relations between those structural elements can change over time. It does not mean the sequence has variation in it.
Does this mean that they have multiple conformational states with similar energies that are easy for it to transition between? How different are the states and is this how the protein normally does its proteiny stuff?
Yes, many proteins have transitional global arrangements that they traverse as they meet some goal. For example, kinesin and dynein walk along microtubules in a way where we could never perfectly characterize the intermediary states, since each is effectively a motor with free rotation around certain elements.
A lot of crystallography is focused on enzymatic reactions where you bind a ligand that sits there for the sake of introducing some conformation that you can study. The ligands generally approximate the natural substrate at either the beginning, end, or some intermediate step in enzyme catalyzed synthesis.
Yes, I would say that intrinsically disordered proteins adopt something like an unfolded state, which is to say that they can visit a wide range of structures that are at similar energy levels, all of which are accessible at ~room temp. I can't really answer in more detail because all the ID proteins are fairly different, and how they do their job is hard to understand compared to stable static "rocks" like enzymes.
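To put rough numbers on "similar energy levels, all accessible at ~room temp", here is a minimal sketch using Boltzmann weights. The two energy landscapes are made-up illustrations (not measurements of any real protein), and the function name is my own.

```python
import math

# Gas constant in kcal/(mol*K); energies below are assumed to be in kcal/mol.
R_KCAL = 0.0019872


def boltzmann_populations(energies_kcal, temperature_k=298.15):
    """Return the equilibrium fraction of molecules in each conformational state."""
    kt = R_KCAL * temperature_k  # about 0.59 kcal/mol near room temperature
    lowest = min(energies_kcal)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - lowest) / kt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical "disordered" landscape: five states within 0.5 kcal/mol of each other.
disordered = boltzmann_populations([0.0, 0.1, 0.2, 0.3, 0.5])

# Hypothetical "folded" landscape: one state 5 kcal/mol below all the others.
folded = boltzmann_populations([0.0, 5.0, 5.0, 5.0, 5.0])

print([round(p, 2) for p in disordered])
print([round(p, 4) for p in folded])
```

With states packed within a fraction of a kcal/mol, no single conformation dominates. With a gap of a few kcal/mol, nearly everything sits in one state, which is the "rock" case.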
Enzymes aren't stable and static -- usually in their active site they have significant conformational changes that enable catalysis of the relevant chemical reaction. It's quite a problem that we don't have general robust ways of directly elucidating those transient structures, a lot of our understanding of catalysis is still held back or slow-evolving because we can only use indirect and cumbersome methods (like isotopic mutation + laser IR)
I would consider most enzymes to be intrinsically disordered at their active sites.
No, enzymes are not intrinsically disordered at their active sites. They are highly ordered. Most enzymes don't undergo large changes: they accept a molecule, do their business, and release it. You're thinking of other proteins like motor proteins, which undergo large, controlled conformational changes.
The active site is structured to stabilize the transition state of the affected molecule and move it from one state to the next in the chemical reaction. That requires very specific shapes and correlated changes. But of course, this being biology, you can remove all 3 active site residues in a serine protease catalytic triad, and still see proteolysis because the protein, when it binds the substrate, forces the substrate into its transition pathway.
People have been working on these things for quite some time. I saw talks about time-resolved crystallography of active sites, and while they say "significant structure changes", they really only mean localized breathing-like motions, not massive rotations of entire domains.
is it possible to identify which proteins are intrinsically disordered based on amino acid sequence alone (or even base sequence)?
put another way, is it possible to a priori determine if a protein is ID or ordered?
for instance, you said enzymes are highly ordered. is this based on experimental observations (which could later be wrong if imaging techniques improve) or is there some principle that allows us to treat this as a fact?
> is it possible to a priori determine if a protein is ID or ordered?
There’s software that attempts to predict intrinsic disorder based on sequence alone, but in general, in the absence of homolog (evolutionarily related) proteins with known structure you would still need to check experimentally for disorder.
EDIT:
> if the goal is to reliably assess certain viral proteins as ID or ordered, experimental methods are the only methods for achieving this?
If you don’t find homologs with solved structures, experimental characterization is the way to go.
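For a feel of what the simplest sequence-only disorder predictors do, here is a rough sketch of a charge-hydropathy check: disordered proteins tend to combine high net charge with low hydrophobicity. Assumptions, all mine: the Kyte-Doolittle scale rescaled to 0..1, histidine treated as neutral, and a linear boundary `<R> = 2.785 * <H> - 1.151` as I recall it from the disorder literature (please verify before relying on it). Real predictors are far more elaborate, and the function names and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def charge_hydropathy(sequence):
    """Return (mean hydropathy rescaled to 0..1, mean absolute net charge)."""
    seq = sequence.upper()
    n = len(seq)
    # Rescale from the -4.5..4.5 range to 0..1; raises KeyError on non-standard residues.
    mean_h = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / n
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    # Assumption: histidine is counted as neutral.
    mean_charge = abs(positive - negative) / n
    return mean_h, mean_charge


def predicted_disordered(sequence):
    """Crude guess: True if the sequence falls on the 'disordered' side of the line."""
    mean_h, mean_charge = charge_hydropathy(sequence)
    boundary = 2.785 * mean_h - 1.151  # assumed boundary, see the note above
    return mean_charge > boundary


# Toy sequences, not real proteins.
print(predicted_disordered("EEEEEEEEEE"))  # highly charged, hydrophilic
print(predicted_disordered("LLLLIIIIVV"))  # uncharged, hydrophobic
```

This only says something about a whole chain's average composition, which is why experimental checks are still needed when no homolog with a solved structure is available.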
thanks for the explanation. to clarify, if the goal is to reliably assess certain viral proteins as ID or ordered, are experimental methods the only methods for achieving this?
A priori? No. Typically this would be determined by synthesizing or expressing the protein of interest and then using something like CD (circular dichroism).
There is an absolutely enormous amount of experimental data about enzyme structure, but frankly I think the simplest is to just understand that the modern ideas about the reversible protein folding process came from ribonuclease, a protein that cuts RNA: https://en.wikipedia.org/wiki/Anfinsen%27s_dogma
There may also be intrinsically disordered enzymes. I'm not really sure how they would work, but of course, in biology, there's always a weird example that violates normal expectations because evolution once randomly tried something a billion years ago and got stuck with it.
I didn't downvote you but going from a structure prediction (which you get for ordered proteins) to functionality is not straightforward at all. How would you do it?
You could predict functionality based on homology with evolutionarily related sequences, but that would work equally well for all proteins (ordered or not).
to clarify, the statement isn't that it is straightforward -- simply that it's easier to analyze something that doesn't change shape unpredictably than something that does (fewer variables to consider).
whether predictions based on evolutionarily related sequences are 100% accurate in human cells is another matter, especially when it comes to viruses.
The same protein can deform into multiple different 3D shapes, called conformations. Some proteins are rigid and exist almost exclusively in a single conformation. It is probably easier to determine the 3D structure of proteins with a single, dominant conformation. Other proteins don't have well-defined conformations, and are more like a tangle of rope that can bend in many different ways.
thanks for the explanation. what are the biggest factors influencing conformation? what are the best ways today for imaging proteins with different conformations, and what are the limitations of these methods?
Example by analogy:
Flat tire has less defined structure, and can take many shapes.
Inflated tire has more defined structure, and behaves more predictably.
Many proteins have intrinsically disordered regions that are hypothesized to be directly related to the protein's role in the cell. These regions are termed disordered because current methods used to determine the structure of proteins are unable to resolve a regular structure for these regions in the context of a protein crystal or protein in solution. This publication is an informative review on the topic: https://pubs.acs.org/doi/10.1021/cr400525m
> I found it interesting that AlphaFold can't reliably predict the structures for mutations that disrupt structure
It’s not that surprising given the conceptual background of the method.
Since it’s relying on evolutionarily coupled residues, AlphaFold is looking at sets of complementary mutations that keep or rescue a determined structure, i.e. the complete opposite of structural disruption.
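To make "evolutionarily coupled residues" concrete: a classic, much simpler stand-in for what AlphaFold learns from alignments is mutual information between columns of a multiple sequence alignment. To be clear, this is not AlphaFold's actual computation, just a sketch of the covariation signal. The toy alignment and function name are made up.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical alignment: columns 1 and 3 swap charge together (K with E, D with K),
# as if a salt bridge were preserved by compensating mutations.
toy_msa = [
    "AKLE",
    "AKLE",
    "ADLK",
    "ADLK",
]

print(column_mutual_information(toy_msa, 1, 3))  # covarying pair
print(column_mutual_information(toy_msa, 0, 2))  # fully conserved pair
```

Note what this signal is built from: pairs of mutations that survived because the structure survived. A single destabilizing mutation leaves no such trace in the alignment, which fits the point above.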
> The list of protein structures we know isn't a representative sample of all possible protein structures
It's kind of surprising that AlphaFold has some success with random sequences of amino acids:
> "Baker’s team gets AlphaFold and RoseTTAFold to “hallucinate” new proteins. The researchers have altered the AI code so that, given random sequences of amino acids, the software will optimize them until they resemble something that the neural networks recognize as a protein. In December 2021, Baker and his colleagues reported expressing 129 of these hallucinated proteins in bacteria, and found that about one-fifth of them folded into something resembling their predicted shape."
20% is not that great but it has potential. One long-standing goal is the de novo design of protein-based industrial catalysts for specific chemical transformations. Proteins from bacteria that live in boiling sulfur vents etc. have been used to some extent, but the idea is that similar proteins could be designed for a much wider variety of industrial processes. As the article notes, specificity remains a challenge (and designed proteins don't approach the efficiency of the evolutionary selected proteins), but it still seems promising.
P.S. I'm a bit more skeptical about the drug-design programs. It's not so much that novel drugs can't be designed that bind to the desired targets, it's that they might bind to a whole lot of undesired targets as well, leading to nasty side effects. Now if you could screen against the whole proteome, perhaps.
Talking about higher-level applications, BBC Science in Action [1] interviewed Prof John McGeehan of the Centre for Enzyme Innovation at Portsmouth University, who is working on bacteria that break down plastic in landfills. He explained his workflow of maybe selecting one candidate occasionally out of many due to the cost/time involved, and how DeepMind gave him more results in one weekend than he had expected to see over his entire career.
Both measures can be quite similar. Most protein designs can be screened in parallel for solubility and successful designs can be further engineered and tested in a high-throughput manner.
> and designed proteins don't approach the efficiency of the evolutionary selected proteins
This sounds interesting. Can you talk more about this? Efficiency in what sense, energy efficiency in the actual protein assembly in biological systems, or efficiency in actual performance of the protein while it's functioning in a biological system?
I just wish people would stop using the word "fold" for this. It's not folding. It's just structure prediction. It's great at structure prediction (static prediction of a single structure) and not at all at the folding process (which is dynamic and rapidly changing).
“Protein fold” and “protein folding” are two different concepts. Folds are structural categories, folding is the biophysical process. But I agree that there are better words out there to name such a tool.
That's very misleading, as you can see. I believe we should not use the term fold for structural categories as it's a misnaming. It's a historical accident that came about before people began to understand that folding is a process, not an on/off switch.
Even 'topology' is a little confusing to those more familiar with the term from maths.
For the 'CATH' hierarchical classification, the 'Topology' level is something like the organization of secondary structure in an 'Architecture'. This has some relationship to topology in the general sense, but is a narrower definition.
For me, the 'fold' is what happens after 'folding' occurs, but I take the point that it is confusing.
Topology actually makes some sense here in that a very small number of proteins do fold into knots! This was a huge surprise and completely contradicted most predictions. https://en.wikipedia.org/wiki/Knotted_protein
knots in the real world (actual, real knots) don't have tied-together ends either. That's not what a knot is. The study of topology in math works on abstract math-knots.
Generally, any non-covalent bonds aren't considered to be topological connections in proteins, although salt bridges can definitely have bond energies not far from covalent bonds.
I think this article does a good job of highlighting the difference between simulations and ML-based approaches. The latter are faster, but have limitations outside of their training parameters. As with everything in ML, broader training data to cover those cases probably helps. Though I would guess some of the problems could be inherent, that there fundamentally is no computational shortcut to this problem, whether you use a neural network or not.
This wasn't Google, it was DeepMind. Google doesn't get any credit for this. I tried to start this project at Google but it conflicted with the Google Health team's goals.
Even if it’s a sister project, it’s great PR for Google. I accept more ads from Google as it gives back so much in healthcare. I wish Meta would do the same, I wouldn’t care if it’s part of Facebook or not. GMail was something similar at the start: just do something good, to make more people like Google.
As for your own project I’m sorry for you: there are no more 20% projects, like in the old times :(
If one wanted to make a case for that, arguably it's DeepMind's major infrastructure resource consumption that would give Google some credit. I can certainly point to some colleagues in infrastructure engineering who've helped DeepMind folks deal with infrastructure problems.
That being said, I'm perfectly happy to just be proud of what DeepMind has achieved. They've been great to interact with when there were shared challenges, and I - and to my knowledge all colleagues around me - aren't very interested in a question of credit assignment.
Google's infrastructure and DeepMind's internal (not cloud) access to it has been absolutely critical. In many ways, DM is leading Google in terms of software development, while Google is leading in providing unique ML hardware capabilities.
amusingly, I work for a pharma and they don't even return our calls. I wonder how seriously they take this business, because if I was selling a product based on this, pharma would be my first customer.
Could be wrong, but I think DeepMind sees more value in the elite AI/ML talent that AlphaFold will draw and help retain than in future potential profits from drug discovery. Open sourcing AlphaFold and removing commercial restrictions wouldn't make much sense if drug profits were their goal.
Sure. AlphaFold is, in fact, the greatest shot at revenue that DeepMind has shown so far (and they are under intense pressure from Alphabet to show revenue).
I don't think there is any pressure on that front. They are supposedly profitable now (though I'm guessing this is partially accounting tricks) but there just isn't a need to be profitable. Search and YouTube print money to fund their R&D ($31B last year alone). The goal is AGI or close to it.
The "profit" you're pointing at is money that Google pays DeepMind to do software and machine learning as a service for them. This pays off, for example, with JAX: nobody in Google Research could touch it because of Jeff Dean/TensorFlow, until DM demonstrated (with AlphaFold) that JAX could do Nobel-prize-winning research, to the point where Jeff has admitted that TensorFlow has serious problems and systems like JAX are the future (see the PaLM paper!!!)