Hacker News
Predicting Properties of Molecules with Machine Learning (googleblog.com)
105 points by happy-go-lucky on April 8, 2017 | 25 comments



If you're interested in this field, check out this talk by Bharath Ramsundar from Prof. Pande's lab at Stanford: https://youtube.com/watch?v=sntikyFI8s8. He's also the author of http://deepchem.io - a deep learning library for drug discovery.


Why I am NOT bullish on this solving molecular medicine discovery problems:

Medchem is notoriously NOT generalizable. A crude example is how heroin came to be developed: in the early days of medchem, the reasoning was that acetylsalicylic acid is awesomer than salicylic acid, so acetyl-morphine must be awesomer than morphine. Actually, in many ways it is awesomer (and that's why it's a bad drug).

Consider Gleevec. Even if you knew the structure of Gleevec's target (BCR/ABL), you would not be able to predict Gleevec, because it works by displacing an entire segment of the protein into a position that happens to be thermodynamically more stable (but kinetically disfavored). Gleevec is a medchem drug (discovered through combinatorial synthesis), but sadly the insight into this mechanism is only generalizable in the conceptual sense: if you take that molecular fragment and graft it onto another molecule intended for a different target, it probably won't work.

Deep learning depends strongly on generalizable knowledge, and medchem is notoriously not easily generalizable for well-understood reasons.

Some aspects of medchem (optimizing bulk synthesis reactions, picking synthetic routes, guessing at bioavailability, stability in formulations) might be amenable to ML, but I am not bullish on discovery. Let's hope I'm wrong.


I used to work in medicinal chemistry and agree 100%.

That's not to say ML has no value, but predicting molecular behavior, even in the simplest system, is really damn hard.

When you only understand 10% of the factors influencing behavior, ML doesn't get you far.


That means they need 90% more data, right? Like how other molecules behave in context, not standalone. Some of them have similar properties.


That's true, but we don't even know what to look for to find that remaining 90% of data!

I remember working with some computational scientists: "just put a methyl group on this nitrogen and we should increase binding by 100x!".

So we make the molecule, give it to the biologists and find out binding is actually 1000x worse!


I'd put even odds on some sort of ML replacement for Lipinski's rule of five. But there are also whole classes of exceptions, like strategically methylated cyclic peptides.
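For anyone who hasn't met it, the rule of five is simple enough to write down in a few lines, which is partly why an ML replacement seems plausible. A minimal sketch, assuming the four descriptors have already been computed by some cheminformatics toolkit; the `Descriptors` class, the function names and the two example compounds are made up for illustration, and the "at most one violation" cutoff is the usual convention rather than anything from this thread:

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four thresholds the compound exceeds."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most one violation."""
    return rule_of_five_violations(d) <= max_violations


# Made-up descriptor values, NOT measured data for any real compound.
small_molecule = Descriptors(mol_weight=180.2, log_p=1.2,
                             h_bond_donors=1, h_bond_acceptors=4)
big_macrocycle = Descriptors(mol_weight=1200.0, log_p=3.0,
                             h_bond_donors=5, h_bond_acceptors=12)

print(passes_rule_of_five(small_molecule))
print(passes_rule_of_five(big_macrocycle))
```

A large cyclic peptide fails a check like this on size alone, which is why that class of exceptions is awkward for any fixed-threshold rule.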


Amazing - worth looking at Muggleton's publications for work using logical models rather than deep models

http://www.doc.ic.ac.uk/~shm/mypubs.html


> One reason molecular data is so interesting from a machine learning standpoint is that one natural representation of a molecule is as a graph with atoms as nodes and bonds as edges. Models that can leverage inherent symmetries in data will tend to generalize better — part of the success of convolutional neural networks on images is due to their ability to incorporate our prior knowledge about the invariances of image data (e.g. a picture of a dog shifted to the left is still a picture of a dog). Invariance to graph symmetries is a particularly desirable property for machine learning models that operate on graph data, and there has been a lot of interesting research in this area as well
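To make the "invariance to graph symmetries" point concrete, here is a toy sketch, not the architecture from the blog post. It assumes a made-up featurisation (each atom starts as its atomic number) and a hand-picked update rule; `message_pass` and `readout` are hypothetical names. Because each atom only sums over its neighbours and the final readout sums over atoms, relabelling the atoms cannot change the result:

```python
# Toy featurisation (an assumption for this sketch): atomic numbers.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def message_pass(atoms, bonds, rounds=2):
    """Run a few rounds of sum-aggregation over the molecular graph.

    atoms: list of element symbols, one per node.
    bonds: list of (i, j) index pairs, one per edge.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    state = [ATOMIC_NUMBER[symbol] for symbol in atoms]
    for _ in range(rounds):
        # Each new state depends only on the atom's own state and the SUM
        # of its neighbours' states, so node ordering cannot matter.
        state = [
            3 * state[i] + sum(state[j] for j in neighbours[i])
            for i in range(len(atoms))
        ]
    return state


def readout(atoms, bonds):
    """Order-independent graph-level summary: sum over all atoms."""
    return sum(message_pass(atoms, bonds))


# Heavy-atom skeleton of ethanol (C-C-O), listed in two different orders.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_relabelled = (["O", "C", "C"], [(0, 1), (1, 2)])
# Same atoms, different connectivity (C-O-C, dimethyl ether).
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])

print(readout(*ethanol) == readout(*ethanol_relabelled))
print(readout(*ethanol) == readout(*dimethyl_ether))
```

The same atoms wired up differently give a different summary, so the invariance comes from the graph structure and not from ignoring it.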


Google has a lot of ML experts. A lot of fields could benefit from ML, and could also contribute data and concepts to ML, but these fields don't have ML experts. I was thinking about this only a few days ago. I am so glad Google is looking to personally contribute to every field possible.

Though property prediction is a hard problem, I think there is low-hanging fruit in other fields. For example, anthropology, where only partial skeletons are found but we know there is symmetry there. Software regeneration is slow and expensive and doesn't exploit the symmetry much.

A joint project between Google and CERN also sounds really cool to me. Or maybe Google can set up a system where researchers with large data sets can approach Google and see if a symbiotic relationship can be formed.


Yes, CERN, which does work using advanced mathematics every day, doesn't know how to use multivariate calculus. /s

You do know that CERN regularly publishes papers in machine learning, right?


A personal anecdote to support your point: The first company I worked for in Europe was a well-established ML company that had been doing predictive analytics long (10+ years) before the current fad.

Founded by a former professor from CERN, and staffed about 90% from CERN postdocs. I was the only member of my team who was not a co-author on the Higgs boson discovery paper.

So yeah, people at CERN are pretty well aware of what can be done with ML.


I never said CERN doesn't use ML. But I would imagine Google has more ML experts and computational capacity than CERN. Correct me if I'm wrong. There is nothing wrong with collaboration.


Materials science is buzzing in this respect, from the nanoscale to the microstructural level, but ML cannot predict the existence and, if they do exist, the stability of the novel items you find. It is more a targeted, extrapolated attempt (because of some exact properties we wish to improve for targeted applications) than a magic 8 ball.


No, but you can use ML to improve the accuracy of density functionals, even those targeted towards prediction of specific properties.


Can you? People have already tested a large number of sub-exponential algorithms without any luck. If it were something easy to interpolate, one would expect somebody to have had some success already.


I'd like to know what Prof. Vijay S. Pande, father of the Folding@home project, thinks about that.


Probably rolling on the floor laughing. You don't need ML to "predict" properties of molecules. Not a single physicist will buy ML-predicted molecule properties.


I wonder how long it will be before state-of-the-art SAT solvers start incorporating similar networks.


Having tried this myself, so far it doesn't work.

The main problem is that SAT problems come in many different sizes, and while resampling a picture "makes sense" to make it smaller, resampling a SAT problem does not. Also the general lack of shape or structure makes things hard.

NN can help bad SAT solvers get better, but the heuristics in the best ones are (at present) better than anything a NN can produce.


I doubt anytime soon (at least as part of current SAT solvers). A big barrier to applying ML to the process of SAT solving is that a lot of times, it's just faster to do a search with a simple heuristic for variable selection than try a much more time consuming ML method (and neural networks will be quite time consuming relative to the kinds of heuristics usually used) to do variable selection better.

Quantum is a special case really as DFT computation is already very time consuming.
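To show where variable selection sits in the search, here is a minimal DPLL-style sketch. It is nowhere near a modern CDCL solver, and the function names and the "most frequent variable" heuristic are assumptions chosen for illustration. Clauses are lists of non-zero integers, with a negative number meaning a negated variable. The `pick` argument is the slot where either a cheap heuristic or a much slower learned model would go:

```python
from collections import Counter


def simplify(clauses, literal):
    """Set `literal` true: drop satisfied clauses, strip the opposite literal."""
    remaining = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        reduced = [lit for lit in clause if lit != -literal]
        if not reduced:
            return None  # empty clause means a conflict
        remaining.append(reduced)
    return remaining


def most_frequent_variable(clauses):
    """Cheap heuristic: branch on the variable appearing in the most literals."""
    counts = Counter(abs(lit) for clause in clauses for lit in clause)
    return counts.most_common(1)[0][0]


def dpll(clauses, pick=most_frequent_variable, assignment=None):
    """Return a satisfying (possibly partial) assignment, or None if none exists."""
    assignment = dict(assignment or {})
    if not clauses:
        return assignment
    if any(len(clause) == 0 for clause in clauses):
        return None

    # Unit propagation: a one-literal clause forces that literal.
    for clause in clauses:
        if len(clause) == 1:
            lit = clause[0]
            rest = simplify(clauses, lit)
            if rest is None:
                return None
            assignment[abs(lit)] = lit > 0
            return dpll(rest, pick, assignment)

    # Branching: this call is made at every decision, so its cost matters.
    var = pick(clauses)
    for lit in (var, -var):
        rest = simplify(clauses, lit)
        if rest is not None:
            result = dpll(rest, pick, {**assignment, var: lit > 0})
            if result is not None:
                return result
    return None


def satisfies(clauses, assignment):
    """Check that every clause has at least one literal made true."""
    return all(
        any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
        for clause in clauses
    )


formula = [[1, 2], [-1, 3], [-2, -3]]
model = dpll(formula)
print(model, satisfies(formula, model))
print(dpll([[1, 2], [1, -2], [-1, 2], [-1, -2]]))
```

Since `pick` runs once per decision, a selector that costs a neural network forward pass has to save a lot of branching to pay for itself, which is the trade-off described above.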


IIRC graph ConvNets have been successfully applied to some Boolean SAT problems.


The only ones I've seen try to predict satisfiability directly. But usually in SAT, you're interested in either a solution or a proof of infeasibility. Prediction can't do the latter, and afaik existing non-ML-based SAT solvers are far better at doing the former.


I'm taking an AI class and decided to choose computer-aided drug discovery as my research topic. So it's pretty cool to see stuff like this come out.


Yet another Arxiv "paper"...


Can't wait until we can combine quantum computers with machine learning. It should lead to a revolution in material science and medicine.



