It’s a rite of passage for every organic chemistry student: learning the difficult, laborious process of decoding spectroscopy data. For over 150 years, scientists and students have squinted at the peaks and valleys of spectra in order to determine the molecular structure of a mystery sample. And in the computer age, many researchers have attempted to automate this “molecular fingerprint” analysis, with only limited success.
But recent advances in machine learning, simulation, graph theory, and other computational approaches may have finally paved the way for automation of this chemistry lab linchpin. In a paper presented at the 2019 NeurIPS meeting, UChicago CS Assistant Professor Eric Jonas described a new technique for reading nuclear magnetic resonance (NMR) spectra, opening up new possibilities for chemical analysis and the design of new molecules using a “self-driving spectrometer.”
“I think in the future, the ability to hand-read spectra will be much less important,” Jonas said. “It's such a natural fit for machine learning and such an important area for science.”
Jonas’ paper, “Deep imitation learning for molecular inverse problems,” tackles what’s typically a one-way street between molecules and their spectra. Given a molecule’s structure, scientists (or computers) can very accurately predict the spectrum that a measurement will produce. But the reverse situation, where a scientist is handed spectroscopy data and asked to deduce the original structure, is far more difficult and time-consuming.
“When you make a spectroscopic measurement of a molecule, it's going to tell you lots of things about, say, the bonds in that molecule or the local electronic environment of a particular nucleus,” Jonas said. “You get a lot of these different pieces of information, but putting those pieces of information back together is of course the hard part. That’s what chemists spend a lot of their day doing.”
But those same frustrations make reading spectroscopy data a natural candidate for machine learning. In the same way that an image classification neural network learns its own “rules” for determining whether a photograph shows a dog or a cat, a similar model could figure out how to interpret the features of a spectrum and make an educated guess about the structure it encodes. The problem, as with many machine learning applications, is finding enough data to train the model; for nuclear magnetic resonance spectroscopy, accurately measuring just one sample can take minutes to hours.