Teaching a Computer to Be a Scientist
Using machine learning to calculate material structure and properties reduces computational time and cost
Andrea Bruck
How do you learn?
In grade school, learning how to add big numbers might have been a very daunting task. At some point, you probably got a stack of flash cards, took those cards home and practiced the arithmetic with your friends, siblings or parents. At first, you might have used your fingers and toes to help you add numbers and get the correct answer. When you believed you calculated the answer correctly, you looked at the back of the flash card to see whether or not you were right. After a lot of practice, you were eventually able to get your pile of flash cards (mostly) correct by creating shortcuts or remembering tricks to help you solve the problem. After that, adding big numbers was easy as you became more and more confident in your newly acquired skill!
Now, how can we teach a computer to learn new things using a similar method? More important, how can we use a computer that has been taught to solve a problem (has a trained neural network) for real-world applications, such as accurately calculating material properties, and push our current scientific understanding a step further?
This new way to “teach,” or train, a neural network is the basis of machine learning. Recently, Energy Frontier Research Center (EFRC) scientists at the Center for Next Generation of Materials Design (CNGMD) and the Solid-State Solar Thermal Energy Conversion Center (S^{3}TEC) developed a publicly available code called PROPhet that allows you to train a neural network to predict material structure-property relationships and charge-density functionals faster and more cost-effectively than ever before.
How is it possible to calculate material structures and properties? For a little background: Nobel Prize-winning scientists John Pople and Walter Kohn, along with Pierre Hohenberg and Lu Sham, identified the underlying mathematics that describes how electrons are distributed in atoms. When they developed the mathematical expressions for the electronic density of an atom, they postulated that any system property could be calculated directly if the exchange-correlation functional (which describes how electrons interact with one another) were known. Their methods calculate those structures by solving quantum equations using approximations referred to as functionals. Building on this work, modern computational scientists calculate density functionals that mathematically describe the electronic structure of materials and predict material properties, a process called density functional theory (DFT).
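For readers who have seen the formalism, the idea can be stated compactly. In the standard Kohn-Sham decomposition (textbook notation, not taken from this article), the total energy is a functional of the electron density ρ(r):

```latex
E[\rho] = T_s[\rho]
        + \int v_{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,\mathrm{d}\mathbf{r}
        + E_{\mathrm{H}}[\rho] + E_{\mathrm{xc}}[\rho]
```

Every term on the right can be computed exactly except the exchange-correlation functional E_xc, which must be approximated; the quality of that approximation sets the accuracy of a DFT calculation.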
This development has allowed CNGMD and S^{3}TEC scientists to discover new materials and helps rationalize why we observe certain properties in some materials but not in others. Currently, they are using DFT-based methods to model large, complex systems such as interfaces and surfaces, but these attempts are limited. DFT relies on solving equations, and sometimes there are too many parameters to calculate or there is too much uncertainty in one calculation to accurately predict a property in a large system. Therefore, to expand computational techniques, researchers are coupling DFT with the newly developed machine learning code, PROPhet, to improve predictions for complex material systems.
Incorporating machine learning: How PROPhet works. Unlike calculation-based DFT, the neural network used in PROPhet does not need to solve an equation every time it predicts a property. Neural networks are trained to solve a particular kind of problem, just as you used flash cards to learn to add numbers. You provide the neural network with a large pile of flash cards (a training set of data), and it figures out how to get the correct answer by minimizing the error for each card. Instead of solving an equation, it uses large arrays of numbers (weights) that are adjusted each time you work through the pile of cards (an iteration). After the flash cards have been mastered, and the error for each card is minimized, you have an optimized array of numbers that accurately predicts the answer on the back of each card. You can think of this array of numbers as the tricks or shortcuts you used to learn with your flash cards.
For PROPhet, this means you provide a large set of materials and PROPhet's network learns the correct set of weights to predict an accurate electronic structure for each material by minimizing the error on the training set. After it learns those weights, it uses them on new materials without using any of the classical equations from DFT. This, in essence, is how PROPhet learns and shortcuts the cost of DFT-only methods.
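The training loop described above can be sketched in a few lines of Python. This is a toy illustration of the flash-card idea, not PROPhet's actual code; every name and number here is invented for the example. The "material" is a pair of numbers, the "property" is their sum, and the model must discover the rule from examples alone.

```python
import random

# A toy "flash card" learner: the model must learn to add two numbers
# purely from examples (an illustration, not PROPhet's actual code).
random.seed(0)

# Training set: card fronts (two numbers) and backs (their sum).
training = [((a, b), a + b)
            for a, b in ((random.randint(0, 9), random.randint(0, 9))
                         for _ in range(100))]

# Two weights, randomly initialized; the "right" answer is w1 = w2 = 1.
w1, w2 = random.random(), random.random()
rate = 0.005  # learning rate: how big a nudge each mistake produces

for epoch in range(200):            # each pass through the pile is one iteration
    for (a, b), answer in training:
        guess = w1 * a + w2 * b     # the learner's prediction
        error = guess - answer      # check against the back of the card
        # Nudge each weight downhill on the squared error (gradient descent).
        w1 -= rate * error * a
        w2 -= rate * error * b
```

After training, `w1 * 7 + w2 * 5` gives approximately 12 even though no addition rule was ever written into the program; the learned weights play the role of the shortcuts you memorized from your flash cards.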
Scientists at CNGMD and S^{3}TEC are beginning to lift computational limitations by enabling PROPhet to make faster calculations, handle more complex chemical systems and generate predictions that are more accurate.
Calculating sound waves in diamonds with very small imperfections. Like DFT, PROPhet can predict electronic properties and how sound waves travel in diamond-phase carbon. However, PROPhet completed 2,000 predictions in seconds, orders of magnitude faster than DFT, with similar accuracy. The team stretched current computational limits by calculating the behavior of sound waves in diamonds with very small imperfections (low concentrations of vacancies). Incorporating such a small number of imperfections is not feasible with current DFT methods because it requires thousands of calculations with hundreds to thousands of atoms. However, PROPhet could learn the long-range interactions needed to accurately calculate the speed of sound in such large systems.
Solving the unsolved. DFT postulates that any system property can be calculated if the way electrons move in an atom (the kinetic-energy functional) can be mathematically described. However, this expression is very difficult to calculate, and at best DFT calculations approximate this functional to compensate for its enigmatic nature. Current DFT models typically have an error of approximately 450 meV. Through a new approach of learning charge-density functionals directly with PROPhet, the CNGMD and S^{3}TEC scientists' first attempt at learning the functional form had an error of approximately 300 meV, a 33 percent decrease in error. This is a promising pathway for computational scientists because it allows the direct learning of the functional relationship between charge density and any number of properties.
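The phrase "learning a functional" can be made concrete with a small sketch. In the hedged toy example below (not PROPhet's method, and with all names invented for illustration), we pretend a simple Thomas-Fermi-like expression is an unknown kinetic-energy functional, then fit a model that predicts its value from cheap descriptors of the charge density, using only training examples.

```python
import math
import random

# Toy version of "learning a density functional" (not PROPhet's actual method):
# we pretend the Thomas-Fermi-like form in true_functional() is unknown and
# learn to predict it from two simple features of the density.
random.seed(1)
DX, N = 0.1, 40  # grid spacing and number of grid points

def make_density():
    """A random smooth, positive 1-D 'charge density' on the grid."""
    height = random.uniform(0.5, 2.0)
    width = random.uniform(0.5, 2.0)
    return [height * math.exp(-((i * DX - 2.0) ** 2) / width) for i in range(N)]

def true_functional(rho):
    """The target we pretend not to know: integral of rho^(5/3)."""
    return sum(r ** (5 / 3) for r in rho) * DX

def features(rho):
    """Two cheap descriptors: the integrals of rho and of rho^2."""
    return sum(rho) * DX, sum(r * r for r in rho) * DX

# Training set of (density, energy) pairs, analogous to PROPhet's DFT data.
data = [(rho, true_functional(rho)) for rho in (make_density() for _ in range(200))]

# Fit weights c1, c2 by ordinary least squares (closed-form normal equations).
s11 = s12 = s22 = t1 = t2 = 0.0
for rho, energy in data:
    f1, f2 = features(rho)
    s11 += f1 * f1; s12 += f1 * f2; s22 += f2 * f2
    t1 += f1 * energy; t2 += f2 * energy
det = s11 * s22 - s12 * s12
c1 = (s22 * t1 - s12 * t2) / det
c2 = (s11 * t2 - s12 * t1) / det

def predict(rho):
    """Energy prediction straight from the density: no quantum equations."""
    f1, f2 = features(rho)
    return c1 * f1 + c2 * f2

# Average relative error of the learned functional on fresh densities.
test_rhos = [make_density() for _ in range(50)]
rel_err = sum(abs(predict(r) - true_functional(r)) / true_functional(r)
              for r in test_rhos) / len(test_rhos)
```

The design choice mirrors the idea in the article: once the weights are fit, `predict` maps a charge density directly to an energy, so new densities cost almost nothing to evaluate.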
What else can PROPhet do? You get to decide. Along with diamonds and frustrating functionals, PROPhet has already been shown to predict physical properties with the same accuracy as DFT-based methods. As a final example, PROPhet learned the band gap (HOMO-LUMO gap) for more than 5,000 structures of ammonia by mapping this property to inexpensive electronic structures. The band gap is the property best known for giving molecules and materials their color. The direct calculation of material and molecular properties, quickly and with high accuracy, gives an optimistic outlook for PROPhet's ability to save computational time and cost without compromising scientific validity.
Lastly, CNGMD and S^{3}TEC scientists have made PROPhet publicly available online, giving any interested researcher the opportunity to explore and expand PROPhet's powerful machine learning techniques and apply them to their own computational questions. With a solid foundation, PROPhet has only begun to scratch the surface of the scientific questions it can answer, and it will hopefully continue to open new avenues for computational scientists to explore. Just as you once trained yourself to calculate numbers from flash cards, you can now teach a computer how to calculate material structure and its related properties.
PROPhet can be accessed at http://kolpak.mit.edu/PROPhet or viewed on GitHub.
Acknowledgments
The research was supported by the Center for Next Generation of Materials Design and the Solid-State Solar Thermal Energy Conversion Center, Energy Frontier Research Centers funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences.
More Information
Kolb B, LC Lentz, and AM Kolpak. 2017. “Discovering Charge Density Functionals and Structure-Property Relationships with PROPhet: A General Framework for Coupling Machine Learning and First-Principles Methods.” Scientific Reports 7:1192. DOI: 10.1038/s41598-017-01251-z
About the author(s):

Andrea M. Bruck is currently working toward a Ph.D. at Stony Brook University, Department of Chemistry. She is a young investigator in the Center for Mesoscale Transport Properties (m2M), an Energy Frontier Research Center. Her research focuses on the fundamental processes that occur in a battery during its operation and how synchrotron-based characterization can elucidate the chemical processes that cause battery failure.