Imagine you are hiking on a mountain when a dense fog blows in. Visibility drops to near zero; you can’t see more than a step in front of you in any direction. To return to your car in the valley, you follow a simple strategy: move in whichever direction appears to lead most steeply downhill, and hope that the path eventually reaches the bottom.
In mathematics, this process is known as gradient descent, and it’s the secret sauce behind most of the machine learning applications we encounter today. Instead of finding the lowest altitude, gradient descent helps machine learning models fit a dataset with the smallest possible error, which in turn helps them make more accurate predictions on new data. But as machine learning applications grow more complex and datasets grow larger, the simplest forms of gradient descent may not be powerful enough.
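The idea fits in a few lines of code. The sketch below is an illustration, not from Orecchia’s work: it minimizes a made-up one-variable function f(x) = (x − 3)², whose “valley floor” sits at x = 3, by repeatedly stepping against the gradient.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # the gradient points uphill, so subtract
    return x

# f(x) = (x - 3)**2 has gradient f'(x) = 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
print(round(minimum, 4))  # converges close to 3.0
```

Each step shrinks the distance to the minimum by a constant factor here, just as each foggy downhill step brings the hiker closer to the valley.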
Lorenzo Orecchia, an assistant professor of computer science who joined UChicago CS this fall, creates and implements new algorithms built on first-order methods, the mathematical family that includes gradient descent and its relatives. His work combines ideas from continuous and combinatorial optimization with ideas inspired by natural processes, such as the dissipation of heat and the flow of electricity, to create more powerful algorithms for the future of artificial intelligence, distributed computing, logistics, and other advanced computational tasks.
“Many of the most impactful problems that you can work on at the intersection of applied mathematics and computer science live in machine learning,” Orecchia said. “In the beginning, the algorithms we would run on machines were inspired by the algorithms we ran on our own pencil and paper. But these days, the kinds of algorithms that we run for machine learning are more complicated; you could say they’re inspired by the way nature works around us; they simulate continuous processes that converge to some minima.”
Because nature, and the kinds of problems humans want to solve with artificial intelligence, are far more complex than the one-mountain, one-valley example above, more advanced strategies than gradient descent are necessary. Consider a mountain range with thousands of peaks and valleys, or the freedom to move in many more than three dimensions, and you get some sense of the complexity involved. Other methods can also be faster and more efficient than gradient descent, even for simple systems, taking bigger steps and reaching the “valley” in fewer moves.
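One classical example of such a speed-up is the momentum, or “heavy ball,” method, which keeps a running velocity so that successive downhill steps reinforce each other. The comparison below is an illustrative sketch on the same toy function f(x) = (x − 3)², not code from Orecchia’s research:

```python
def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)

def plain_descent(x0, lr=0.1, steps=40):
    """Ordinary gradient descent: step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum_descent(x0, lr=0.1, beta=0.5, steps=40):
    """Heavy-ball method: blend the old velocity with the new step."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x += v
    return x

# After the same number of steps from the same start, the momentum run
# lands much closer to the minimum at x = 3 than the plain run does.
print(abs(plain_descent(10.0) - 3.0), abs(momentum_descent(10.0) - 3.0))
```

The accumulated velocity lets the iterate take longer strides along a consistent downhill direction, which is the sense in which faster methods reach the valley in fewer moves.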
In previous positions at Boston University, MIT, and UC Berkeley, Orecchia developed new algorithms for tackling these more challenging conditions. In one project, he helped find a new solution to the “max flow” problem: determining the most efficient route through massive networks with millions or billions of edges, such as the Internet, genetic interaction networks, or the U.S. highway system. To crack it, the researchers took inspiration from how electricity flows through a network of resistors, developing a technique that probes all possible paths simultaneously, locates bottlenecks, and determines the fastest option. The work received the Best Paper award at the 2014 ACM-SIAM Symposium on Discrete Algorithms.
More recently, Orecchia’s projects have sought the best ways to partition massive, interconnected datasets for analysis across multiple computers simultaneously, and optimization methods that could help businesses solve large-scale logistics problems at minimal cost. At UChicago, he will straddle the theory and machine learning groups, developing new algorithms to strengthen and expand the use of neural networks in machine learning applications.
“It’s really the algorithm that you use to train the neural network that is important in making the results; your networks don’t work if you don’t use the right algorithms to train them,” Orecchia said. “So understanding what these algorithms do is very important for understanding what neural networks do, and there is a lot of work in my group trying to understand and improve the training of neural networks through continuous dynamics and through first order methods.”
Orecchia jokes that he has wanted to work at the University of Chicago since growing up in Italy, because it was the home base of famed physicist (and fellow Italian) Enrico Fermi. Today, UChicago appealed to him for its strengths in theory and the fundamentals of computer science.
“It's a purely research university, where there is a strong emphasis on basic research,” Orecchia said. “A lot of my research is on the basic side, and I feel like it's appreciated in a place like this.”