It may not sound like the most exciting role, but the humble “scheduler” holds the key to the future of large-scale computing. Supercomputers, data centers, and even modern personal computers all benefit today from parallel computing: running multiple tasks at the same time instead of one after another. This makes computing far faster, but it also dramatically increases the complexity of deciding the best order in which to run what can be millions of interdependent jobs.
For decades, schedulers got by with a “greedy” algorithm, a short-term strategy that simply grabs the next task in line whenever resources become available. But as high-performance computing and data center operators worry more about energy costs and the end of Moore’s Law, this simple solution may no longer suffice.
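To make the greedy strategy concrete, here is a minimal sketch in Python. It is an illustration under simple assumptions, not code from any production scheduler: each task has a fixed duration and a fixed resource demand, the machine has one shared capacity, and the names greedy_schedule, tasks, deps, and capacity are hypothetical.

```python
import heapq


def greedy_schedule(tasks, deps, capacity):
    """Toy greedy scheduler (illustrative assumption, not a real system).

    tasks: {name: (duration, demand)}
    deps: {name: set of prerequisite task names}
    capacity: total resource units available at any moment
    Returns (makespan, {name: start_time}).
    """
    prereqs = {t: set(deps.get(t, ())) for t in tasks}
    pending = list(tasks)   # tasks waiting in queue order
    running = []            # min-heap of (finish_time, name)
    done = set()
    start = {}
    now, free = 0, capacity

    while pending or running:
        # Greedy step: start every ready task that fits, in queue order.
        for t in list(pending):
            duration, demand = tasks[t]
            if prereqs[t] <= done and demand <= free:
                start[t] = now
                free -= demand
                heapq.heappush(running, (now + duration, t))
                pending.remove(t)
        if not running:
            raise ValueError("no task can start: demand too large or cyclic deps")
        # Advance time to the next completion and release its resources.
        now, finished = heapq.heappop(running)
        done.add(finished)
        free += tasks[finished][1]
    return now, start


# Hypothetical workload: "c" needs both "a" and "b" to finish first.
tasks = {"a": (4, 2), "b": (1, 2), "c": (3, 4)}
deps = {"c": {"a", "b"}}
makespan, start = greedy_schedule(tasks, deps, capacity=4)
print(makespan, start)
```

In this toy workload, task b finishes at time 1, and half of the machine then sits idle until task a finishes at time 4. A scheduler that looked ahead could have given a more resources and b fewer, which is the kind of opportunity the greedy strategy misses.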
At the 2018 Supercomputing conference in Dallas, a team of University of Chicago researchers presented an alternative approach to scheduling known as “Divide and Conquer.” The algorithm, developed by graduate students Gokalp Demirci, Ivana Marincic, and David Kim with associate professor of computer science Henry Hoffmann, applies a longer-term perspective and exploits configurable resources to achieve better results while adhering to a strict cap on power consumption.
The algorithm is the first improvement on an approximation for scheduling with resource and precedence constraints since 1975, Hoffmann said.
“I think that most systems people figured these greedy algorithms work, and it's close to the best thing we can find in most situations, so I’ll just keep using greedy schedulers. That's why nobody's looked at it for 43 years, but now is really the time to solve it,” Hoffmann said. “The combination of power management and exascale means this is the right time.”
A More Efficient Exascale
As high-performance computing reaches for the exascale (systems that can run one billion billion calculations per second), experts emphasize that merely building bigger computers is no longer the solution. If today’s petascale machines were simply scaled up a thousandfold, they would draw 200 megawatts of power, as much as roughly 130,000 residential homes. The Department of Energy has proposed capping the power consumption of exascale systems at 20 to 40 megawatts, presenting a difficult engineering challenge for both hardware and software.
Improving schedulers could be low-hanging fruit to help meet these goals. While full optimization is prohibitively complex — calculating the best schedule for a supercomputer would require a second supercomputer, Hoffmann said — there’s still plenty of room to improve beyond the simple, greedy algorithm.
Further help comes from the improved ability to fine-tune how a large computing system allocates power. Given a very large task and a very small task that can run concurrently, the system can direct more resources to the large task so that the two finish at roughly the same time, freeing up resources together for subsequent jobs.
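The arithmetic behind that idea can be sketched in a few lines of Python. This is a deliberately simplified model, and every part of it is an assumption made for illustration: a task's speed is taken to be directly proportional to the power it receives (real hardware is not this linear), and the function names balanced_split and finish_time are hypothetical.

```python
def balanced_split(work_a, work_b, power_cap):
    """Split a power cap between two concurrent tasks in proportion to
    their work, so that both finish together under the assumed linear model."""
    total = work_a + work_b
    return power_cap * work_a / total, power_cap * work_b / total


def finish_time(work, power):
    """Assumed model: running time is work divided by allocated power."""
    return work / power


# Hypothetical numbers: one large task, one small task, 10 units of power.
large, small, cap = 90, 10, 10

power_large, power_small = balanced_split(large, small, cap)
print("balanced:", finish_time(large, power_large), finish_time(small, power_small))

# For comparison, an even split of the same cap.
print("even:    ", finish_time(large, cap / 2), finish_time(small, cap / 2))
```

Under these assumptions, the balanced split gives the large task 9 units and the small task 1 unit, and both finish at time 10. With an even split, the small task finishes at time 2 but the large one runs until time 18, and the power freed by the small task goes unused unless something else is ready to run.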
That last idea is exploited by the “Divide and Conquer” algorithm, which looks ahead and focuses on when tasks will end, instead of just starting whatever is available whenever resources are free. For a typical computing job, the algorithm looks at the full workload, commonly depicted with nodes and edges as a directed acyclic graph (DAG), and recursively divides the problem into subproblems. Those subproblems can then be organized to run concurrently where possible, and assigned different amounts of resources so that they finish together, leaving more resources available to start the next group of subproblems.
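The flavor of this recursive approach can be shown with a toy sketch in Python. It is not the researchers' algorithm, which handles general DAGs and comes with approximation guarantees. The sketch assumes a much simpler setting: the job is already written as a tree of "series" and "parallel" groups, running time is work divided by power, and the names plan and total_work are hypothetical.

```python
def total_work(node):
    """Sum the work of every task under this node."""
    if node[0] == "task":
        return node[2]
    return sum(total_work(child) for child in node[1])


def plan(node, power, start, schedule):
    """Recursively record (start, finish, power) for each task.

    node is ("task", name, work), ("series", [children]),
    or ("parallel", [children]). Returns the node's finish time.
    """
    kind = node[0]
    if kind == "task":
        _, name, work = node
        finish = start + work / power   # assumed linear speed model
        schedule[name] = (start, finish, power)
        return finish
    if kind == "series":
        # Children run one after another, each with all available power.
        t = start
        for child in node[1]:
            t = plan(child, power, t, schedule)
        return t
    if kind == "parallel":
        # Divide power in proportion to work so the branches finish together.
        whole = total_work(node)
        return max(
            plan(child, power * total_work(child) / whole, start, schedule)
            for child in node[1]
        )
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical job: two concurrent tasks followed by a merge step.
job = ("series", [
    ("parallel", [("task", "big", 90), ("task", "small", 10)]),
    ("task", "merge", 20),
])
schedule = {}
finish = plan(job, power=10, start=0, schedule=schedule)
print(finish, schedule)
```

In this example the two parallel tasks both end at time 10, so the merge step can start immediately with the full power budget. The design choice worth noticing is that resources are assigned from the top of the tree down, based on how much work each subproblem contains, rather than task by task as resources happen to free up.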
When tested against greedy approaches on DAGs of up to 10,000 nodes in a simulated supercomputer, the Divide and Conquer approach improved performance by as much as 75 percent. That’s more than just a footrace, as the faster performance was achieved using the same amount of power, suggesting significant gains in energy efficiency.
“It's neat because every scheduler of practical use was greedy, and this one is not. And we get a big win by not being greedy,” Hoffmann said. “Divide and Conquer will be harder to implement, so that's going to be an issue, but our first results are really promising and indicate this is actually going to be very valuable in practice as well.”
A Bridge Between Theory and Architecture
The advance was made possible by a productive collaboration across research areas within UChicago CS. Demirci and Kim, graduate students studying theoretical computer science with Professors Janos Simon and Laszlo Babai, respectively, started working on the problem in Hoffmann’s Computer Architecture class, partnering with Marincic, Hoffmann’s student in computer systems.
“I always work with theory and machine learning students to figure out how we can relate what they’re doing to computer architecture,” Hoffmann said. “We said, this is what the algorithms are doing, this is what we're doing in systems, and there's a gap, so why don't we see if we can bridge that gap?”
“At Chicago, we have this historically extremely strong theory department and our systems are strong, but newer. I thought this was a great project that brings the old strength and the new strength together.”
The group’s first paper, “Approximation Algorithms for Scheduling with Resource and Precedence Constraints,” was presented at the Symposium on Theoretical Aspects of Computer Science (STACS) in March 2018. The Supercomputing paper, “A Divide and Conquer Algorithm for DAG Scheduling under Power Constraints,” was presented Wednesday, November 14th.