With more and more data pouring in from scientific collaborations, the internet, and sensor-equipped environments and machines, new systems and algorithms are needed to make sense of all that information. Many of these data streams take the form of time series, with values collected sequentially over time, such as hourly weather data or stock market prices. But while time series are a very common format, researchers still lack the standards needed to automate their analysis.
As a PhD student at Columbia University and a postdoctoral researcher at the University of Chicago, John Paparrizos has worked to address this challenge. At this month’s 2019 ACM SIGKDD conference in Alaska, Paparrizos’ thesis, “Fast, Scalable, and Accurate Algorithms for Time-Series Analysis,” received an Honorable Mention for KDD’s Doctoral Dissertation Award.
Paparrizos’ dissertation describes a new set of algorithms and automated methods for analyzing time-series data, regardless of their domain.
“The good thing is that currently we have the technological maturity to collect and store this data,” Paparrizos said. “We have different types of sensors for collecting data from natural processes and human-made artifacts, we have the computational infrastructure to store them, and we have large-scale dataflow systems to process them. But the fact is all of these systems, as well as most of the methods they support, have been designed for essentially static data. With the rapid growth of Internet-of-Things data volumes, we need to support applications for data that evolve over time.”
Typically, researchers analyzing time series need to do the same set of analytic tasks as in other domains, such as similarity search, classification, and clustering. But due to several challenges, such as the broad ranges of domains that generate time series and the high-dimensionality of datasets that can have millions of time points, the representations required for these analyses are usually created from scratch, one project or application at a time.
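To make those shared tasks concrete, here is a minimal, hypothetical sketch (not code from the dissertation) of one of them: similarity search over z-normalized time series, which is also the building block of nearest-neighbor classification. Z-normalization is a common preprocessing step in this literature; the function names and data here are illustrative assumptions.

```python
import math

def znormalize(ts):
    """Scale a series to zero mean and unit variance, so comparison
    reflects shape rather than amplitude or offset."""
    n = len(ts)
    mean = sum(ts) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in ts) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in ts]

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, dataset):
    """Index of the series in `dataset` closest in shape to `query`
    after z-normalization -- the core of similarity search."""
    q = znormalize(query)
    dists = [euclidean(q, znormalize(ts)) for ts in dataset]
    return min(range(len(dists)), key=dists.__getitem__)

# A scaled, shifted sine wave still matches the sine-shaped series,
# because z-normalization removes amplitude and offset differences.
dataset = [
    [math.sin(t / 3.0) for t in range(50)],   # sine shape
    [t * 0.1 for t in range(50)],             # upward trend
]
query = [5 + 10 * math.sin(t / 3.0) for t in range(50)]
print(nearest_neighbor(query, dataset))  # → 0
```

Even this toy version hints at the cost problem the dissertation targets: every query scans and re-normalizes the whole dataset, which does not scale to collections with millions of points.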
“What we were saying was, can we do something better than that? Can we essentially automate the process of constructing representations that preserve crucial characteristics to support time-series analytics?” Paparrizos said. “It’s not sustainable to have Ph.D. students working for five years in order to achieve these things again and again.”
Experiments in Paparrizos’ dissertation showed that the proposed methods achieve state-of-the-art performance on more than 80 different time-series datasets, and do so far more efficiently than prior work. That’s useful not only for saving scientists’ time in the future, but also for developing analytic systems capable of running on limited computational resources, which will be critical for the next wave of Internet of Things and edge computing applications.
The thesis also describes methods for two new scientific contexts. In one, Paparrizos helped create a model that predicts which scientific concepts will have long-term impact, to help guide the decisions of funding agencies. Another project created a system that detects when people search for symptoms that may be predictive of serious diseases such as pancreatic cancer, which could trigger warnings to seek medical testing.
At UChicago, Paparrizos continues his thesis work by integrating the methods into databases, so that users can perform their analyses without moving large datasets to external software. He’s also extending his work to multivariate time series and exploring alternative approaches, such as neural networks. Last year, he received a fellowship from data services company NetApp to create new methods that enable the analysis of compressed, large-scale data.
“Companies and scientists are now measuring multiple things at the same time, and they want to perform analysis over multiple different sensors, which will require significant changes in current approaches,” Paparrizos said.