John Paparrizos Receives NetApp Fellowship to Enable Large-Scale IoT Data Analytics

Over the last decade, the bottleneck for data analytics has shifted from the collection of information to the analysis of increasingly massive and unwieldy datasets. The gap is only growing as the Internet of Things brings online more devices capable of relentless data collection, from smart electric meters to cheap home sensors. Methods designed for analyzing or comparing a format as straightforward as time series data become untenable when applied to thousands or millions of time series, forcing researchers to work with compressed or reduced data.

With a new fellowship from data services company NetApp, UChicago CS postdoctoral researcher John Paparrizos hopes to ease this compromise with new approaches that enable multi-faceted analysis on compressed, large-scale data. By making it possible to run clustering, classification, prediction, and other analytic tasks on data while it is still compressed, these approaches can help researchers avoid the headaches of working with raw, overabundant data without sacrificing the fine detail of observations.

Specifically, Paparrizos, a researcher in the laboratory of Liew Family Chair of Computer Science Michael J. Franklin, will build a unified approach to support several different analytic tasks on compressed data: indexing, classification, clustering, sampling, and visualization. Previous papers have largely focused on specialized approaches that handle one task at a time and are designed with a particular dataset in mind, making it hard for users to generalize these approaches to different settings and applications.

“For example, when an algorithm requires the use of a particular distance measure to compare time series, you have limitations on what kind of compression method you can use and, therefore, what kind of indexing mechanism you can use to accelerate the computation,” Paparrizos said. “In this project, our goal is to automatically learn to effectively compress time series such that the low-dimensional data are compatible with classic, well-studied, indexing mechanisms and, importantly, preserve the invariance to time-series distortions offered by user-defined comparison methods in the high-dimensional space.”
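
To make the idea of distance-preserving compression concrete, here is a minimal sketch of one classic, fixed technique, Piecewise Aggregate Approximation (PAA). It is an illustration only, not the learned method the project proposes: it handles plain Euclidean distance, and the function names and sample series are assumptions chosen for this example. PAA replaces each series with a handful of segment means, and a scaled distance between the reduced series never exceeds the true distance, which is the property that lets an index discard candidates safely.

```python
import math


def paa(series, segments):
    """Reduce a series to `segments` values by averaging equal-width chunks."""
    n = len(series)
    if n % segments != 0:
        # Assumption for this sketch: keep the arithmetic simple by
        # requiring equal-width segments.
        raise ValueError("series length must be a multiple of segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(a_reduced, b_reduced, original_length):
    """Distance on reduced data, scaled so it never exceeds the true distance."""
    scale = math.sqrt(original_length / len(a_reduced))
    return scale * euclidean(a_reduced, b_reduced)


# Two hypothetical series of 64 points, each compressed to 8 values.
a = [math.sin(0.1 * t) for t in range(64)]
b = [math.sin(0.1 * t + 0.5) for t in range(64)]

true_dist = euclidean(a, b)
bound = paa_lower_bound(paa(a, 8), paa(b, 8), len(a))
print(f"true distance: {true_dist:.3f}, lower bound from compressed data: {bound:.3f}")
```

Because the bound is computed from 8 values instead of 64, a search can skip any series whose bound already exceeds the best distance found so far. The limitation Paparrizos describes shows up here too: this guarantee holds for Euclidean distance, and a different comparison method would need a different compression scheme.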

The project will evaluate the effectiveness of that approach on datasets from two real-world applications — high-resolution energy usage information collected by utility companies from smart meters and image data from satellites capturing Earth’s surface over time. Currently, researchers often need to reduce the dimensionality of these datasets in order to conduct comparisons and other analyses, losing accuracy in the process.

“Most of the highly accurate algorithms are very difficult to scale when you have databases with more than 100,000 time series, so for millions of time series, you need to find better ways to compress the data in order to offer a scalable solution,” Paparrizos said. “The challenge is to demonstrate minimal loss in accuracy while performing analytics on large-scale time-series collections.”

After development and testing, Paparrizos will work to integrate the new methods into popular large-scale analytics software, such as Apache Spark. The NetApp fellowship provides funding for one year of work on the project. To read more about Paparrizos’ fellowship, visit the NetApp website.
