While data science has grown in popularity over the last several years, it remains a very young discipline. By applying the combined strengths of computer science and statistics to data-rich problems in a variety of fields, data science has shown promise in everything from high-energy physics to medicine to public policy. But these early successes are just the tip of the iceberg, and a deep pool of challenges remains where data science can still facilitate breakthrough discoveries.
That room to grow was on display at the first Center for Data and Computing (CDAC) event, a networking lunch and preview of the first round of seed funding from the new University of Chicago initiative. Designed as an incubator for multidisciplinary data science research at UChicago, CDAC offers support for new ideas that bring fundamental and applied research to bear on real-world problems. At the mid-February kickoff, UChicago faculty heard about the current round of CDAC funding opportunities as well as concise overviews of the pressing data science challenges in urban studies, medicine, and computer science.
[Got a high-impact research idea for CDAC? Apply by February 28th for our first round of Data Science Discovery Grants.]
Luis Bettencourt, Professor of Ecology and Evolution and Director of the Mansueto Institute for Urban Innovation, toured the audience through the rich new data now available for studying cities, including global-scale maps of buildings, pollution, greenery, and human mobility. As these deep reserves of information grow in size, scientists are starting to explore new approaches for understanding urban life, investigating the connections between infrastructure, climate, socioeconomics, and human behavior.
“Whoever starts doing that well will be the leader of future social sciences,” Bettencourt said. “These are amazing things that require tremendous amounts of data, and you cannot do this work well with a single PI model. You need to bring together groups of people with many different areas of expertise, you need to have data resources, and you need to work with businesses.”
Similar challenges face the field of medicine, said speaker Samuel Volchenboum, Associate Professor of Pediatrics and Director of the Center for Research Informatics. While electronic health records and clinical data warehouses have become commonplace, much healthcare data is still collected or entered by hand, creating issues of messiness and compatibility that handicap innovation. Medical researchers need to work with data scientists on how to fix these issues and get the most value from that information, such as building predictive models for physicians that are both accurate and interpretable, he said.
“We have the data, of course, it’s just very difficult to learn how to make these connections, that’s where we definitely could collaborate together,” Volchenboum said. “Someday a computer’s going to tell somebody to try some drug or chemotherapy that nobody’s ever thought of, and somebody’s going to have to take a leap to say whether we should do this or not, and explain that to their patient. So the interpretability of these algorithms is very important.”
Underpinning all of this research is a need for new data science methods and foundations that can be applied across domains. Rebecca Willett, Professor of Computer Science and Statistics, explained how deep learning has shown great promise in computer vision and other areas, but that many essential questions remain about when it works and when it doesn’t. Before these systems can be utilized more broadly, more research must be done to understand how they handle complex data, how they can leverage physical models or human expertise, and how to preserve and build fairness into the machine learning applications increasingly used in modern society.
“Can we somehow create predictors that are blind to certain kinds of factors like gender and race? It can be pretty hard to look at an algorithm such as a deep neural network, and determine whether we are somehow complicitly creating biased predictors,” Willett said. “These foundational questions in machine learning nicely complement the applied questions.”
Following the talks, a networking and brainstorming session allowed UChicago faculty and students to propose projects and find collaborators. Researchers from across the university’s divisions and units suggested ideas where data and computation could be used to improve healthcare in Africa, reveal new physics discoveries, accurately predict stroke risk, study interactions between police officers and the public, and many more innovative applications.
[Image: Spatial mixing in Chicago as determined by cell phone data. Visualization by Jamie Saxon.]