CDS Hosts 10th Data Science Showcase

Four statistics researchers present overviews of their projects to the Moore-Sloan community, CDS-affiliated students and professors, and other data science researchers

This week, CDS hosted the Moore-Sloan Data Science Environment’s 10th Data Science Showcase. Organized around the theme of statistics, the showcase featured short presentations from the following speakers:

  • Xi Chen, CDS Affiliated Faculty, NYU Stern, Assistant Professor of Information, Operations and Management Sciences
  • Jennifer Hill, CDS Faculty, NYU Steinhardt, Professor of Applied Statistics and Data Science
  • Mengling Liu, CDS Affiliated Faculty, NYU Medical School, Associate Professor at the Department of Population Health and Department of Environmental Sciences
  • David Madigan, Executive Vice President and Dean of the Faculty of Arts & Sciences and Chair of the Statistics Department, Columbia University

Xi Chen at the 10th Moore-Sloan Data Science Showcase (2017)

Professor Chen began by introducing the IOMS department at Stern and the research areas of the professors in its statistics group. His own research sits at the intersection of statistics, machine learning, and operations research. In his presentation, “Statistics Meets Big Data,” he explained how his work focuses on statistical inference for modern big data applications, such as real-time streaming data, large-scale data stored in distributed environments, and high-dimensional structured data. In addition to statistical learning and inference, he highlighted the importance of decision-making in business analytics. His research on sequential analysis combines online learning and decision-making in a unified framework, with applications that include sequential learning for crowdsourcing and dynamic assortment planning. The ultimate goal of his research is to give decision makers tools that help them make sound choices and reduce costs.

Jennifer Hill at the 10th Moore-Sloan Data Science Showcase (2017)

Professor Hill provided an overview of the Center for the Promotion of Research Involving Statistical Methodology (PRIISM), of which she is a co-director. She highlighted PRIISM’s collaborative research, which applies statistics and data science to the social, educational, behavioral, health, and policy sciences. Professor Hill emphasized the importance of understanding the full lifecycle of data, from forming research partnerships and thinking carefully about measurement and sampling through to translating results for different communities (stakeholders, academic scholars, and the public). One of her current joint projects is Tracking Hope in Nairobi and Karachi (THINK), which aims to quantify how the attainment of educational goals affects hope, aspirations, and violence in the two cities.

Mengling Liu at the 10th Moore-Sloan Data Science Showcase (2017)

Professor Liu discussed applications of neural networks in building cancer risk prediction models. She demonstrated that a relative risk estimate (which is obtained from sampled cohort data) does not necessarily reflect absolute risk (which depends on population-based data). She also clarified that classification outcomes are not necessarily equivalent to prediction outcomes, and that causal interpretations cannot be drawn from retrospective studies without assumptions or external data. Additional applications included identifying FDNY firefighters with 9/11-related lung injuries, classifying patients with or without glaucoma, identifying asthma patients who could benefit from repeated measurements of inflammation biomarkers, and building new breast cancer risk prediction models.

David Madigan at the 10th Moore-Sloan Data Science Showcase (2017)

In his presentation, “A Data-Driven World: Opportunities and Challenges,” Professor Madigan explained how he and his colleagues from Observational Health Data Sciences and Informatics (OHDSI) are using data science to improve medical diagnoses, build predictive models for patient treatment, and perform large-scale, population-level effect estimation by analyzing over 700 million electronic health records. He also explained how OHDSI’s research teams are creating a new methodology to both improve the reproducibility of observational medical studies and reduce human and publication bias.

by Paul Oliver