On September 29–30, 2018, the Tow Center for Digital Journalism co-sponsored the fourth annual Columbia Data Science Hackathon in collaboration with the Columbia Data Science Society. The hackathon drew over 100 Columbia students who worked with novel datasets from the Tow Center and tech companies to create tools and insights.
Tow Center for Digital Journalism
Almost every day, the White House publishes 4–6 stories under the “West Wing Reads” banner. This dataset brings those stories together and includes their titles, publications, and dates of publication, as well as the entities mentioned within the stories and other metadata.
Columbia Tech Ventures
The core of the dataset is a list of all inventions disclosed to Columbia Tech Ventures by inventors dating back to the 1980s. Many of these inventions were discovered in the course of grant-supported research described in academic publications.
Qu Capital provided two time series datasets — tick-level data for bitcoin on a major cryptocurrency exchange and a parsed corpus of Reddit comments from select subreddits.
Winning Teams & Projects
“Data Never Sleeps” team: Kedi Cui, Zhe Liu, Yang Song, Xiangtian Deng
Constructed a bitcoin trading algorithm that predicts market price using an ensemble of models (XGBoost, ARIMA, and LSTM) together with NLP techniques.
“Black and White” team: Quan Yuan, Xiaowo Sun, Jie Li, Xiaofan Zhang
Combined machine learning, NLP, and time series analysis for feature engineering, predictive modeling, and the design of an arbitrage strategy.
“Sleep Beauty” team: Jinhao Zhang, Mingfeng Li, Yinan Ling, Nan You
Identified high-value inventions from patent data using feature selection and a neural network.
Google Cloud Platform
“Knowledge Trumps All” team: Thompson Bliss, Jacob Klein, Alex Kim, Patrick Lewis
Created a web application that quantifies the differences in writing style between news publishers and predicts whether an article will be promoted by “West Wing Reads.”