2018 Columbia Data Science Hackathon

On September 29–30, 2018, the Tow Center for Digital Journalism co-sponsored the fourth annual Columbia Data Science Hackathon in collaboration with the Columbia Data Science Society. The hackathon drew over 100 Columbia students who worked with novel datasets from the Tow Center and tech companies to create tools and insights.


Hackathon Datasets

Tow Center for Digital Journalism

Almost every day, the White House publishes 4–6 stories under the “West Wing Reads” banner. This dataset brings those stories together. It includes their titles, publications, and dates of publication, as well as the entities mentioned within the stories and other metadata.

Columbia Tech Ventures

The core of the dataset is a list of all inventions that inventors have disclosed to Columbia Tech Ventures since the 1980s. Many of these inventions came out of grant-supported research described in academic publications.

Qu Capital

Qu Capital provided two time series datasets — tick-level data for bitcoin on a major cryptocurrency exchange and a parsed corpus of Reddit comments from select subreddits.

Winning Teams & Projects

First Place

“Data Never Sleeps” team: Kedi Cui, Zhe Liu, Yang Song, Xiangtian Deng

Constructed a bitcoin trading algorithm that predicts the market price using an ensemble of techniques, including XGBoost, ARIMA, LSTM, and NLP.

Second Place

“Black and White” team: Quan Yuan, Xiaowo Sun, Jie Li, Xiaofan Zhang

Combined machine learning, NLP, and time-series analysis to engineer features, build predictive models, and design an arbitrage strategy.

Third Place

“Sleep Beauty” team: Jinhao Zhang, Mingfeng Li, Yinan Ling, Nan You

Identified high-value inventions from patent data using feature selection and a neural network.

Google Cloud Platform

“Knowledge Trumps All” team: Thompson Bliss, Jacob Klein, Alex Kim, Patrick Lewis

Created a web application that quantifies the differences in writing style between news publishers and predicts whether an article will be promoted by “West Wing Reads.”