The Great Hack — Inaugural London Alternative Data Hackathon in Finance

DataScrum
Aug 31
Alt data hackers of the world, unite!

Overview

DataScrum will be holding the first-ever alternative data in finance hackathon in London on the 14th of September at Microsoft Reactor in Shoreditch.

Please see the Hackathon Eventbrite page for location and schedule details.

Organisation and Timeline

Participants will be organised into teams via the DataScrum Slack workspace, which will also be the main channel of communication. We will also hold an AMA (ask me anything) conference call to answer organisational questions, and will write up a set of FAQs.

We propose the following three stages for preparing the coding assignments:

  • Stage 1 (first week, 1st to 8th of Sep) — data exploration and model building
  • Stage 2 (second week, 9th to 13th of Sep) — model testing and goodness-of-fit evaluation
  • Hackathon Day (14th of Sep) — model optimisation, fine-tuning, backtesting and presentation

Grading and Success Criteria

Solutions will be judged on (i) model predictive power / goodness of fit (e.g. R-squared, AUC) and/or (ii) PnL / % return performance in both backtests and out-of-sample tests, and (iii) the innovativeness of the approach, data sets and models used.
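The judges have not published an exact scoring formula, so the following is only a minimal sketch, assuming the standard definitions of the metrics named above: R-squared as 1 - SS_res / SS_tot, AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, and % return as compounded per-period strategy returns. The function names and toy inputs are hypothetical and used purely for illustration.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cumulative_return_pct(period_returns: Sequence[float]) -> float:
    """Compound simple per-period returns (0.01 = 1%) into a total % return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return (growth - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical out-of-sample values, for illustration only.
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(cumulative_return_pct([0.01, -0.02, 0.03]))
```

Computing these on a held-out (out-of-sample) period, rather than on the period used to fit the model, is the usual guard against overfitting. The pairwise AUC loop is quadratic in the number of examples, which is fine for a sketch but slow on large data sets.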

Judges

The Hackathon judges will be drawn from past speakers and panelists at DataScrum alternative data events and will include representatives from both the buy side and data providers.

Data Sets

Distribution — the data sets will be distributed 2 weeks in advance via an invitation-only GitHub repository and Google Colab / Google Drive. Details of the data sets will be shared on Slack.

Data Set Content

All data set content will fall into one of the following main categories:

  • Financial data — this includes company and macro price data, fundamentals (revenue / profits) and analyst consensus data
  • Alternative data — this includes all non-financial data, from both the participating providers and public sources, such as web traffic, geolocation, unemployment and weather data

Trading Problems — High Level Description

Sample trading problems and strategies will include:

  • Equities (low frequency): revenues vs periodic (quarterly / semi-annual) earnings announcements vs any of the data categories above
  • Equities (higher frequency): revenues vs consensus between announcements vs any of the data categories above
  • Macro: for example, commodities vs weather / satellite data, or fixed income indices vs employment data
  • Custom / BYOS (bring your own strategy): any other problem that participants propose or choose themselves

Participating Data Providers

The current participating data provider partner organisations include:

  • Refinitiv — meeting to be held on the 8th of August to discuss details
  • Locationsciences.ai — will provide mobile geolocation data
  • Mscience — will provide alternative data for a select set of tickers / companies
  • SimilarWeb — will provide alternative web traffic data for a select set of tickers / companies
  • MatchDeck — will provide knowledge graph data on company and people entities and relationships
  • Quandl (Nasdaq) — one of the leading marketplaces for all categories of alternative data
  • DataScrum — will provide limited financial and non-financial alternative data from public and proprietary data sources
