Using Kaggle as a proof-of-concept for your AI idea

Salim Virani
May 10, 2017

Several founders at Source Summit AI found themselves with datasets which they believed they could monetize, but wondered how to get a model built around them without committing to full-time hires or expensive consultancies.

Kaggle came up as a potential solution. It's an online platform that runs data science competitions with prize money attached. If you have the data, you post a challenge, release part of the data for data scientists to train their models on, and withhold the rest so that Kaggle can test the accuracy of their predictions.
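To make the split concrete, here is a minimal Python sketch of how a host might divide a labeled dataset into a public training set, unlabeled test inputs, and a private answer key, then score a submission. The helper names, the 30% holdout fraction, and plain accuracy as the metric are illustrative assumptions, not Kaggle's API; each competition chooses its own metric.

```python
import random

def split_for_competition(records, holdout_fraction=0.3, seed=42):
    """Split (id, features, label) records into the three pieces a host manages.

    Hypothetical helper: the holdout fraction and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    train = shuffled[:cut]  # released publicly, labels included
    holdout = shuffled[cut:]
    test_inputs = [(rid, features) for rid, features, _ in holdout]  # released without labels
    answer_key = {rid: label for rid, _, label in holdout}  # kept private by the host
    return train, test_inputs, answer_key

def score_submission(predictions, answer_key):
    """Accuracy: fraction of withheld records whose predicted label matches the true one."""
    correct = sum(1 for rid, label in answer_key.items() if predictions.get(rid) == label)
    return correct / len(answer_key)

# Toy dataset: each record is (id, number, whether the number is even)
data = [(i, i, i % 2 == 0) for i in range(10)]
train, test_inputs, answer_key = split_for_competition(data)

# A contestant sees train and test_inputs, never answer_key, and submits predictions
predictions = {rid: x % 2 == 0 for rid, x in test_inputs}
print(len(train), len(test_inputs), score_submission(predictions, answer_key))
```

Keeping `answer_key` off the public side is what lets the host measure how well models perform on data they have never seen.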

Sea lion party!

In one currently open competition, NOAA (the National Oceanic and Atmospheric Administration, a US government agency) is offering prizes of $5,000 to $12,000 to the top three teams that create algorithms to count sea lions in photographs of the ocean, so that their population can be estimated quickly. Biologists have already counted the sea lions in these photographs by hand, and each algorithm's counts will be compared against those hand counts to gauge its accuracy.
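The article doesn't say exactly how NOAA scores the counts. One common way to compare predicted counts with hand counts is root-mean-square error (RMSE), sketched below in Python under that assumption; the image IDs and counts are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and hand-counted values for the same images."""
    if predicted.keys() != actual.keys():
        raise ValueError("predictions must cover exactly the same images as the hand counts")
    squared_errors = [(predicted[k] - actual[k]) ** 2 for k in actual]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical hand counts by biologists and one team's predicted counts
hand_counts = {"img_001": 40, "img_002": 12, "img_003": 0}
predicted_counts = {"img_001": 37, "img_002": 16, "img_003": 0}
print(round(rmse(predicted_counts, hand_counts), 3))
```

Squaring the errors penalizes a few badly miscounted images more heavily than many small misses, which is one reason a host might prefer RMSE over a simple average of absolute errors.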

Another example is a competition run by Intel and MobileODT that asks for algorithms to classify the cervix type in images of women's cervixes, with the aim of improving cancer screening. As in the sea lion competition, the submissions' predictions will be compared against classifications made by doctors. Other competitions revolve around predicting prices in the Russian housing market and tagging videos.

Kaggle defines the process as follows:

Define: Identify a valuable machine learning problem that you have the data for.

Scope: Work with us to refine the problem statement and finalize the dataset.

Create: Collaborate with us as we build out your competition’s pages.

Launch: Engage on the forums, review shared code, and watch the models improve on the leaderboard.

Learn: Receive code and docs from the winners, and follow up with knowledge-transfer calls.

Judging by the range of currently open competitions, a single challenge can attract hundreds of submissions, and the leaderboard shows which one handles your data best. Better still, teams develop these submissions in parallel, which makes the overall process much faster. Current total prize pools range from $25,000 to $100,000, which may be cheaper or faster than your other options.
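Picking the best of many submissions comes down to scoring each one against the same withheld answers and sorting the results. Below is a minimal Python sketch of such a leaderboard; the team names, the data, and the choice of mean absolute error as the default metric are illustrative assumptions, since each competition defines its own metric and whether lower or higher scores win.

```python
from typing import Callable, Dict, List, Tuple

Predictions = Dict[str, float]

def mean_absolute_error(predicted: Predictions, actual: Predictions) -> float:
    """Average absolute difference; assumes every submission covers every withheld ID."""
    return sum(abs(predicted[k] - actual[k]) for k in actual) / len(actual)

def leaderboard(
    submissions: Dict[str, Predictions],
    answer_key: Predictions,
    metric: Callable[[Predictions, Predictions], float] = mean_absolute_error,
    lower_is_better: bool = True,
) -> List[Tuple[str, float]]:
    """Score every team on the same withheld answers and sort best-first."""
    scored = [(team, metric(preds, answer_key)) for team, preds in submissions.items()]
    scored.sort(key=lambda pair: pair[1], reverse=not lower_is_better)
    return scored

# Hypothetical withheld answers and three teams' submissions
answer_key = {"a": 10.0, "b": 20.0}
submissions = {
    "team_x": {"a": 12.0, "b": 20.0},
    "team_y": {"a": 10.0, "b": 21.0},
    "team_z": {"a": 5.0, "b": 30.0},
}
board = leaderboard(submissions, answer_key)
print(board[:3])  # the top three would be the prize winners
```

Passing the metric in as a function keeps the ranking logic the same whether a competition scores error (lower wins) or accuracy (higher wins).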

Note, however, that hosting a competition means making a portion of your data public, so make sure you have the legal right to do so.

Kaggle competitions might fulfill your needs, but more likely they’ll provide you with a proof-of-concept for further investment. Then, you have a clearer challenge to share with freelancers, or a starting point for hiring a full-time team. Either way, hosting a Kaggle competition is one of the faster ways to learn more about your data.

Source Institute

Relevant education to the world’s tech founders


