What competition platforms tell us about the Data Science Community

Computer scientists exist to find solutions to problems. It’s common knowledge that fluency in as many programming languages as possible is a desirable trait: like global travellers or anthropologists, who pick up different dialects in order to bond with various peoples and cultures, developers’ and data scientists’ knowledge of multiple languages is vital when acclimatising to new problems and environments.

Adjusting to different structures, working with new people and learning new things are core parts of life as a data scientist. It’s a very good thing, then, that organisations like Kaggle exist. Kaggle is a competition platform which sets up challenges for engineers and developers to solve in the best, neatest way possible. And Kaggle, which was founded in 2010, is only one part of the wider data challenge ecosystem, which has experienced a boom in growth over the last few years.

As a global database of fascinating missions and problems, Kaggle offers the chance for people to refine their roster of techniques and skills. For example, Signal’s Head of Research, Miguel, recently spent some time working on a Kaggle competition in order to refresh his abilities using the Python programming language. (Read about his experiences in more depth here.) As well as representing an interesting challenge for him, the competition he took part in also functioned as a use case for personal development and revision. While Miguel has become a Clojure specialist in recent years, without Kaggle he would not have been able to flex his Python muscles in such a stimulating and productive environment.

So what makes Kaggle, and other competition sites like it, such important parts of a data scientist’s or developer’s intellectual and professional development? First, I would argue that the data science community thrives on the principle of open-source, collective knowledge. Forums and events are more popular than ever: the ‘Data Science London’ group on get-together specialist Meetup, for instance, boasts nearly 4,500 members, and has hosted various events based around topics like healthcare, social networks and the ominous-sounding Apache Drill. Other offshoots of the broader data science group include more niche pursuits such as text analytics and natural language processing, illustrating how almost anyone can explore their own particular interests and find something that fits them.

Interestingly, the London Meetup page lists some of the industry’s biggest names as sponsors: Hadoop, O’Reilly, Amazon Web Services and (yes) Kaggle all contribute to community events in some way, whether it’s buying pizzas or donating textbooks. Of course, there’s something in it for these firms — they are putting themselves in front of some of London’s best and brightest data scientists — but the principle of mutual inquisitiveness still defines this kind of event.

Competitions are seen as the most interesting, high-profile way for the data science community to externalise its collective knowledge. Some of the challenges posted on Kaggle (or XPrize, Challenge.gov, or any other competition centre) are intimidating in their scope and complexity: one recent project listed on Challenge.gov, posted by NASA, offered a $20,000 prize for the person or team who best provided “creative or practical ideas to find a dual purpose for balance mass that is jettisoned from Mars landers”. Not something to be taken lightly! But these kinds of missions attract a great deal of attention, and raise the profile of what would normally be niche, esoteric experiments. The prizes on offer are attractive, too. XPrize currently lists the Wendy Schmidt Ocean Health challenge, which tasks people with creating pH technology that effectively measures acidity levels in the world’s oceans. The reward? A cool $2m.

Of course, these prizes are the exception to the rule. They perform a valuable task in attracting the best minds to important problems, but they may only be of interest to the most advanced, renowned scientists and innovators (who are often well funded in the first place). Heartwarmingly, there is just as much interest within the competition space in the challenges which offer no financial reward whatsoever. Kaggle lists the reward for these endeavours as ‘Knowledge’, and I can’t think of a better example of the spirit of the data science community than that. Work for work’s sake; learning for learning’s sake; inquisitiveness as an end in itself. That’s the message that data science chooses to broadcast about itself, and it should act as an endorsement of the community’s values and ethos.

So Miguel might be dreaming of million-dollar prizes, but for now he’ll be using platforms like Kaggle to improve his skills and sharpen his programming — a noble aim indeed.

If you’ve ever taken part in a data science competition, we want to hear what your experience was like! Let us know at info@signal.uk.com


Originally published at signal.uk.com on March 31, 2015.