AMA with Anthony Goldbloom, CEO of Kaggle, the open data science platform

Mikel Bober-Irizar
Imploding Gradients
Aug 8, 2017

Kaggle is the global home of machine learning competitions, open datasets and data science collaboration. Having hosted many high-profile competitions, recently passed a million users and been acquired by Google, Kaggle has cemented its place as a household name in the data science community.

In the KaggleNoobs Slack, we recently had the pleasure of hosting an AMA (Ask Me Anything) session with Anthony Goldbloom, the founder and CEO of Kaggle. Over 30 people had their questions answered during the event.

First of all, a big thank you to Jorge Arellano and yifan xie for making this AMA possible.

Host: Firstly I want to thank you for agreeing to do this, and of course thank you for founding Kaggle. Can you tell us a little about yourself, where you’re from, what you studied in school and why you think Kaggle is important for the future of Data Science?

Anthony: I am from Melbourne, Australia. I studied econometrics (basically statistics on economic data) at Melbourne University. My first job out of college was working at the Australian Treasury, forecasting GDP, inflation and unemployment. I love playing with data but my biggest frustration was that traditional economic data is small and noisy and so it’s hard to draw interesting findings from.

Starting Kaggle was really about getting access to more interesting datasets and problems. Of course, it’s a bit ironic, because I don’t get the chance to participate. (Although that might be a good thing… people on Kaggle are very smart and I probably wouldn’t do that well.)

Our aim is to make Kaggle a vibrant ecosystem of code, data and discussion. If you do data science/machine learning somewhere else, you start with a blinking cursor and an empty room. At Kaggle, we want you to be able to access great code/analysis that you can fork, data that you can analyze and join to, and discussion that you can learn from. We started with competitions; we now have Kaggle Kernels and the public data platform. Our near-term focus is to make Kaggle Kernels more flexible and industrial-strength (so you can use it for heavier compute, choose hardware, install packages, etc.) and to massively increase the number of datasets on the public data platform. Over time, we want you to be able to use Kaggle for work as well as for learning, credentialing and fun.

Q: 7 years have gone by since you created Kaggle (thanks!!!). Compared to your original vision, what has come true? What hasn’t? What brings you most satisfaction, what is your biggest surprise? What is your biggest regret?

Anthony: To be honest, I’m not sure there was a big vision 7 years ago. It was more a case that Kaggle was something I wanted to exist in the world. I’d say our vision and ambitions have grown as Kaggle has gained traction. Each time we achieve a new level of success, we shoot for the next thing. The biggest satisfaction is hearing about situations where Kaggle has given our users opportunities that they wouldn’t have had. I also enjoy the fact that Kaggle has become a well known data science/machine learning brand.

The biggest regret is that we didn’t launch Kaggle Kernels and the public data platform sooner — I’m very excited about those areas of Kaggle. We have so much more we can do with them and if we’d started working on them earlier they’d be more advanced products than they are at the moment.

Q: Kaggle inspired me to quit my job in 2015 and start my own data science consulting business. One of the biggest challenges I face is finding large clients with high quality predictive modeling projects. Any advice on how to do it?

Anthony: That’s a hard one. My view is that companies are still figuring out how to use data science/machine learning/predictive modeling more comprehensively in their businesses, so lots of their use cases are relatively unsophisticated. To find more advanced clients, you could look at the nature of companies posting on Kaggle’s jobs board for example (if they know Kaggle, they’re probably more advanced and if they’re hiring, they have an unmet need and consulting is another approach to meeting that need). That said, if you want to do more advanced work, consulting may not be the right fit at the moment. You might be better off getting a job at a more advanced company that’s already aware of the advantages of data science/machine learning/predictive modeling.

Q: Why did Google buy Kaggle? How does it help them to own Kaggle?

Anthony: There’s a big battle between the three big cloud players at the moment (AWS, Azure and Google Cloud). One of Google Cloud’s differentiators is to be the best cloud for machine learning: offering TPUs, TensorFlow as a service via Google Cloud ML Engine, etc. Kaggle is the world’s largest machine learning and data science community, so owning Kaggle gives Google Cloud the ability to make these tools available to our community, to get feedback on them as they launch and to drive adoption. It also works well from Kaggle’s perspective: it allows us to offer our community far more powerful compute and services (likely surfaced through Kaggle Kernels) than we could as a small standalone company.

Q: Were there moments during Kaggle’s ‘growing up’ when you had to pivot your vision and business model significantly? Could you share your experience of this? And what kind of support did you appreciate most during that process?

Anthony: Early on, Kaggle was more a fun project than anything with a grand vision. As we became more successful, we became more ambitious. At first we made all our revenue from machine learning competitions, but that wasn’t very profitable: machine learning was very immature, so there wasn’t much of a market for machine learning competitions. In 2013, we looked at adding other business lines that might be more profitable, such as building expertise in specific industries and developing machine learning solutions for them. We picked Oil & Gas as our first industry because we had Shell as a customer who wanted to do more with us and we thought the market opportunity was good. When the oil price crashed in late 2014, that industry became more challenging. However, the machine learning market was starting to mature, so we could go back to building a strong business around machine learning competitions. We also launched a jobs board, which has been a nice source of revenue for us. Going forward, we’d like to offer other services, including allowing companies to use Kaggle Kernels within their data science teams.

We’ve had some supportive investors who have seen the twists and turns of many businesses and who gave us helpful perspective as we decided how to evolve the business.

Q: What are the must-know skills if you want to succeed on Kaggle, i.e. blending, stacking, etc.? Also, as a student I generally don’t participate in challenges with huge datasets, because hardware remains the main problem. Would you be open to providing free hours on Google Cloud?

Anthony: I suggest reading the winners’ interviews on the blog. You’ll learn from far smarter people than me! We hope to be able to make more compute available to our community now that we’re part of Google, particularly for the bigger competitions. It’s still a work in progress.

Q: What would you suggest to me to begin to develop my skills in deep learning?

Anthony: I think the fast.ai course is excellent.

Q: I noticed an increase in the number of image classification competitions. Is it a new trend on Kaggle? Any chance that the old “Private Masters” style of competition will come back?

Anthony: We don’t decide what competitions we run; it depends on what our customers bring us. We have been growing the competition team recently (many of you know Walter Reade joined!), so we’re hoping to be able to run more competitions. The competition team has also set a Q3 goal of having a better spread of competitions, so we’re trying.

Q: Is it an issue for your clients (companies sponsoring competitions) that almost all solutions involve ensembles of models? Do you plan on launching competitions where the final submission has to be based on a single model (no ensemble)?

Anthony: In our guidelines to winners, we ask them to share details of a model that performs ~90–95% as well as their winning solution but is much simpler. In practice, these simple models often perform closer to 99% as well as the ensembled model and are much more useful to customers. One possibility is to host kernel-only competitions in the future, where we impose computational constraints (which effectively limit the ability to create crazy ensembles).

Q: Favorite part about being a founder?

Anthony: Kaggle started as a blinking cursor and the Vim text editor ~eight years ago. I’m really proud of what we have built: it’s very rewarding to have something that many smart people choose to spend some of their day focused on.

Q: If you ranked yourself in Kaggle’s leaderboard, where do you think you would be ranked?

Anthony: I think I could achieve Expert. I’d have a difficult time getting to Master — unless I was good at choosing team mates 😉 — and no chance of being a Grandmaster. I used to think I was a good statistician and a good programmer: after many years watching the Kaggle community, I don’t think that anymore.

Q: How large is Kaggle’s data scientist team right now?

Anthony: Our data science team is only three people (Wendy, Will and Walter). They work with customers to launch competitions. We’re pretty tiny (24 people) at the moment and don’t really have an office (most of our team works remotely).

Q: When did you first come across Data Science and how did you know this was the way for you?

Anthony: My first job out of college was forecasting GDP, inflation and unemployment. I loved playing with data! Every dataset has its secrets and I think it’s thrilling to try and uncover those secrets.

Q: Are there any competitions you wish Kaggle had hosted but did not?

Anthony: In my spare time, I do kitefoil racing. I would love to have a better wind forecasting model and have always wanted to do a wind forecasting competition.

Q: What type of AI-related tech would you like to see in the next 5–10 years? (cars do not count)

Anthony: My wife and I are having our first child in November. I joked to my wife last night that a self-driving stroller would be nice 😉. I’m really excited about advances in speech recognition. We have a Google Home at our house and I think it’s amazing! I look forward to the day when I never have to look at a phone again (and can do everything via speech).

Q: What is the future of Data Science (ML/DL) 10 years from now?

Anthony: I like the (hackneyed) William Gibson quote: the future is already here, it’s just not widely distributed. Companies like Google have shown what is possible with apps like Google Home, Google Photos, Word Lens. We’re going to see more and more awesome applications of machine learning embedded in our products over the next decade. Hopefully we’ll also start to see some of the academic techniques (reinforcement learning, generative models) become useful in real world applications.

Q: What 3 tips would you give to entrepreneurs in the DS world?

Anthony: It’s only two tips, but they’re the two most important. IMO, you’re most likely to be successful if you go after a) a problem you have experienced yourself, believe others are also experiencing and no one has solved, and b) something you’re passionate about (passion is important to sustain you through the difficult times).

Q: Any plans to offer an internal-only version of Kaggle for companies that are too concerned about using the public site? I think this could be a great growth opportunity for you guys if you haven’t already considered it!

Anthony: Yeah, for sure. We’re planning on launching internal competitions and internal-only kernels where companies can connect their own datasets. It’s not a priority at the moment (too many other things to do) but I’d like to see us get there by the end of 2018.

Q: Personally, I believe Kaggle Kernels is probably one of the greatest inventions so far in the field of DS. What was the motivation behind building it?

Anthony: So many discussion threads on Kaggle involved people linking to code that few would ever run (e.g. this thread, which has a few upvotes but no responses). It made us realize that running somebody else’s code is a real pain! Most people come to Kaggle to learn, so we launched Kaggle Kernels to make Kaggle a richer learning experience and also to give kernel authors the opportunity to show off their clever ideas. (I know when I have a clever idea I want to show it off 😉)

Q: Can you enumerate the 2 most difficult / troublesome moments at Kaggle since inception?

Anthony: I mentioned that Kaggle started focusing on specific verticals and use cases to boost our business in 2013. We started with a focus on O&G but when the oil price crashed in 2014 we lost a lot of revenue and we had to cut staff to survive. I’d also say one of my biggest business lessons is that it’s important to hire low ego people as well as smart people. When Kaggle started we just aimed to hire smart people and that was a mistake. It’s very difficult to enjoy work and be productive when you have to tiptoe around big egos.

Q: How many times had you pitched Kaggle to different investors before actually getting your first investment? Is there a story of grit or did they get it right away?

Anthony: Raising money is hard. Venture investors typically make ~two investments per year and see hundreds (maybe even thousands) of pitches per year. Our first round was easier: we pitched ~30 firms and had 4 firms interested. Our second financing round (after the oil price crashed) was much harder: we pitched ~60 firms and only had one consortium of investors interested.

Q: How did you meet Ben Hamner? (Kaggle co-founder and CTO)

Anthony: Initially at the ICDM 2010 conference in Sydney. Ben Hamner was there because he had done the ICDM machine learning challenge; I was there trying to promote Kaggle. Ben then started competing on Kaggle and became a very elite competitor. I met him again in 2012 when he was in the Bay Area interviewing at Google. He ended up joining Kaggle rather than Google (the irony is quite funny).

Q: What are the key factors one should consider while setting up a data science startup?

Anthony: At this point, it’s a pretty crowded space, so make sure you have a genuine point of difference. Kaggle was very lucky because we started a bit before data science attracted so much attention, so when the hype arrived, we already had some traction. Now that there’s so much attention on data science and machine learning, lots of startups are piling in, making it harder to stand out.

Q: If you could be a super hero, who would it be?

Anthony: Not sure if he counts, but Andrew “Ender” Wiggin from Ender’s Game.

Host: Alright guys, let’s give a round of applause 👏 to Anthony Goldbloom for doing this very informative and insightful AMA. I can tell you are extremely passionate about your work just by seeing your responses to our questions. We truly appreciate what you have done for the field of DS, and for me personally as well. Keep up the great work; we hope to talk to you again some time in the future. We wish you and Kaggle the best of luck!

Anthony: Thank you. And thanks all for making Kaggle what it is! Many more exciting things to come on Kaggle, which we hope you enjoy and find valuable.

Thanks again to Anthony for agreeing to do this AMA and giving such in-depth answers. We will be hosting many more AMAs over at the KaggleNoobs Slack in the future, so make sure to join if you want to participate in interviews with top Kagglers and members of the Kaggle team.

For more machine learning news, high-profile AMAs and analysis, follow Imploding Gradients on Medium! Make sure to drop a 💙 too if you found this article interesting!
