Data Science Conferences: One List to Rule Them All

Jared Polivka
Published in Kaizen Data
Nov 18, 2015
Photo from MLconf 2015, San Francisco

Note: This post originally appeared on Quora

Hello data scientists! In this post, I’m going to cover the top data science conferences in the United States.

For each data science conference, I’m going to provide a description, the intended audience and the history / origin story of the conference.

*Note: I will be updating this conference list over time. If I missed one of your favorite conferences, Tweet at me or comment on the post and I’ll add it to the list :)

List of Data Science Conferences:

Kaizen Data Conference
Kaizen is focused on the new frontiers of applied data science — expect practical talks and hands-on workshops from the best data scientists in industry. Topics covered include: machine learning, neural networks, deep learning, data viz, and more.

Who Should Attend:
Data scientists, data engineers, data analysts, software engineers

Backstory:
Kaizen was started by a band of disgruntled data science conference attendees who were tired of attending conferences that lacked high quality technical content. The goal of Kaizen is for attendees to improve their data science skills, exchange ideas and build relationships that extend beyond the conference.

“Kaizen” is a Sino-Japanese word that translates to “change for better.” In modern times, the term “kaizen” means “continuous improvement.” True to the name, this conference was designed from the ground up to foster improvement via applied data science.

DataEngConf
DataEngConf is the first data engineering conference that bridges the gap between data engineers and data scientists. Conference talks focus on examples of real-world architectures, data pipelines and plumbing systems, and applied, practical examples of data science algorithms and tools.

Who Should Attend:
Software engineers, data scientists using open source tools and technologies

Backstory:
DataEngConf started off as the Data Engineering Meetup. The organizers discovered that professional data engineers are typically interested in data science and machine learning topics as well. Some even think they might want to become data scientists! So Pete Soderling (founder of Hakka Labs) expanded the conference to two tracks, adding a dedicated data science track with a focus on applied algorithms and techniques that engineers would benefit from knowing about.

Strata + Hadoop World
Presented by O’Reilly and Cloudera, Strata + Hadoop World is where big data, cutting-edge data science, and new business fundamentals intersect and merge.

*Note, Special Discount:
O’Reilly Discount (Twitter: @OReillyMedia)
Our friends at O’Reilly have given us an amazing discount! Get 40% off print books and 50% off ebooks and videos. To redeem this discount, go to shop.oreilly.com and enter the code: PCGALV

Who Should Attend:
Strata + Hadoop World is where big data’s most influential business decision makers, strategists, architects, developers, and analysts gather to shape the future of their businesses and technologies.

According to the organizers, at Strata + Hadoop you will:

  • “Be among the first to understand how you can leverage the promise of this huge change, and survive the resulting disruption”
  • “Find new ways to leverage your data assets across industries and disciplines”
  • “Learn how to take big data from science project to real business application”
  • “Discover training, hiring, and career opportunities for data professionals”
  • “Meet face-to-face with other innovators and thought leaders”

Backstory:
Edd Dumbill announced the launch of Strata in September 2010. The first Strata Conference was held in Santa Clara in 2011.

When talking about the vision for Strata, Dumbill states:

“We believe that the future belongs to those who understand how to collect and use their data successfully. There’s a change in both the skills of data analysts and the technology they use that’s sweeping through industry and science. Our aim with Strata is to be the defining event for that change: for practitioners, businesses and data vendors.”

PyData

PyData brings together users and developers of data analysis tools to share ideas and learn from each other. The conference addresses evolving challenges in data management, processing, analytics, and visualization.

Who should attend?
Developers and users of data analysis tools including Python, R, and Julia.

Backstory:
PyData was initially founded to provide Python data enthusiasts a place to share ideas and learn from each other. A major goal of the first conference was to provide a venue for users across all the various domains of data analysis to share their experiences and their techniques, as well as highlight the triumphs and potential pitfalls of using Python for certain kinds of problems.

The first PyData Workshop was held in March of 2012, at the Googleplex in Mountain View, CA. Many prominent individuals in the Python data & scientific computing community were on-hand to deliver tutorials and how-to presentations. The workshops (and a Friday night hack-a-thon) were a great success.

PyData evolved to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.

PyData has since grown into an international conference series drawing thousands of participants each year.

Data Science Summit presented by Dato

The Data Science Summit is where you can learn how to create the next generation of intelligent applications. Visionaries share their new, surprising and inspiring ideas, academics teach the frontiers of machine learning, and industry leaders share practical case studies on how their data science and development teams are building new products and services with machine learning.

Who Should Attend:
The Data Science Summit is for anyone who wants to learn how to build the next generation of intelligent applications.

Developers and data scientists come to learn the latest technical innovations and to be inspired by the ways members of the community are applying them. Business leaders come to see what is possible for their teams and how they can reinvent their companies by leveraging data science, machine learning and intelligent applications.

Backstory:
Dato held their first annual conference in 2011. Back then, they were an academic project called GraphLab based out of Carnegie Mellon University. The graph analytics software was open source for a couple of years, and companies frequently invited the Dato team to give talks about it.

Since Dato didn’t have enough people to go visit all those companies, they decided to hold a workshop. Dato planned to invite a dozen companies, around 30 people in total. More than 300 people registered! They switched the workshop location to a much bigger room :-)

The first workshop convinced Dato that there was a large demand for an applied machine learning solution. The next year, 500 data scientists and researchers showed up for the workshop. The third year, they had 700 registrations. And so the Dato team decided it was time to change the event name from a workshop to a conference, since it had become one of the largest machine learning events in the world.

In 2015, Dato expanded the scope of their event to include a variety of data science elements and renamed the conference “the Data Science Summit.” They had over 1,100 participants, with world leaders in data science speaking, including Prof. Rob Tibshirani (Stanford), Prof. Alex Smola (CMU), Prof. Chris Re (Stanford), Prof. Carlos Guestrin (UW), Prof. Jeff Heer (UW) and Prof. Mike Jordan (Berkeley), as well as many others.

Datapalooza

Datapalooza is an immersive experience for the data science community where attendees will learn how to craft a data product in 3 days.

Who should attend?
Data science professionals including: data scientists, data engineers, app developers.

Backstory:
Datapalooza was founded on the idea of community. Like Lollapalooza, the organizers want to deliver an amazing shared experience for data science professionals: learning, networking, shared success and fun, where every participant builds a data product in just three days.

Rich Data Summit

The Rich Data Summit is thrown by data scientists, for data scientists. The organizers focus not only on where they see data science going in the future, but the challenges data scientists have now, primarily that they spend up to 80% of their time cleaning and labeling data instead of doing something that’s, well, a bit more enjoyable.

Who should attend?
Data scientists, first and foremost. However, anyone who’s interested in how data affects business, government, sports, etc. will get value from attending.

Backstory:
CrowdFlower is a company founded by data scientists; they strive to make sure that their platform gives data scientists time to do the work they want to do by saving them time cleaning and labeling their data.

In the words of the CrowdFlower team:

“Everyone talks about their algorithms and the importance of ‘big data’ but we don’t spend enough time working with what we have and working to make big data into something more useful, like rich data.
Also, we like parties.”

H2O World

H2O World is a leading machine learning conference in Silicon Valley. Attendees will get to hear from rockstar data scientists like Hilary Mason and Monica Rogati and large organizations like Quora and Macy’s.

Who Should Attend:
This conference is aimed at developers and data scientists who are looking to leverage the power of machine learning to build smarter applications.

Backstory:
H2O World was created in order to provide a place where data scientists and developers could come together to discuss the latest trends and use cases for machine learning technology. This is the second year of the conference and the organizers expect 700+ attendees over three days and 70+ talks and sessions.

MLconf

MLconf is a single day, single track event, devoted to the Machine Learning and Data Science community in major cities, agnostic of any tool, platform or company. MLconf events host speakers from various industries, research and universities to discuss recent research and application of Machine Learning methodologies and practices.

MLconf has a “no sales pitch” motto; the organizers carefully curate content to help members of the community share what’s being used now.

Who Should Attend:
I could list a bunch of titles (e.g. Data Scientists, Researchers, Software Engineers), but really, if you’re interested in the recent research and application of Machine Learning methodologies and practices, you should attend.

Backstory:
In 2013, MLconf became a separate event, devoted to the Machine Learning and Data Science community in San Francisco, agnostic of any tool, platform or company.

The goal of MLconf is to host speakers from various industries, research and universities to discuss recent research and application of Machine Learning methodologies and practices.

In 2014, MLconf entered NYC and Atlanta, as well as San Francisco. In 2015, MLconf has hosted conferences in NYC, Atlanta, Seattle and San Francisco, with plans to enter additional US cities in 2016, and the UK.

Open Data Science Conference

The Open Data Science Conference (ODSC) brings together the data science community to help foster the exchange of innovative ideas and encourage the growth of open source software.

In addition to global conferences, the ODSC team also runs meetups, workshops, code sprints and hackathons to help current and future data scientists learn, connect and collaborate.

Who Should Attend:
Decision makers (CTOs and lead data scientists) and the decision influencers (the people who actually use and build analytic tools).

Backstory:
ODSC began as a successful Meetup in Boston to help practicing data scientists network and exchange ideas with an emphasis on open source projects. The Meetup grew into an annual event, Boston Data Fest. Starting in 2015, ODSC is also a host of ODSC Boston.

Data Science Pop-up

Anna Anisin, from Domino Data Lab, started Data Science Pop-ups to unite the brightest leaders in data science who are passionate about asking the right questions and identifying problems worth solving.

The Data Science Pop-up is a day-long data science conference with talks, panels and workshops.

Who Should Attend:
Executives, data scientists, developers and business development professionals.

Backstory:
The Data Science Pop-up was created to fill the void left by Gigaom’s abrupt shutdown (March 9th, 2015) and the cancellation of the Structured Data Conference.

GraphConnect

GraphConnect is the only conference dedicated to the growing world of graph databases. You’ll learn why the relationships between data points matter as much as the data itself — and you’ll never see the world the same again.

Who Should Attend:
GraphConnect is for data scientists who are ready to see the world differently, who know that data is connected in more ways than a mere row-and-table database will ever compensate for.

GraphConnect is for the hacker who wants to be on the edge of everything new and do kick-ass projects with new technology before anyone else has even ever heard of it.

Finally, GraphConnect is for the business deviant who wants to leapfrog the competition, getting to customers, capital and insights faster than jaws can drop to the floor in a stuffy boardroom once they’ve discovered how you’ve outmaneuvered them.

Backstory:
GraphConnect was started with a singular mission in mind: spread the awesome news about graph databases and how they transform our view of how data should be organized. GraphConnect was held in 4 different cities across the globe (whew!) during the first year after launch. While GraphConnect is only hosted in two cities these days (London and San Francisco), there are now more attendees at just *one* GraphConnect event than at those first four cities combined.

Lucene Revolution

Lucene Revolution brings together the Apache Lucene/Solr community from around the world to hear about the latest trends and use cases for the most widely deployed search platform on the planet. Attendees will learn from those who built and work on the platform every day.

Who Should Attend:
Developers, Engineers, and Engineering Managers — Apache open source experts/enthusiasts along with developers and managers whose jobs are to build search and discovery applications within their organizations

Backstory:
Lucene/Solr Revolution originated in 2010 as a small developer conference, hosted by Lucidworks, to bring the distributed (worldwide) Apache Solr community together for face-to-face collaboration.

Lucene Revolution has doubled in size since its origin and has attracted the attention and support of large enterprises like Salesforce & Bloomberg who have participated as conference sponsors to showcase their use of the Solr platform as well as to hire top Solr talent. With such a distributed developer and committer community, Lucene/Solr Revolution is the place to learn, collaborate, and meet the best and brightest minds in Solr, and is the largest face-to-face gathering of Solr committers.

Extract: Data Stories Worth Sharing

One full day jam-packed with data stories that will entertain, educate and inspire you. It’s everything you’ve ever wanted to know about data, told by the people who know it best. Mix 1 part growth hacking, 2 parts data analysis, toss in a dash of mad scientist and that is what Extract is all about.

At Extract you’ll learn actionable tactics to help grow communities, products and companies. Extract’s speakers hail from some of the fastest growing and innovative companies in the business and have accomplished what they are teaching; the goal is that all attendees will leave knowing how to implement what they learn.

Who Should Attend:
Data scientists, analysts, growth marketers and decision makers from Fortune 500 companies.

Wrap Up

I hope this ode to the top data science conferences in the U.S. aids you in your quest for data science mastery and adventure.
