Spark + AI Summit 2019 Europe Recap

Itai Yaffe
Nielsen-TLV-Tech-Blog
Nov 8, 2019

By: Adi Polak and Itai Yaffe

Full disclosure: the writers of this post (Adi Polak from Microsoft and Itai Yaffe from Nielsen) both gave talks at the summit. You can find the video recording of Itai’s session here, and we’ll publish the video recording of Adi’s session once it’s available.

Wow!!! We are still trying to recover from all the good stuff that took place in Amsterdam three weeks ago.

What did we have?

  • Content
  • Community
  • Fun

General

The conference had 8 tracks.

Most of the time, more than 10 sessions ran in parallel, since some tracks had more than one session at a time. This variety made choosing a session pretty challenging: we often found ourselves torn between 2, 3, or even 4 sessions, knowing we could attend only one.

The organizers welcomed more than 2,300 attendees from 63 countries.

Taken during the first day of the summit

Let’s dive deeper:

Content

Most, if not all, of the video recordings are available on Databricks’ YouTube channel, and the keynotes are available here.

The Apache Spark challenges that stood out in the talks were:

  • Shuffle is expensive. How can we tune it to work for us? A couple of sessions covered smart bucketing, broadcast joins, parameter tuning, and much more. To learn more, watch Rose Toomey’s “Apache Spark at Scale in the Cloud” and Daniel Tomes’ “Apache Spark Core — Practical Optimization”.
  • We need CI/CD for machine learning: CI/CD is well established in developers’ day-to-day work, but machine learning is not there yet. So how can we close the gap? Many companies offer their own solutions for managing ML pipelines and bringing them closer to CI/CD. Read here to learn more.
  • A Unified Analytics Platform for data engineering and data science: Databricks aims to close the gap between the two disciplines by offering a PaaS solution for Spark in the cloud. The platform is developed by the original creators of Apache Spark and is available on public clouds (e.g., Azure).
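Why can a broadcast be cheaper than a shuffle? A toy, single-process sketch in plain Python illustrates the trade-off. This is not Spark’s implementation: the function names (`hash_partition`, `shuffle_join`, `broadcast_join`) and the use of Python lists of dicts as “partitions” are hypothetical stand-ins. In a shuffle join, both sides are hash-partitioned by the join key so that matching rows land in the same partition. In a broadcast join, every partition gets a copy of the small table and the large table stays where it is.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Shuffle step (toy): route each row to a partition by hashing its join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def _local_hash_join(left_rows, right_index, key):
    """Join left rows against a prebuilt {key: [rows]} index of the right side."""
    out = []
    for l in left_rows:
        for r in right_index.get(l[key], []):
            out.append({**l, **r})
    return out


def shuffle_join(left, right, key, num_partitions=4):
    """Both sides are repartitioned (moved) by key, then joined partition by partition."""
    out = []
    for lp, rp in zip(hash_partition(left, key, num_partitions),
                      hash_partition(right, key, num_partitions)):
        index = defaultdict(list)
        for r in rp:
            index[r[key]].append(r)
        out.extend(_local_hash_join(lp, index, key))
    return out


def broadcast_join(large_partitions, small, key):
    """The small side is indexed once and shared with every partition;
    the large side's rows are never repartitioned."""
    index = defaultdict(list)
    for r in small:
        index[r[key]].append(r)
    out = []
    for part in large_partitions:  # each partition joins locally
        out.extend(_local_hash_join(part, index, key))
    return out
```

Both functions produce the same rows; the difference is that `shuffle_join` moves every row of both inputs, while `broadcast_join` only copies the small side. In Spark itself, you can request a broadcast join explicitly with `pyspark.sql.functions.broadcast`, and Spark broadcasts automatically when a table is estimated to be smaller than `spark.sql.autoBroadcastJoinThreshold`. Bucketing takes the other route: it persists a hash partitioning (conceptually similar to `hash_partition` above) at write time, so joins between tables bucketed on the same key and bucket count can skip the shuffle.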

A glimpse of two great sessions:

  • Data science challenges: “Democratizing Machine Learning: Perspective from a scikit-learn Creator”, by Gaël Varoquaux
  • How Microsoft created a generic and scalable anomaly detection service over Apache Spark: “CyberMLToolkit: Scalable Anomaly Detection Generic Service over Apache Spark (Azure Sentinel)”, by Roy Levin

Community

There is a strong and growing community of developers and data scientists who use Apache Spark daily.
The ecosystem is growing as well, and Databricks has introduced many more products over the years.
Among the products announced this year are:
1. Delta Lake, an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. During the summit, it was reported that Delta Lake is used by over 3,700 organizations and that the Delta Lake project is joining the Linux Foundation.
2. Koalas, which provides the pandas DataFrame API on top of Apache Spark.

Women in Unified Analytics Speakers

Diversity and inclusion were a strong theme at the summit, with several events branded “Women in Unified Analytics”, including a lunch and panel in cooperation with Women in Big Data.
Another was a free meetup the night before the summit, aimed at bringing more people into Big Data communities.

Adi Polak on ML pipelines with the Apache Spark ecosystem
Itai Yaffe talking about the Women in Big Data program

Fun

People openly shared how they use Apache Spark day to day, along with the experiments they are running to figure out the right way to use it in the cloud with various public cloud services. And yes, we also had a boat ride to a dinner and party at a boathouse 🎉.

The conference was well organized, with great food and funny T-shirts:

Is your love affair with Data missing some Spark?

Games:

Solve your data problems at the speed of light!

DELICIOUS coffee.

And friends.

Akamai meets Nielsen meets Microsoft
Me & Holden Karau’s Boo
Holden Karau and Itai Yaffe

A more thorough and informative review of the summit will be published soon!

In addition, you can read Databricks’ official recap of the summit here.

And last but not least, you are more than welcome to follow us on Twitter (Adi, Itai); we’ll be happy to connect and take questions.

Thanks for reading!
