Wrap-Up: Adobe Spark Meetup

Jaemi Bremner
Sep 28, 2018 · 3 min read

Authors: Sandeep Nayak, Andrew Chen, and Yogesh Natarajan

Why did we have a Spark meetup?

What pops into your mind when you hear “Adobe”? You think Photoshop and marketing applications, right? You might be surprised to learn that we have been working on a scalable, multi-cloud, extensible, and open platform for developers and data scientists.

We want to share with our peers in the developer community and be open about what we are developing, both as a platform and with machine learning. We decided to hold a meetup, partnering with Databricks, on a technology we feel passionate about: Spark. We want to share how we are using Spark in our platform and the use cases we are solving as computer scientists.

The following is a wrap-up of the content we shared at the meetup we held on Sept. 19 at the Adobe San Jose campus, where we partnered with Databricks to show how we use Spark for Adobe Experience Platform.

The problem

At Adobe, we are lucky to hear from companies who are at the forefront of solving the “Experience Era” problems you might be hearing about in the news.

As computer scientists, what we heard from these companies were specific use cases. Our customers told us:

  • Companies are still struggling to collect data from their different properties (web, social, marketing apps), marketing tech (marketing automation), and traditional enterprise systems (CRM and ERP) to deliver great experiences.
  • Data engineers and data scientists spend too much of their workday aggregating, cleansing, normalizing, and standardizing data. They want to spend more of their time and capacity asking the right questions and getting insights that drive results that matter.

It’s a persistent and big problem.

How we are solving the problem

During the meetup, we shared how Adobe Experience Platform harmonizes data across these sources.

Adobe Experience Platform Spark Architecture

As part of Adobe Experience Platform, we have also built a query engine that leverages Spark SQL for ad-hoc data querying. The query engine implements the PostgreSQL protocol and leverages Akka Streams and the Presto Parser as an abstraction layer around Spark SQL. We have also patched Spark SQL with support for nested column pruning, which is critical to our performance needs when accessing data with thousands of nested fields. We even have an open source common data dictionary called XDM.

Adobe Experience Platform Query Service
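
To make the idea of nested column pruning a little more concrete, here is a minimal sketch in plain Python. It is not our Spark SQL patch and not how Spark implements pruning; it only illustrates the principle that, given the dotted field paths a query actually references, the reader can shrink a deeply nested schema down to just those fields before touching any data. The schema, the field names, and the `prune_schema` and `count_leaves` helpers are all hypothetical and exist only for this illustration.

```python
import copy

# Hypothetical nested schema: a dict is a struct, a string is a leaf type.
# These field names are made up for illustration and are not taken from XDM.
SCHEMA = {
    "id": "string",
    "person": {
        "name": {"first": "string", "last": "string"},
        "email": "string",
    },
    "web": {
        "page": {"url": "string", "title": "string"},
        "referrer": "string",
    },
}


def prune_schema(schema, paths):
    """Return a copy of `schema` holding only the fields named by dotted `paths`."""
    pruned = {}
    for path in paths:
        src, dst = schema, pruned
        parts = path.split(".")
        for depth, name in enumerate(parts):
            if name not in src:
                raise KeyError(f"unknown field: {path}")
            node = src[name]
            if depth == len(parts) - 1:
                # Last path segment: keep this field (or the whole struct under it).
                dst[name] = copy.deepcopy(node)
            elif isinstance(node, dict):
                # Intermediate segment: descend, creating the struct on demand.
                src, dst = node, dst.setdefault(name, {})
            else:
                raise KeyError(f"{name} is not a struct in: {path}")
    return pruned


def count_leaves(schema):
    """Count the leaf columns in a nested schema."""
    return sum(
        count_leaves(value) if isinstance(value, dict) else 1
        for value in schema.values()
    )


# A query that only references three nested fields.
needed = prune_schema(SCHEMA, ["id", "person.name.first", "web.page.url"])
```

In this toy schema the pruned result keeps 3 of the 7 leaf columns. With thousands of nested fields, the same principle is what lets a query skip reading most of the columns it never references.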

We have been actively working on contributions to Spark. While none of our work has landed upstream yet, this is what we plan to extend: https://github.com/apache/spark/pull/21320.

If you missed our Spark Meetup, you can still check out our presentation here.

What’s next?

If what we are sharing is exciting to you, join us. Adobe has developer and data scientist opportunities.

If you want to join us for future meetups, sign up here.

Follow the Adobe Tech Blog for more developer stories and resources, and check out Adobe Developers on Twitter for the latest news and developer products.

Adobe Tech Blog

News, updates, and thoughts related to Adobe, developers, and technology.

Written by Jaemi Bremner

Experience Technologist. Developer Advocate for Adobe Experience Platform. Passionate about technology, architecture, fashion, and design. Twitter: @jaeness
