Why did we have a Spark meetup?
What pops into your mind when you hear “Adobe”? You think Photoshop and marketing applications, right? You might be surprised to learn that we have been working on a scalable, multi-cloud, extensible and open platform for developers and data scientists.
We want to be open with our peers in the developer community about what we are building, both in the platform and in machine learning. We decided to do a meetup, partnering with Databricks, on a technology we feel passionate about: Spark. We want to share how we are using Spark in our platform and the use cases we are solving as computer scientists.
The following is a wrap-up of the content we shared at the meetup we held on Sept. 19 at the Adobe San Jose campus. We partnered with Databricks to show how we use Spark for Adobe Experience Platform.
At Adobe, we are lucky to hear from companies that are at the forefront of solving the “Experience Era” problems you might be hearing about in the news.
As computer scientists, what we heard from these companies were specific use cases. Our customers told us:
- Companies are still struggling to collect data from their different properties (web, social, marketing apps), marketing tech (marketing automation) and traditional enterprise systems (CRM and ERP) to deliver great experiences.
- Data engineers and data scientists spend too much of their work day on aggregating, cleansing, normalizing and standardizing data. They want to spend more of their time and capacity on asking the right questions and getting insights that drive results that matter.
It’s a persistent and big problem.
How we are solving the problem
During the meetup, we shared how Adobe Experience Platform harmonizes data across these sources.
As part of Adobe Experience Platform, we have also built a query engine that uses Spark SQL for ad hoc data querying. The query engine implements the PostgreSQL protocol and uses Akka Streams and the Presto parser as an abstraction layer around Spark SQL. We have also patched Spark SQL with support for nested column pruning, which is critical to our performance needs when accessing data with thousands of nested fields. We even have an open source common data dictionary called XDM.
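To make the idea of nested column pruning concrete, here is a minimal conceptual sketch in plain Python. It is not Spark's implementation or our patch. It only illustrates the principle that a query selecting a few nested leaves should not have to read every field of the enclosing struct. The schema, the field names, and the `prune_schema` and `count_leaves` helpers are all hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical schema model: a struct is a dict, a leaf is a type name string.
Schema = Dict[str, Any]


def prune_schema(schema: Schema, paths: Iterable[str]) -> Schema:
    """Return a copy of `schema` holding only the fields named by dotted paths."""
    # Group the requested paths by their first segment.
    wanted: Dict[str, List[str]] = {}
    for path in paths:
        head, _, rest = path.partition(".")
        wanted.setdefault(head, []).append(rest)

    pruned: Schema = {}
    for name, rests in wanted.items():
        if name not in schema:
            raise KeyError(f"unknown field: {name}")
        node = schema[name]
        if not isinstance(node, dict):
            # A leaf cannot have sub-fields.
            if any(rests):
                raise KeyError(f"{name} is a leaf and has no sub-fields")
            pruned[name] = node
        elif all(rests):
            # Every request goes deeper, so recurse and drop unused siblings.
            pruned[name] = prune_schema(node, rests)
        else:
            # At least one request names the whole struct, so keep all of it.
            pruned[name] = node
    return pruned


def count_leaves(schema: Schema) -> int:
    """Count leaf columns, a rough proxy for how much data a scan must read."""
    return sum(
        count_leaves(value) if isinstance(value, dict) else 1
        for value in schema.values()
    )


# Hypothetical nested record layout, for illustration only.
schema = {
    "id": "string",
    "person": {
        "name": {"first": "string", "last": "string"},
        "birthYear": "int",
    },
    "web": {
        "pageViews": "long",
        "referrer": {"url": "string", "type": "string"},
    },
}

pruned = prune_schema(schema, ["person.name.first", "web.referrer.url"])
print(pruned)
print(count_leaves(schema), "leaf columns before,", count_leaves(pruned), "after")
```

In this toy example the scan shrinks from seven leaf columns to two. With thousands of nested fields, the same principle decides whether a query reads a few columns or nearly all of them.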
We have been actively working on contributing to Spark. While we don’t have any contributions merged upstream yet, this is what we plan to extend: https://github.com/apache/spark/pull/21320.
If you missed our Spark Meetup, you can still check out our presentation here.
If what we are sharing is exciting to you, join us. Adobe has developer and data scientist opportunities.
If you want to join us for future meetups, sign up here.