Unifying your data engineering and data science teams

Nimish Rao
Published in DataSeries, Apr 15, 2020
Photo by Alex Rosario on Unsplash

How do you ensure very driven and capable teams can work together and not step on each other’s toes? How do you ensure that you can utilise the output of the data engineering team, and share it with the data science team, and vice versa?

Azure Databricks (Databricks is also available on AWS) is a unified analytics platform that brings agile, customer-centric data engineering and data science teams together.

I have had good success using it to help data engineering and data science teams collaborate and deliver for their customers.

Databricks lets you collaborate using its notebook format.

Here are some scenarios in which you can use notebooks to drive data engineering and machine learning workloads:

a) Natively connect to data storage/streaming data in Azure or other cloud services. You can deliver big data engineering pipelines using Databricks.

b) You can also implement algorithms in Databricks notebooks, e.g. XGBoost, a simple linear regression, or deep learning.

c) Notebooks also let you collaborate on business modules, for example customer churn, fraud detection, sentiment analysis, and text analytics.
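
To make the hand-off between the two teams concrete, here is a minimal sketch of a data engineering step feeding a data science step. It is an illustration under stated assumptions, not Databricks code: plain Python lists stand in for Spark DataFrames so it runs anywhere, and the function names, column names (tenure_months, monthly_spend), and sample rows are all made up.

```python
# Hypothetical hand-off between a data engineering step and a data science step.
# Plain Python stands in for Spark DataFrames so the sketch runs anywhere.


def clean_records(raw_rows):
    """Data engineering step: drop incomplete rows and cast fields to float."""
    cleaned = []
    for row in raw_rows:
        if row.get("tenure_months") is None or row.get("monthly_spend") is None:
            continue  # skip rows with missing values
        cleaned.append((float(row["tenure_months"]), float(row["monthly_spend"])))
    return cleaned


def fit_simple_linear_regression(points):
    """Data science step: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample data, for illustration only.
raw_rows = [
    {"tenure_months": "1", "monthly_spend": "12"},
    {"tenure_months": "2", "monthly_spend": "14"},
    {"tenure_months": None, "monthly_spend": "99"},
    {"tenure_months": "3", "monthly_spend": "16"},
    {"tenure_months": "4", "monthly_spend": "18"},
]

features = clean_records(raw_rows)  # output of the engineering step
slope, intercept = fit_simple_linear_regression(features)  # input to the science step
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In a shared notebook, the two functions would typically live in separate cells owned by each team, with the cleaned data as the agreed interface between them.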

The Databricks Ecosystem

Databricks also lends itself to the complete data and machine learning lifecycle.

Databricks and the Data and ML lifecycle

Conclusion

Databricks is a great tool. I like the following features in particular:

a) The ability to run code across a Spark cluster, for both big data engineering workloads and intensive data science processes

b) Of course, the topic of this blog! The ability to combine the efforts of data engineers and data scientists so they can utilise each other’s hard work

c) The ability to write code in different languages

d) The integration with Git

e) The support for the Data and ML Ops lifecycle

f) Integration with Active Directory to ensure security rules can flow seamlessly to the notebook

Have you used Databricks? I would love to hear about your experience.

Note: The opinions expressed herein are mine alone and do not represent the opinions of my employer.
