Databricks: Bridging the Gap Between Tech and Business

A consultant’s perspective on how learning Databricks can enable data-driven decisions in the workplace

Manchester D&A
Slalom Data & AI
5 min read · Apr 13, 2023

Photo by Ivan Samkov from Pexels

By Sakib Moghal

I’m personally a big fan of Databricks for a number of reasons. Before I get into them, I think it’s worth sharing a little about my background.

I’ve worked in data and analytics for about five years. I don’t have a STEM degree, I didn’t build computers when I was five, and I don’t love to code in my spare time. On the contrary, I was someone who had an irrational fear of computers — someone who was intimidated by the matrix-looking black screen of the command line.

A semi-accurate representation of my initial fear of code (Photo by Elisa Ventur on Unsplash)

Primarily, I’m interested in people and society — who we are, why we do what we do, and how we work together. By extension, this interest led me to be curious about business — because a business is just a collection of people working towards a collective mission. And by further extension, this interest led me to be curious about data because, when starting out in my career, I couldn’t help but notice how prevalent this stuff was — and is — in business. A 2021 UK government study found that “Almost half of businesses (48%) are recruiting for roles that require hard data skills but under half (46%) have struggled to recruit for these roles over the last 2 years.” Demand for these skills is growing, and the supply is struggling to keep up.

Ultimately, data is a tool, used by people in businesses to solve whatever problems and achieve whatever goals that business has. This curiosity led me to dip my toe into the world of data and, five years later, I’m still here.

I say all this to demonstrate that — as someone who doesn’t eat code for breakfast — I initially found the world of data and analytics rather complicated. And I know I’m not alone. It’s a complex space, and the pace of change is mind-boggling. For those fortunate enough to love systems, statistics, and software, it simply means a comfortable salary for a difficult but enjoyable job. For others, who recognise the need for these skills but to whom the ability doesn’t quite come naturally, it means taking on a challenging battle to navigate a complicated labyrinth of technologies, trends, and buzzwords.

The reality is, building big data platforms that can handle the demands of modern-day business analytics is complicated. But it’s also necessary for businesses to grow. According to one study, the global data and analytics market is projected to grow to about $350 billion by 2030. So there’s an increasing need to bridge the gap between business and technology. It’s a need I’ve only seen grow throughout my career.

So technologies that can help bridge that gap, by simplifying or abstracting away the technical complexity of data-intensive applications? Well, they’re great.

And that’s why I like Databricks.

The many advantages of Databricks:

  • It’s a user-friendly big data platform. I remember, in my early career, trying to get my head around the complexity of the Apache Hadoop ecosystem. Databricks still uses open-source big data technologies under the hood (e.g., Spark, Delta Lake), but hides a lot of the complexity and makes interacting with them pretty easy.
  • It makes many data pipeline and infrastructure best practices easy to code. Batch and streaming, distinct incremental loads, idempotency, schema evolution, logging, elastic VM compute … much of this work takes place under the hood or can be configured in a few lines of code.
  • It allows people from different coding backgrounds to come to the tool. You can switch between Python, Scala, and SQL almost effortlessly. Write a CREATE TABLE statement in SQL, and then query that table using PySpark in the very next notebook cell.
  • It still retains flexibility and cutting-edge relevance, given its commitment to open source and open tooling (Delta Lake, Git integration, etc.). This means you’re not tied in to any company’s proprietary, clunky, awkward, UI-based tools. I’ve seen many companies get stuck trying to meet their changing user and business demands with outdated proprietary tech they’ve bought, purely because they have the licenses and the sunk-cost fallacy is preventing them from letting go.
  • It’s cloud-agnostic. AWS and Azure continue to have the best docs for integration with Databricks, but ultimately Delta Lake is an open-source tool that can run on top of any cloud object store.
  • It’s well tailored to data analysts, engineers, and scientists. Each role has its own tailored web application within the Databricks platform: analysts get an easy SQL interface and can spin up quick visual dashboards; engineers and scientists get to code in lovely notebooks, and can install whatever Python/Scala libraries they need with ease. This makes it easy for each user to work in the manner they are accustomed to, all from the same source of truth in the Databricks Lakehouse.
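
If terms like “incremental loads” and “idempotency” from the list above sound abstract, here is a minimal sketch of the idea in plain Python. To be clear, this is not Databricks code: on the platform you would more likely reach for something like a Delta Lake MERGE. The function name, the table contents, and the `id` key are all hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable

Row = Dict[str, object]


def upsert_batch(target: Dict[object, Row], batch: Iterable[Row], key: str = "id") -> Dict[object, Row]:
    """Merge a batch of rows into a copy of `target`, matching on `key`.

    Rows with a new key are inserted; rows with an existing key replace
    the old version. Replaying the same batch therefore changes nothing,
    which is what makes the load idempotent.
    """
    merged = dict(target)  # copy so the caller's table is not mutated
    for row in batch:
        merged[row[key]] = dict(row)
    return merged


# Hypothetical data: an existing table and one incremental batch.
customers = {1: {"id": 1, "name": "Ada"}}
batch = [{"id": 1, "name": "Ada L."}, {"id": 2, "name": "Grace"}]

once = upsert_batch(customers, batch)
twice = upsert_batch(once, batch)  # simulate a retried or duplicated run
print(once == twice)
```

The point is that a pipeline built this way can be safely re-run after a failure without creating duplicates, and it only needs to process the new batch rather than reloading everything. That is the kind of behaviour I mean when I say these best practices can be handled in a few lines.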

Databricks is a powerful data engineering and analytics platform that is becoming increasingly popular in the data science community. It removes much of the complexity from the end-to-end data engineering lifecycle by helping clients store, clean, and visualise vast amounts of data from disparate sources.

Learning Databricks can be incredibly beneficial for individuals and businesses looking to improve their data analysis capabilities, streamline data workflows, and gain insights from large data sets. With its ease of use, scalability, and support for multiple programming languages, Databricks is an excellent platform for anyone looking to develop their skills in data engineering and analytics.

In summary, Databricks makes my present role as a data engineer much simpler. At Slalom, we have a dedicated Databricks Centre of Excellence where we help our clients design and implement Databricks solutions to help solve their data problems. If you’re a data engineer looking to take the Databricks associate data engineering exam, I’ve written a short, separate article on the topic.

Slalom is a global consulting firm that helps people and organizations dream bigger, move faster, and build better tomorrows for all. Learn more and reach out today.

Insights and fresh perspectives on knowledge and the latest trends in Data and Analytics from the Slalom Manchester D&A team