In my previous posts:

Practical tips to get the best out of Data Build Tool (dbt) — Part 1

Practical tips to get the best out of Data Build Tool (dbt) — Part 2

I covered different topics that you might want to discuss with your data engineering team at the beginning of your journey with dbt.

In this article, I will discuss at a high level:

  • How to orchestrate data pipelines and dbt models.
  • How to use dbt documentation and serve it to an entire organisation.

How to orchestrate data pipelines and dbt models

As you probably know, dbt organises your models as a DAG (directed acyclic graph). The question is…
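
To ground that, dbt derives the DAG from the ref() calls inside each model: every ref() becomes an edge, and upstream models always run before the models that select from them. A minimal sketch, with purely illustrative model and column names:

  -- models/staging/stg_orders.sql
  -- A hypothetical staging model reading from a raw table.
  select
      order_id,
      customer_id,
      order_date
  from raw.orders

  -- models/marts/fct_orders.sql
  -- ref('stg_orders') declares the dependency, so dbt adds an edge
  -- to the DAG and always builds stg_orders before fct_orders.
  select
      order_id,
      customer_id,
      order_date
  from {{ ref('stg_orders') }}

When you execute dbt run, the models are built in this topological order, which is the contract any external orchestrator has to respect when it schedules dbt alongside the rest of your pipelines.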


In my previous post:

Practical tips to get the best out of Data Build Tool (dbt) — Part 1
https://medium.com/photobox-technology-product-and-design/practical-tips-to-get-the-best-out-of-data-building-tool-dbt-part-1-8cfa21ef97c5

I explained different approaches to splitting functionality across dbt projects when building a data platform, and how best to organise dbt models.

In this article, I will discuss the following topics:

  • Get the best out of your dbt_project.yml file.
  • How dbt makes you rethink some aspects of traditional data warehousing.
  • How to use dbt macros.
  • How to wrap dbt.

Get the best out of your dbt_project.yml file

dbt_project.yml contains your dbt project configuration. Depending on how you use dbt, it can easily become either an unreadable monster or an almost empty file.
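
To make this concrete, here is a trimmed-down sketch of what a tidy dbt_project.yml can look like; the project, folder and schema names are placeholders, and the exact keys vary slightly between dbt versions:

  # dbt_project.yml -- illustrative skeleton only
  name: 'my_data_platform'
  version: '1.0.0'
  profile: 'my_data_platform'
  models:
    my_data_platform:
      staging:
        +materialized: view    # cheap to rebuild on every run
      marts:
        +materialized: table   # persisted for BI consumption
        +schema: analytics     # routes marts to a custom schema

Keeping folder-level defaults such as materializations here, and overriding them only inside the individual models that really need it, is one way to stop this file from turning into the monster mentioned above.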

For each…


This series of articles is for those who already have basic experience with dbt (Data Build Tool) and want to get the best out of it when building a data platform. If you are looking for an introduction to building data pipelines with dbt, I suggest having a look at the article I wrote a few months ago or at the official dbt documentation.

dbt is undoubtedly great for performing ELT but it is sometimes presented as a tool designed mostly for analysts. …


When we talk about data, the number of technologies available on the market is overwhelming and staying up to date is a key challenge both for businesses and for engineers.

One of the reasons I recently joined Photobox was to be in a data-driven company with the challenge of building a new data platform using some of the most cutting-edge technologies available: AWS (Amazon Web Services), Snowflake and Looker.

I spent the last four years of my career mainly working on GCP (Google Cloud Platform), leading the development of The Telegraph data platform. …


The Telegraph is a 164-year-old company where data has always played a central role. With the advent of the cloud and the need for a platform able to process huge quantities of data, in 2015 we started to build our big data platform. We decided to use Google Cloud and, since delivering our first PoC, we have kept improving the platform over the years to better support the business.

The challenge

During the last four years, I have had multiple discussions on how to handle data transformation or, more broadly, ETL (Extract, Transform and Load) processes. The number of tools that you…


Industry Outlook

Technology enables publishers to measure the impact that a piece of content is having as soon as it becomes public. These days, reacting to this data is a vital part of promoting quality journalism in the sea of online articles competing for our attention. The real-time understanding of how a story is performing can significantly help to improve the customer experience on both our website and mobile apps. It’s important to know what our registrants and subscribers want to read and how we can deliver articles that are relevant to our audience.

The Challenge

Under this premise, in 2017 the data team…


You might go online and find headlines like “AlphaStar goes 10:1 against human pros”, “AlphaStar mastering the real-time strategy game StarCraft II” or “Human StarCraft II e-athletes crushed by neural net”.

I personally find all these headlines misleading. I consider myself a StarCraft fan and I played at an average level for multiple years. I’m also a data engineer with some exposure to data science and machine learning, so I feel I can understand the general concepts of both worlds.

Let’s start with StarCraft. StarCraft II is a real-time strategy game developed and published by Blizzard Entertainment. The title has both…


At The Telegraph, the Data Team is a cross-functional group of engineers that manages the core big data platform. It is responsible for orchestrating hundreds of batch and real-time data pipelines to ensure that data ingestion and transformation are performed on time and comply with the highest data quality standards.

This allows analysts to provide reports and insights and the business to make decisions based on reliable figures. …

Stefano Solimito

Principal data engineer @ Photobox
