Data Load Tool (DLT): Pros, Cons, and Integration into Data Platform as an Ingest Tool

Alexander Shcherbak
Published in DataOps.tech
Jul 8, 2024 · 5 min read

In this post, I’d like to share insights from integrating DLT on one of our projects. I won’t cover the technology in depth here, just my experience of integrating it, so apologies for that, and here’s a link to the DLT documentation for more details :)

One of our clients, a young company focused on marketing analytics, presented us with a challenge. They needed to analyze the profitability of their marketing campaigns in real time to decide whether to cancel or continue them. From the start, we knew we would work primarily with Facebook Ads and Stripe as sources.

[Image: main site page]

Discovering the Right Tool

We didn’t start by writing our own code: in 2024 there are plenty of ingestion tools with large connector catalogs, so we went looking for the most appropriate one. Our journey began with evaluating several popular data ingestion tools. Airbyte was our first candidate, thanks to its extensive range of connectors and the fact that it is probably the most popular option. However, as we delved deeper into its documentation, we realized that Airbyte might complicate the initial Proof of Concept (POC) phase. Our client’s budget constraints ruled out the paid version, leaving us with the open-source option, which required real setup effort, while the client’s priority was a quick result within a short period of time.

Determined to find a better solution, we considered alternatives like Fivetran and Meltano, but the Data Load Tool (DLT) caught our eye. DLT promised predefined sources for Facebook Ads and Stripe plus a lot of flexibility, so we decided to give it a try.

The Magic of DLT

The first time we deployed DLT for Facebook Ads was almost magical. With a single command

dlt init facebook_ads athena

We set up a pipeline that generated a folder filled with Python code. This code was ready to fetch data from Facebook Ads and push it into our destination, S3 + Athena. Unlike traditional connectors, DLT provided us with actual code, which meant we could optimize and extend it as needed. It also generates a config file with parameters such as the table format (Iceberg, Parquet) and a secrets file with connection parameters for your source and destination. We decided not to manage those in our code, so I’ll skip them here.

The folder structure in your code repo after running the command above
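Since the screenshot isn’t reproduced here, this is roughly what the generated layout looked like for us; treat it as a sketch, the exact file names can differ between dlt versions:

facebook_ads/               # the verified source: resources, helpers, settings
    __init__.py
    helpers.py
    settings.py
.dlt/
    config.toml             # non-secret parameters (table format, buckets, ...)
    secrets.toml            # credentials for Facebook Ads and AWS
facebook_ads_pipeline.py    # example script that creates and runs a pipeline
requirements.txt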

In many situations, everything seems to be working perfectly until you hit a snag in production. With DLT, we had the pipeline’s source code covering 99% of the nuances of working with the source, and we could make optimizations on the spot. This was a game-changer for us, as it allowed us to address issues swiftly without waiting for open-source contributions to be merged. And when we decided to pull additional data, we simply extended the basic source with a new resource; a resource ends up as a table in your destination. So simple :)

Resource:

@dlt.resource(primary_key="id", write_disposition="merge")
def accounts(
    fields: Sequence[str] = DEFAULT_ACCOUNT_FIELDS
) -> Iterator[TDataItems]:
    # yields ad account data fetched via the Facebook API object `account`,
    # which is set up by the enclosing source
    yield dict(account.api_get(fields))

The source returns resources:

@dlt.source(name="facebook_insights")
def facebook_insights_source(
    account_id: str = None,
    access_token: str = dlt.secrets.value,
    ...
) -> DltResource:
    ...
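To make the relationship concrete, here is a generic sketch (not the actual facebook_insights code) of how a source wires up and returns its resources; fetch_customers and fetch_events are placeholder helpers, not real functions:

import dlt


@dlt.source
def my_api_source(api_key: str = dlt.secrets.value):
    @dlt.resource(write_disposition="replace")
    def customers():
        yield from fetch_customers(api_key)   # placeholder fetch helper

    @dlt.resource(write_disposition="append")
    def events():
        yield from fetch_events(api_key)      # placeholder fetch helper

    # each returned resource becomes a table in the destination
    return customers, events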

The Integration Journey

After executing the initial setup command, we had Python code that defines the sources; next we needed to call them to start the data fetching process. To do that, you create a pipeline instance (there’s a doc page for this). The final step was to orchestrate these runs so that data was ingested at regular intervals. I’d like to mention that we used the Airflow helper from the DLT team to run the pipeline smoothly on Airflow. There’s a doc page for that as well; it describes deployment on Google Cloud Composer, but it also shows the source code for using the helper.
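As a minimal sketch, assuming the facebook_ads module generated by dlt init exposes a facebook_ads_source function (as it did for us) and that the dataset name is our own choice, creating and running a pipeline looks roughly like this:

import dlt
from facebook_ads import facebook_ads_source  # module generated by `dlt init`

pipeline = dlt.pipeline(
    pipeline_name="facebook_ads_pipeline",
    destination="athena",        # external tables on top of files in S3
    staging="filesystem",        # files are staged to the S3 bucket from config
    dataset_name="marketing",    # dataset name is up to you
)

load_info = pipeline.run(facebook_ads_source())
print(load_info)

Credentials for Facebook Ads and AWS live in .dlt/secrets.toml, which dlt picks up automatically at runtime.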

Given our client’s primary use of AWS, we decided to deploy the orchestrator using AWS Managed Workflows for Apache Airflow (MWAA). This decision was influenced by the need for an easily manageable solution that integrated well with their existing infrastructure. MWAA provided the simplicity and integration we needed, allowing us to meet our client’s requirements efficiently.
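For orchestration, the DLT Airflow helper wraps a pipeline run into an Airflow task group. A minimal sketch of what such a DAG can look like on MWAA, assuming the helper’s API as documented at the time (schedule, names and retry settings here are ours, not prescribed):

import dlt
import pendulum
from airflow.decorators import dag
from dlt.helpers.airflow_helper import PipelineTasksGroup


@dag(
    schedule_interval="@hourly",            # ingest at regular intervals
    start_date=pendulum.datetime(2024, 7, 1),
    catchup=False,
    max_active_runs=1,
)
def facebook_ads_ingest():
    # the helper turns the pipeline run into a group of Airflow tasks
    tasks = PipelineTasksGroup("facebook_ads", use_data_folder=False, wipe_local_data=True)

    from facebook_ads import facebook_ads_source

    pipeline = dlt.pipeline(
        pipeline_name="facebook_ads_pipeline",
        destination="athena",
        staging="filesystem",
        dataset_name="marketing",
    )
    # "serialize" decomposes each dlt resource into its own Airflow task
    tasks.add_run(pipeline, facebook_ads_source(), decompose="serialize", trigger_rule="all_done", retries=0)


facebook_ads_ingest()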

Evaluating DLT: The Pros and the Cons

As we continued working with DLT, we discovered several significant advantages. The flexibility and maintainability of having actual Python code were unparalleled. We could quickly fix bugs, extend logic, and optimize the code to meet our specific needs. DLT felt like a low-code solution that provided a solid template to build upon. During development the set of sources also grew, so we had to write custom resources for custom REST APIs, and that was easy: we just reused the templates to build ingestion for specific use cases, like playing with LEGO.

“DLT tool like LEGO” generated by DALL-E
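For example, one of those custom resources boiled down to a small generator over a paginated REST endpoint. A minimal sketch, assuming a hypothetical /orders endpoint with bearer-token auth (the URL, field names and pagination scheme are made up for illustration):

from typing import Iterator

import dlt
import requests


@dlt.resource(name="orders", primary_key="id", write_disposition="merge")
def orders(
    api_url: str = dlt.config.value,
    api_key: str = dlt.secrets.value,
) -> Iterator[list]:
    # walk a hypothetical paginated endpoint until it returns an empty page
    page = 1
    while True:
        response = requests.get(
            f"{api_url}/orders",
            params={"page": page},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        response.raise_for_status()
        items = response.json()
        if not items:
            break
        yield items  # dlt normalizes nested JSON into tables automatically
        page += 1

Plugging it into the ingest was then just a matter of passing orders() to pipeline.run next to the generated sources.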

However, DLT wasn’t without its drawbacks. Its relative lack of popularity meant fewer available connectors compared to Airbyte. At the time of writing, DLT supported only 33 sources and 15 destinations, significantly fewer than Airbyte’s 350+ connectors. This limitation sometimes forced us to find workarounds or custom solutions for less common data sources, but as I pointed out, the library lets you write custom logic for any source really quickly using common building blocks like resources, sources, state, etc.

Conclusion: A Worthy Addition

In the end, integrating DLT into our client’s data platform proved to be a rewarding experience. Despite some limitations, the advantages of flexibility, maintainability, and cost-efficiency made it a valuable addition to our toolkit. For startups and companies with specific needs, DLT offers a compelling solution that balances ease of use with powerful customization options.

If you’re looking for a data ingestion tool that can adapt to your unique requirements and provide a seamless integration experience, we highly recommend giving DLT a try.
