Thomas Cardenas – Medium

Thomas Cardenas
in
Data Engineer Things

Best Practices for Writing Maintainable and Testable Spark Code in Scala

Enhancing Scalability and Reliability Through Structured Spark Development Practices

Apr 17

Thomas Cardenas
in
Data Engineer Things

Managing Late-Arriving Data for Accurate Reporting

Data Engineering Excellence: Best Practices for Managing Late-Arriving Data in Metrics Pipelines

Mar 26

Thomas Cardenas
in
Data Engineer Things

The Art of Efficient Data Lake Organization

Guidelines for Streamlined Data Lake Organization

Oct 24, 2023

Thomas Cardenas
in
Ancestry Product & Technology

Harnessing Intervals in Apache Airflow for Efficient and Reliable Data Processing

Oct 17, 2023

Thomas Cardenas

Calculating Daily/Monthly Active Users with Spark & Iceberg

Whenever I hear about metrics, I really want to dive into understanding them and coming up with a sample pipeline to demonstrate them. One of…

Oct 17, 2023

Thomas Cardenas

Simplifying Complex Data Merging: Combining Data Sources into a Single Table

In the world of data engineering, merging data from different sources into a single table is a common practice. In this article, we will…

Oct 9, 2023

Thomas Cardenas

Streaming data to S3 from SNS using Firehose

Oct 2, 2023

Thomas Cardenas

How to Reduce Full Table Scans during Merges in Apache Iceberg and Save Money

Sep 28, 2023

Thomas Cardenas
in
Ancestry Product & Technology

Solving the Small File Problem in Iceberg Tables

The Data Platform team at Ancestry has been maintaining a fully-refreshed 100-billion-row Apache Iceberg table for several months. A…

Aug 29, 2023

Thomas Cardenas
in
Ancestry Product & Technology

Scaling Ancestry.com: How to Optimize Updates for Iceberg Tables with 100 Billion Rows

One of the most interesting datasets at Ancestry is the Hints database. This is used to alert users that potential new information is…

Feb 23, 2023
