Thomas Cardenas in Data Engineer Things · Best Practices for Writing Maintainable and Testable Spark Code in Scala · Enhancing Scalability and Reliability Through Structured Spark Development Practices · 4 min read · Apr 17, 2024
Thomas Cardenas in Data Engineer Things · Managing Late-Arriving Data for Accurate Reporting · Data Engineering Excellence: Best Practices for Managing Late-Arriving Data in Metrics Pipelines · 4 min read · Mar 26, 2024
Thomas Cardenas in Data Engineer Things · The Art of Efficient Data Lake Organization · Guidelines for Streamlined Data Lake Organization · 5 min read · Oct 24, 2023
Thomas Cardenas in Ancestry Product & Technology · Harnessing Intervals in Apache Airflow for Efficient and Reliable Data Processing · 7 min read · Oct 17, 2023
Thomas Cardenas · Calculating Daily/Monthly Active Users with Spark & Iceberg · Whenever I hear about metrics, I really want to dive into understanding them and coming up with a sample pipeline to demonstrate it. One of… · 6 min read · Oct 17, 2023
Thomas Cardenas · Simplifying Complex Data Merging: Combining Data Sources into a Single Table · In the world of data engineering, merging data from different sources into a single table is a common practice. In this article, we will… · 5 min read · Oct 9, 2023
Thomas Cardenas · Streaming data to S3 from SNS using Firehose · 3 min read · Oct 2, 2023
Thomas Cardenas · How to Reduce Full Table Scans during Merges in Apache Iceberg and Save Money · 2 min read · Sep 28, 2023
Thomas Cardenas in Ancestry Product & Technology · Solving the Small File Problem in Iceberg Tables · The Data Platform team at Ancestry has been maintaining a fully-refreshed 100-billion-row Apache Iceberg table for several months. A… · 3 min read · Aug 29, 2023
Thomas Cardenas in Ancestry Product & Technology · Scaling Ancestry.com: How to Optimize Updates for Iceberg Tables with 100 Billion Rows · One of the most interesting datasets at Ancestry is the Hints database. This is used to alert users that potential new information is… · 6 min read · Feb 23, 2023