Thomas Lawless – Medium

Error Handling with Apache Spark Structured Streaming

In today’s data-driven world, real-time data processing is a critical requirement for many businesses. Apache Spark Structured Streaming…

Jul 25

Apache Spark Structured Streaming in PySpark with Apache Iceberg & Kafka

In modern data architectures, integrating streaming and batch processing with efficient data storage and retrieval is critical. Apache…

Jul 16

Apache Iceberg: Spark SQL vs. Spark DataFrames

Apache Iceberg is a table format designed for huge analytic datasets, providing efficient data storage and retrieval. When working with…

Jun 26

Apache Iceberg Table Maintenance using PySpark

Apache Iceberg has emerged as a powerful table format for managing large analytical datasets. Its features like schema evolution, time…

Jun 19

Branching & Tagging Apache Iceberg Tables

Apache Iceberg is revolutionizing the way data is managed. With its robust architecture, Iceberg supports features that were traditionally…

Jun 18

Developing with Apache Iceberg & PySpark

Apache Iceberg and PySpark are powerful tools for managing and analyzing large datasets. Setting up a local development environment is…

Jun 17

PySpark Development with Poetry & PEX

Managing dependencies for PySpark applications can be challenging, especially when you want to maintain a clean development environment.

Jun 9

Partitioning Apache Iceberg Tables

Jun 1

PySpark and Software Engineering Best Practices

May 19

Enhancing the Developer Experience for Data Scientists and Engineers

Within the practices of data science and engineering, the quest for insights and innovation hinges on the efficiency and effectiveness of…

May 14

Thomas Lawless

Distinguished Engineer, IBM CIO Data, AI, and Automation Platform
