Alex Fernando Monteiro
Integrating a serverless application in the extraction layer of a DAG in Apache Airflow
This article highlights the importance of cloud tooling in the Data Engineering context.
Jul 25, 2022

Alex Fernando Monteiro
Improving Apache Spark processing performance when reading small files
Sometimes an ETL pipeline running Spark needs to ingest a lot of files in S3 at once, and there will be a bottleneck due to HDFS.
May 30, 2022