Sebastian Daberdaku in Towards Data Engineering · Centralized AWS CloudWatch log collection to S3 · 4d ago
Sebastian Daberdaku in Towards Data Engineering · Exploring the Machine Learning capabilities of Trino · Building SVM-based classification and regression models with a pure SQL interface, how hard could it be? · Jun 11
Sebastian Daberdaku in Towards Data Engineering · Testing Apache Airflow DAGs locally with Testcontainers and LocalStack · This article presents a simple strategy for testing Airflow DAGs locally using LocalStack for mocking AWS cloud services. · Jun 9
Sebastian Daberdaku in Towards Data Engineering · Building a custom Apache Spark Docker image with AWS Glue Data Catalog support as metastore · AWS Glue is not supported out of the box by Spark. In this article we will see how to build the latest 3.5.1 version with Glue support. · Jun 8
Sebastian Daberdaku in Towards Data Engineering · Performing Delta Table operations in PySpark with Spark Connect · Introduced with Spark 3.4, Spark Connect provides a decoupled client-server architecture allowing remote connectivity to Spark clusters… · Jun 7