Ronald Ángel in Miro Engineering · Writing data product pipelines with Airflow (Jan 26, 2023): A framework to make your data organization transition from simple DAGs to trustworthy data products.
Ronald Ángel in Miro Engineering · 5 Strategies for data workflows scheduling at Miro (Mar 14, 2022): Some strategies we follow to make our scheduling capability scalable.
Ronald Ángel in Miro Engineering · Agile Data Engineering at Miro (Nov 22, 2021): This is how we do it: at Miro we leverage Agile concepts to drive data engineering projects. We get improved predictability, and a smoother…
Ronald Ángel in Towards Data Science · Introducing a new PySpark library: owl-data-sanitizer (May 5, 2020): A library to democratize data quality in companies with PySpark data pipelines.
Ronald Ángel in inganalytics.com/inganalytics · How to review ETL pySpark pipelines (Mar 23, 2020): A guide to performing PySpark code reviews that comply with Python standards, guarantee data quality, and keep your code extensible.
Ronald Ángel in Towards Data Science · Write Clean and SOLID Scala Spark Jobs (Dec 30, 2019): Nowadays, extensive pipelines are written as simple SQL queries, neglecting important development concepts such as writing clean and testable…
Ronald Ángel in Towards Data Science · Understanding the Spark insertInto function (Oct 22, 2019): Problems found while using Spark's insertInto with Hive.
Ronald Ángel in The Startup · Ingesting Raw Data with Kafka Connect and Spark Datasets (Oct 15, 2019): In this blog post I explain how we use Kafka Connect and Spark, orchestrated by platforms like Kubernetes and Airflow, to create a Raw…
Ronald Ángel · Dockerizing a Hand Written Digits Predictor Service (Mar 21, 2019): Dockerizing a Flask service that uses Keras to classify handwritten digits.
Ronald Ángel in Towards Data Science · Dataset deduplication using Spark's MLlib (Mar 17, 2019): Spark Machine Learning: two approaches to deduplicate DataFrames using Spark MLlib and Scala.