Sukul Mahadik | Airflow parse time vs runtime time Dynamism (aka Dynamic Tasks) Sunday, July 9 | Airflow DAGs, implemented in Python, provide an inherent dynamism that empowers us to utilize loops and conditional logic, facilitating the… | Jul 9, 2023
Sukul Mahadik | Using Datasets to define dependencies in Airflow 2.4 and above | In addition to scheduling Directed Acyclic Graphs (DAGs) based on time, Airflow version 2.4 and beyond supports scheduling based on task… | Jun 16, 2023
Sukul Mahadik | Apache Airflow Best Practices 2023-05-31 | 1. Focus on building resilient pipelines. | Jun 3, 2023
Sukul Mahadik | Understanding Default Parallelism (Spark Configuration parameter: spark.default.parallelism) | As per the Spark documentation here, following is what the configuration parameter spark.default.parallelism means: | Jan 9, 2023
Sukul Mahadik | List of cool blogs focussing on Spark performance optimization. | Determining no of partitions in Apache Spark Part - 1 | Jan 9, 2023
Sukul Mahadik | Understanding number of partitions in a RDD and types of Partitioners. | References: | Jan 8, 2023
Sukul Mahadik | Understanding Python virtual environments using venv and virtualenv | References: | Aug 7, 2022