Sergey Ivanychev | Simple trick to debug stuck Python jobs | TL;DR: in a separate thread, run a function that periodically records and saves stack traces of all threads and writes them to external… | Aug 8
Sergey Ivanychev in Constructor Engineering | Observe and record performance of Databricks jobs | Using Victoria Metrics for Databricks Spark monitoring performance | Nov 28, 2023
Sergey Ivanychev in Constructor Engineering | How to optimize AWS S3 costs via granular visualisation | Build a report on how much every folder costs, who accessed it, and whether it should be removed to optimize spending. | Aug 31, 2023
Sergey Ivanychev in Constructor Engineering | Faster PySpark Unit Tests | TL;DR: A PySpark unit test setup for pytest that uses efficient default settings and utilizes all CPU cores via pytest-xdist is available… | Jun 24, 2022
Sergey Ivanychev in Joom | Building data platform in PySpark. Part 1 — Python and Scala interop | According to multiple recent surveys (as of 2021), Python is among the most widely used programming languages and its share of users is… | Dec 1, 2021