Alexander Lopatin | DAG generation in Apache Airflow | How to Create DAGs with Hundreds of Tasks and Not Go Crazy? | 5d ago
Alexander Lopatin | Row number in Spark is simple, but there are nuances… | Spark is very powerful for Big Data processing, and that power requires the developer to write code carefully. I needed to get a unique number ID… | Mar 16
Alexander Lopatin | Yet Another SCD type 2 with Spark/Scala | Recently, I was faced with the task of building marts that allow business users to see how long an exact value was active and in the same… | Mar 7
Alexander Lopatin | Unlocking Spark’s Potential with Scala: split Data Frame by left join. | In this post, I’m going to write a function that uses Scala’s functional approach. | Dec 24, 2023
Alexander Lopatin | Speeding up reading from JDBC through Spark | Reading data from JDBC sources with Spark can be really challenging sometimes. When your data is stored in a Hadoop cluster and your Spark… | Sep 2, 2023
Alexander Lopatin | Fill in the missing dates with the previous values | Some data sets need to be filled with additional rows. In my case, I had to fill in the missing dates with previous… | Dec 9, 2022
Alexander Lopatin | Transform structured and arrayed data into flat Data Frame | Big Data and Data Engineering are about working with complex data, not just flat tables. Many different types of sources allow us to store… | Aug 7, 2022
Alexander Lopatin | Using a TailRec for reading a dynamic list of JDBC tables in Spark | Sometimes, it may happen that you don’t know exactly how many sources for your DataFrame you will use. | Jul 23, 2022