David Vrba in Towards Data Science · Building a Data Lake on PB scale with Apache Spark · How we deal with Big Data at Emplifi · 15 min read · Jan 26, 2023
David Vrba in Towards Data Science · Analyzing Stack Overflow Dataset with Apache Spark 3.0 · 13 min read · Dec 13, 2021
David Vrba in Towards Data Science · Nested Data Types in Spark 3.1 · Working with structs in Spark SQL · 6 min read · Jul 30, 2021
David Vrba in Towards Data Science · Higher-Order Functions with Spark 3.1 · Processing Arrays in Spark SQL. · 9 min read · Jul 26, 2021
David Vrba in Towards Data Science · Spark SQL 102 — Aggregations and Window Functions · Analytical functions in Spark for beginners. · 7 min read · Jun 30, 2021
David Vrba in Towards Data Science · About Sort in Spark 3.x · Deep dive into data sorting in Spark SQL. · 9 min read · Jun 27, 2021
David Vrba in Towards Data Science · Best Practices for Bucketing in Spark SQL · The ultimate guide to bucketing in Spark. · 21 min read · Apr 25, 2021
David Vrba in Towards Data Science · Performance in Apache Spark: benchmark 9 different techniques · Comparison of different approaches for array processing in Spark 3.1 · 12 min read · Mar 9, 2021
David Vrba in Towards Data Science · A Decent Guide to DataFrames in Spark 3.0 for Beginners · Understand the transformations in a conceptual way. · 12 min read · Jan 25, 2021