Dennis de Weerdt in DataPebbles
Portable Pipelines with Apache Beam
There are lots of use cases for data processing and analytics pipelines, and nearly as many frameworks to use. Apache Spark is probably…
Dec 26, 2020

Dennis de Weerdt in DataPebbles
Kafka to Spark Structured Streaming, with Exactly-Once Semantics
Apache Spark Structured Streaming is a part of the Spark Dataset API. This is an improvement from the DStream-based Spark Streaming, which…
Nov 2, 2020

Dennis de Weerdt in DataPebbles
Data Quality Dashboards: Is Your Data Doing Ok?
Everyone loves data dashboards, right? Fancy visualisations which provide key insights into otherwise opaque data? Of course you do.
Sep 28, 2020

Dennis de Weerdt in DataPebbles
Partitioning and Bucketing in Hive: Which and when?
Lately, I've been getting my feet wet with Apache Hive. Two of the more interesting features I've come across so far have been…
Sep 16, 2020

Dennis de Weerdt in DataPebbles
Moving Spark into Kubernetes
In my previous post, I discussed how to write a simple Spark application in Kotlin, and run it with Airflow. This time around, let's see…
Aug 26, 2020

Dennis de Weerdt in DataPebbles
Spark and Airflow with Kotlin
Recently, I was thinking about something new I could learn, and I ended up with two options. The first was to try working with Apache…
Aug 6, 2020