Vindhya G

Exploring Spark Thrift JDBC/ODBC Server: Purpose and Overview
Exploring data engineering can feel overwhelming with its array of technologies and concepts. Today, let’s focus on one key aspect within…
Mar 23

Apache Flink vs Apache Spark: Design Distinctions and Their Implications
As I started learning about Flink after becoming quite skilled with Spark, a key question bothered me: What sets Flink apart from Spark…
Aug 13, 2023

Apache Flink 1.17.1: Stream and Process Kafka Events using Table API
As promised in the earlier article, I attempted the same use case of reading events from Kafka in JSON format, performing data grouping…
Jul 20, 2023

Apache Flink 1.17.0: Streaming JSON Events from Kafka - Complete Sample Code
When I initially delved into Flink, I faced a challenge in comprehending the process of running a basic streaming job. My goal was to read…
Jul 19, 2023

Optimizing Shuffle Operations in Apache Spark Structured Streaming: Key Considerations
One of the frequently asked questions in Spark is why a high memory-to-data-size ratio is observed. It is not uncommon for a batch size of 1GB to…
Jun 25, 2023

Stateful Processing in Spark Structured Streaming: Troubleshooting the Java OOM Heap Space Error
In my early days of working with Spark Structured Streaming, be it an application with flatMapGroupsWithState or an application with just an…
Jan 6, 2023

How Aggregation Works End to End in Spark Structured Streaming
While using Spark, I learned a lot of concepts related to distributed processing, starting right from MapReduce. While there are so many great…
Nov 5, 2022

Dataset.isEmpty vs rdd.isEmpty() in Apache Spark 2.x.x
Building an efficient Spark application with a huge dataset plus multiple joins and aggregations is always tricky, especially if you have window…
Jun 12, 2022