This is the second chapter in the “Structured Streaming” series, which covers the essential details of setting up a Structured Streaming query. See the previous chapter here for an introduction to Structured Streaming.
As streaming frameworks gradually emerge, they encourage developers to concentrate on business challenges rather than on potential streaming analytics issues. Structured Streaming is part of the Apache Spark project, which…
Joins are one of the fundamental operations when developing a Spark job, so it is worth knowing about the available optimizations before working with them. At Data Kare Solutions, we often found ourselves needing to join two big tables (DataFrames) when working with Spark SQL. In this…
This article covers how to use compaction effectively to counter the small file problem in HDFS.
HDFS is not well suited to working with small files. In HDFS, a file is considered small if it is…
This article is a continuation of my previous article, which you can read here. Like the previous one, this article walks you through all three types of Incremental Ingestion, which…
Apache Hive has evolved into one of the most popular interactive and analytical data stores in the Hadoop ecosystem. Due to this demand, Hive will play a major role in designing a robust…
This article explains how to integrate Spark’s new stream processing engine, Structured Streaming, with Apache Kafka brokers 0.10 and higher, along with all the necessary configuration details.