Tagged in

Big Data

DataKare Solutions

DataKare Solutions is a data analytics company focused on solutions powered by big data technologies, helping organizations apply advanced analytics and research.

More, on Medium

Big Data

Arun Jijo in DataKare Solutions

Dec 26, 2019

Spark SQL — Salient functions in a Nutshell

Structured Streaming

Introduction

As streaming frameworks mature, they let developers concentrate on business challenges rather than on potential streaming analytics issues. Structured Streaming is part of the Apache Spark project, which…

Prabhath Vemula in DataKare Solutions

Feb 25, 2019

Optimize Spark SQL Joins

Joins are one of the fundamental operations when developing a Spark job, so it is worth knowing about the available optimizations before working with them.
At DataKare Solutions we often found ourselves needing to join two big tables (DataFrames) when working with Spark SQL. In this…

2 responses

Prabhath Vemula in DataKare Solutions

Feb 21, 2019

Compaction in Hive

This article covers how to use compaction effectively to counter the small file problem in HDFS.

Small File Problem

HDFS is not well suited to working with small files. In HDFS, a file is considered small if it is…

1 response

Prabhath Vemula in DataKare Solutions

Feb 19, 2019

Hive Design Patterns

Incremental Ingestion - ACID Enabled Tables

This article is a continuation of my previous article, which you can read here. Like the previous one, this article walks you through all three kinds of Incremental Ingestion which…

1 response

Prabhath Vemula in DataKare Solutions

Feb 16, 2019

Hive Design Patterns

Incremental Ingestion

Apache Hive

Apache Hive has evolved into one of the most popular interactive and analytical data stores in the Hadoop ecosystem. Because of this demand, Hive will play a major role in designing a robust…

Arun Jijo in DataKare Solutions

Feb 10, 2019

Structured Streaming: Kafka integration

This article explains how to integrate Structured Streaming, Spark’s new stream processing engine, with Apache Kafka brokers 0.10 and higher, along with all the necessary configuration details.
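As a rough illustration of what such an integration can look like, here is a minimal PySpark sketch. It is not taken from the article: the helper names `kafka_source_options` and `read_kafka_stream`, the broker address, and the topic name are hypothetical. It assumes the `spark-sql-kafka-0-10` connector is available to the Spark application and that a SparkSession already exists.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str,
                         starting_offsets: str = "latest") -> Dict[str, str]:
    """Build the option map for Spark's built-in "kafka" streaming source.

    Hypothetical helper for illustration; the option keys themselves are
    the ones the Kafka source reads.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # comma-separated host:port list
        "subscribe": topic,                            # topic name(s) to consume
        "startingOffsets": starting_offsets,           # where a brand-new query begins
    }


def read_kafka_stream(spark, options: Dict[str, str]):
    """Return a streaming DataFrame; `spark` is an existing SparkSession."""
    return (
        spark.readStream
        .format("kafka")
        .options(**options)
        .load()
        # Kafka delivers key and value as binary, so cast them to strings.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )
```

A call such as `read_kafka_stream(spark, kafka_source_options("localhost:9092", "events", "earliest"))` would then yield a DataFrame that can be transformed and written out with `writeStream`. Keeping the options in a plain dictionary makes the configuration easy to inspect without starting a cluster.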

Apache Kafka

Spark SQL — Salient functions in a Nutshell

As the Spark DataFrame becomes the de facto standard for data…

Key factors to consider when optimizing Spark Jobs

Structured Streaming: Essentials
