Pinned | Dinesh Kumar A S | Apache Spark Optimisation — Part 2 | I have clean Spark code, but the source data is challenging in distribution, quality, volume, and throughput. What are the techniques to… | Nov 23, 2022
Pinned | Dinesh Kumar A S | Apache Spark Optimisation-Part 1 | Writing an efficient Spark program is mandatory for achieving better performance and to avoid spending unnecessarily on the cost of infra | Nov 15, 2022
Dinesh Kumar A S | My Experience with Data Stores — Part 1 | In my 10+ years of experience in Data Engineering and Analytics, I have worked with various data stores, ranging from basic relational… | Jul 8
Dinesh Kumar A S | Consumer Lag Monitoring for Spark Streaming Application + Kafka | When you build a consumer to read messages from a message queue such as Apache Kafka and the consumer consumes and processes slower than… | Mar 20, 2023
Dinesh Kumar A S | Introduction to Structured Streaming in Apache Spark (PySpark) + Kafka | A streaming pipeline is the need of the hour, since the industry is moving towards near-real-time analytics and streaming applications… | Dec 23, 2022
Dinesh Kumar A S | 5 Tips for Writing a better SQL | SQL is an integral part of the data engineering world. Be it frameworks like PySpark, ETL tools like Informatica and Datastage, SQL is the… | Dec 7, 2022
Dinesh Kumar A S | What is Distributed Computing? | “What is distributed computing; why and where is it needed?” Let us discuss! | Nov 25, 2022
Dinesh Kumar A S | Data Warehousing with Snowflake | Snowflake is gaining popularity, and its adoption among organisations is increasing day by day. Why is it popular? Let us discuss! | Nov 18, 2022