Sai Parvathaneni, "Spark Optimization Techniques: Repartition() and Coalesce()". Understanding Data Skewness in Apache Spark.
João Pedro in Towards Data Science, "My First Billion (of Rows) in DuckDB". First Impressions of DuckDB handling 450Gb in a real project.
Feruz Urazaliev, "Mastering PySpark: Advanced Techniques for Data Engineering". Dive into advanced techniques for mastering PySpark, a powerful tool for data engineering. This guide explores sophisticated methods to…
Ganesh Ankulwar, "Effortlessly Flatten JSON Strings in PySpark Without Predefined Schema: Using Production Experience". In the ever-evolving world of big data, dealing with complex and nested JSON structures is a common challenge for data engineers. JSON…
Sujit J Fulse, "Optimise an Already Optimised Heavy Spark Job with Long Lineage". Upon receiving the initial requirement to write a Spark job, you inquired about the volume of data that the job would be processing. The…
Amit Joshi, "Spark Architecture: A Deep Dive". Apache Spark is an open-source distributed computing system designed for big data processing and analytics. Spark is known for its speed…
Petrica Leuca in Dev Genius, "DuckDB, what’s the quack about?". In the autumn of 2022, DuckDB entered the cool kids group on the modern data stage[1]. In this article I deep dive into what DuckDB is and…