Elshad Karimov in Dev Genius · "Wanna Code Like a FAANG Engineer? Together, let’s dive into advanced Python!" · Learn about magical libraries, effective looping, syntax, and more. You absolutely must read this if you want to improve your skills! · Jan 4
Vishal Barvaliya · "Databricks Certified Data Engineer Professional Certification || Resources/ Tips/…" · Important resources and concepts to prepare · Jan 15
Nivethanvenkat · "How to access JVM in Databricks using Spark for writing data with customized file name in object…" · This is specific to Databricks users: when writing data coming from an API or other data sources, the files will be… · Jan 26
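The usual workaround behind titles like this (sketched below from general Spark/Databricks knowledge, not taken from the article): Spark always emits part-* files, so you write to a scratch folder with a single partition and then rename the part file through the JVM's Hadoop FileSystem API, which PySpark exposes via py4j. Paths and the final file name here are placeholders, and spark._jsc / spark.sparkContext._jvm are internal handles.

```python
# Sketch only: rename Spark's part-* output to a custom file name via the JVM.
# Assumes a Databricks/Hadoop-compatible filesystem; paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

tmp_dir = "/mnt/out/_tmp_report"        # scratch folder for the write
final_path = "/mnt/out/report.csv"      # desired custom file name

spark.range(100).coalesce(1).write.mode("overwrite").option("header", True).csv(tmp_dir)

# Reach into the JVM (py4j) for Hadoop's FileSystem API; these are internal attributes.
hadoop_conf = spark._jsc.hadoopConfiguration()
Path = spark.sparkContext._jvm.org.apache.hadoop.fs.Path
fs = Path(tmp_dir).getFileSystem(hadoop_conf)

# Find the single part file, rename it, then drop the scratch folder.
part_file = [s.getPath() for s in fs.listStatus(Path(tmp_dir))
             if s.getPath().getName().startswith("part-")][0]
fs.rename(part_file, Path(final_path))
fs.delete(Path(tmp_dir), True)
```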
Vitor Teixeira in Towards Data Science · "Delta Lake — Partitioning, Z-Order and Liquid Clustering" · How are different partitioning/clustering methods implemented in Delta? How do they work in practice? · Nov 8, 2023
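As a quick orientation for that piece, here is a minimal sketch (my own, with made-up table names) of how the three layouts are typically declared; OPTIMIZE ... ZORDER BY and CLUSTER BY need a Delta/Databricks runtime recent enough to support them.

```python
# Sketch: three Delta data-layout options on hypothetical tables.
# Requires a SparkSession with Delta Lake support (e.g. a Databricks runtime).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1) Hive-style partitioning: one directory per distinct value of the column.
(spark.range(1000)
      .selectExpr("id", "id % 10 AS region")
      .write.format("delta")
      .partitionBy("region")
      .mode("overwrite")
      .saveAsTable("events_partitioned"))

# 2) Z-Ordering: co-locate related values inside data files (run on demand).
spark.sql("OPTIMIZE events_partitioned ZORDER BY (id)")

# 3) Liquid clustering: declared once at table creation, no fixed directories.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events_clustered (id BIGINT, region INT)
    USING DELTA
    CLUSTER BY (region)
""")
```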
Ashwin · "Apache Spark Memory Management" · Are you struggling with managing memory in your Apache Spark applications? Look no further. This article will provide you with valuable… · Jan 24
Ashwin · "Partition Skew of Apache Spark" · Welcome to the world of Apache Spark’s partition skew! Are you frustrated with slow and inefficient data processing on your Spark cluster… · Jan 22
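Since several entries here touch on skew, a small aside: on Spark 3.x the first thing to check is Adaptive Query Execution's skew-join handling. The settings below exist in open-source Spark; the values are only illustrative.

```python
# Sketch: let AQE detect and split skewed partitions during joins.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# A partition counts as skewed when it exceeds BOTH the median-size factor
# and the absolute byte threshold below (illustrative values).
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
```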
Vishal Barvaliya · "Important Spark Topics for Data Engineer" · Important Spark operations and transformations for data engineers. · Jan 14
Arthur Caron · "Optimizing Pyspark code for Delta format" · Optimising Python code for handling data in Delta format, especially when working with large datasets, requires a blend of efficient coding… · Jan 5
Shantanu Tripathi · "Troubleshooting Slow Spark Job: 5 Key Areas to Investigate" · Spark is supposed to reduce ETL time by leveraging the concept of efficient parallelism. If your job isn’t doing so, let’s discuss 5… · Jan 5
Azam Khan · "Apache Spark — Get source files created timestamp as a column in Dataframe" · We often need the created timestamp of the source files as a column in a Spark DataFrame. · Jan 1
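For context on that one: recent Spark versions (3.3+, including current Databricks runtimes) expose a hidden _metadata column on file-based sources, which is the simplest way I know to pull file timestamps into the DataFrame. Object stores report a modification time, which for write-once files is effectively the creation time. The path and format below are placeholders.

```python
# Sketch: surface each row's source-file path and timestamp via the hidden
# _metadata column (Spark 3.3+ file-based sources). Path/format are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("json")
          .load("/path/to/source/files/")
          .select("*",
                  F.col("_metadata.file_path").alias("source_file"),
                  F.col("_metadata.file_modification_time").alias("file_created_ts")))
```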
Pralabh Saxena in Level Up Coding · "Handling Skewed Data in PySpark: Strategies for Balanced Processing" · The importance of handling skewed data · Sep 24, 2023
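Beyond the AQE settings sketched earlier, the classic manual strategy is key salting. A toy version, with made-up tables and a bucket count chosen arbitrarily:

```python
# Sketch of salting: split a hot join key into N sub-keys so no single task
# processes the entire skewed key. Tables and N are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
N = 8  # number of salt buckets (tuning assumption)

facts = spark.range(1_000_000).withColumn("key", F.lit("hot"))          # skewed big side
dims = spark.createDataFrame([("hot", "some_attr")], ["key", "attr"])   # small side

salted_facts = facts.withColumn("salt", (F.rand() * N).cast("int"))
replicated_dims = dims.withColumn("salt", F.explode(F.array([F.lit(i) for i in range(N)])))

joined = salted_facts.join(replicated_dims, ["key", "salt"]).drop("salt")
```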
Subham Khandelwal in Dev Genius · "PySpark — Optimize Joins in Spark" · Shuffle Hash Join, Sort Merge Join, Broadcast Joins and Bucketing for better join performance. · Dec 30, 2023
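The cheapest of those techniques to try is the broadcast hint, when one side comfortably fits in executor memory. A minimal sketch with made-up tables:

```python
# Sketch: broadcast the small table so the join avoids shuffling the large one.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.range(10_000_000).withColumnRenamed("id", "customer_id")   # large side
customers = spark.range(1_000).withColumnRenamed("id", "customer_id")     # small side

joined = orders.join(F.broadcast(customers), "customer_id")
joined.explain()  # the physical plan should show a BroadcastHashJoin
```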
Ansab Iqbal · "Delta Lake Introduction with Examples [using PySpark]" · What is Delta Lake · Aug 20, 2023
Cinto in The Startup · "How to Debug Queries by Just Using Spark UI" · You already have the thing you need to debug a query. · Aug 23, 2020
Michael Berk in Towards Data Science · "1.5 Years of Spark Knowledge in 8 Tips" · My learnings from Databricks customer engagements · Dec 24, 2023
SIRIGIRI HARI KRISHNA in Towards Dev · "Auto Loader" · Auto Loader simplifies reading various data file types from popular cloud locations like Amazon S3, Azure Data Lake, Google Cloud Storage… · Dec 10, 2023
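For the unfamiliar, Auto Loader is driven by the "cloudFiles" streaming source. A minimal sketch, assuming a Databricks notebook (where spark is predefined) and placeholder paths and table name:

```python
# Sketch: incremental ingestion with Databricks Auto Loader ("cloudFiles").
# Assumes a Databricks notebook; paths and the target table are placeholders.
df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema/")
          .load("/mnt/raw/landing/"))

(df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/bronze/")
    .trigger(availableNow=True)   # process whatever is available, then stop
    .toTable("bronze_events"))
```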
Think Data · "This level of detail in Spark is tackled only by experts" · In PySpark, query optimization involves two main approaches: rule-based optimization and cost-based optimization. These strategies are… · Nov 12, 2023
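A small illustration of that split: Catalyst's rule-based rewrites always run, while the cost-based optimizer only helps once it is enabled and has statistics to work with. The configs below are real Spark settings; the table is hypothetical.

```python
# Sketch: enable Spark's cost-based optimizer and give it statistics to use.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.sql.cbo.enabled", "true")
spark.conf.set("spark.sql.cbo.joinReorder.enabled", "true")

# Collect table/column stats so the CBO can estimate sizes and selectivity.
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS customer_id, amount")

# Inspect the estimated costs in the optimized plan.
spark.sql("SELECT * FROM sales WHERE customer_id = 42").explain(mode="cost")
```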