Top stories published by Data Engineering in October of 2023

Data Engineering

Anup Moncy in Data Engineering

Oct 24, 2023

Get best performance for PySpark jobs using Parallelize

I have sometimes seen speedups of more than 25x when operations use parallelize. This does depend on the other workloads on the cluster, but the difference is still significant.

Anup Moncy in Data Engineering

Oct 8, 2023

Rethinking Surrogate Keys: Efficient Alternatives for Modern Data Models

1 response

Anup Moncy in Data Engineering

Oct 18, 2023

Troubleshoot Spark/Pyspark performance issues

Steps to help troubleshoot common performance issues in Spark/PySpark jobs, taking EMR/Databricks as an example. Of course, all of this comes after confirming that there is no change in the data trend or volume.

TL/DR

Anup Moncy in Data Engineering

Oct 29, 2023

Modern SQL — Not so commonly used, yet powerful features

Most data platforms have evolved, and so has the way we write SQL, although the fundamentals remain the same. I thought it would be useful to write about some SQL features that are not talked about much.

About

Data Engineering

Short publication intended to discuss what's what in Data Engineering
