Best Data Engineering Books

Ansab Iqbal
2 min read · Jan 19, 2023

--

Data engineering is a rapidly growing field, with new technologies, frameworks, and libraries arriving at an alarming rate, especially in the form of cloud offerings from AWS, Azure, and GCP. It is becoming more essential than ever to have a clear roadmap for learning data engineering.

Video tutorials are a decent way to learn new things, but I have always found videos more restrictive than good old books. So in this post I will share a few gems for learning data engineering. These are in no particular order:

1. Fundamentals of Data Engineering

by Joe Reis, Matt Housley

A perfect introduction that gives a high-level overview of the field.

https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/

2. Learning Spark, 2nd Edition

by Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee

A great introduction to Apache Spark, the de facto standard in the data engineering industry.

https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/

3. Spark: The Definitive Guide

by Bill Chambers, Matei Zaharia

Another great book for learning Apache Spark.

https://www.oreilly.com/library/view/spark-the-definitive/9781491912201/

4. Kafka: The Definitive Guide

by Neha Narkhede, Gwen Shapira, Todd Palino

Great for learning about Kafka, which is hugely important in building fault-tolerant applications.

https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/

5. Designing Data-Intensive Applications

by Martin Kleppmann

The industry-standard book on the practices for building data architectures at scale.

https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/

--

Ansab Iqbal

Software/Data Engineer, passionate about data and ML solutions. I write about anything that might make a difference. All opinions are my own.