Best Data Engineering Books
Data engineering is a rapidly growing field, with new technologies, frameworks, and libraries arriving at an alarming rate, especially in the form of cloud offerings from AWS, Azure, and GCP. It is more essential than ever to have a clear roadmap for learning data engineering.
Video tutorials are a decent way to learn new things, but I have always found videos to be more restrictive than good old books. So in this post I will share a few gems for learning the field of data engineering. These are in no particular order:
1. Fundamentals of Data Engineering
by Joe Reis, Matt Housley
A perfect introduction that gives a high-level overview of the field.
https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/
2. Learning Spark, 2nd Edition
by Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee
A great introduction to Apache Spark, the de facto standard in the data engineering industry.
https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/
3. Spark: The Definitive Guide
by Bill Chambers, Matei Zaharia
Another great book for learning Apache Spark.
https://www.oreilly.com/library/view/spark-the-definitive/9781491912201/
4. Kafka: The Definitive Guide
by Neha Narkhede, Gwen Shapira, Todd Palino
Great for learning about Kafka, which is hugely important for building fault-tolerant applications.
https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/
5. Designing Data-Intensive Applications
by Martin Kleppmann
The industry-standard book on the practices for building data architecture at scale.
https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/