In today’s cluster-computing arena, Spark is widely used for its fast and scalable application model. Compared with traditional MapReduce, Spark offers in-memory computing, which can be up to 10x faster, and supports real-time data processing through Spark Streaming.
Spark provides an immutable distributed collection object called a Resilient Distributed Dataset (RDD). RDDs are one of the core abstractions of Spark: each RDD is split into multiple partitions, and those partitions are processed in parallel across the nodes of the cluster.
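The partition-and-process idea behind an RDD can be sketched in plain Python. This is only an illustration of the concept, not the PySpark API; the helper names (`partition`, `process_partition`) are made up for this sketch:

```python
from functools import reduce

def partition(data, num_partitions):
    """Split `data` into roughly equal chunks, like an RDD's partitions."""
    size = (len(data) + num_partitions - 1) // num_partitions
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_partition(part):
    """Per-partition work: a map (square each element) plus a local reduce."""
    return sum(x * x for x in part)

data = list(range(1, 11))           # the "distributed" collection
partitions = partition(data, 4)     # split into 4 partitions

# In Spark each partition would be processed on a different cluster node;
# here we simply loop over them and combine the partial results.
partial_results = [process_partition(p) for p in partitions]
total = reduce(lambda a, b: a + b, partial_results)
print(total)  # sum of squares of 1..10 = 385
```

The key point the sketch captures is that each partition is processed independently, so the work scales out by adding nodes; only the small partial results need to be combined at the end.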
Spark contains four main integrated components built on top of Spark Core: Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph processing).