Getting started with Spark

Spark is a low-latency, interactive computing framework in the style of MapReduce, developed at UC Berkeley's AMPLab. It is a fast, general-purpose engine for processing massive amounts of data. Hadoop, by comparison, dates back to 2003, matured at Yahoo, entered Apache incubation, and came into widespread use in 2008. However, MapReduce (MR) has always had problems: it offers only a small set of operations, every Reduce step reads from and writes to disk, Map and Reduce stages must appear in pairs, it is slow…