The Spark running process in advanced big data

Sajjad Hussain
Oct 23 · 3 min read

Among the many technical frameworks in big data, Spark has been widely adopted since its release. Hadoop and Spark can be considered the mainstream choices for enterprise-level data platforms; depending on the application scenario, they are used to build a big data platform that meets a given set of needs. Today we will talk about Spark, specifically its core running process.

Spark's computing model

Spark is a newer generation of computing framework that emerged after Hadoop. It also focuses on offline batch processing, but compared with Hadoop's native MapReduce engine it achieves a 10–100x performance improvement. As a result it has surpassed native MapReduce within the Hadoop ecosystem and has gradually taken over much of that workload.

Spark inherits the core ideas of Hadoop MapReduce and follows a typical master/worker architecture. A computing job is divided into tasks that are distributed to multiple workers, which is the Map step; after the workers finish their assigned tasks, the partial results are aggregated, which is the Reduce step. This is the essence of the MapReduce idea.

Spark running process

A Spark application first creates a SparkContext on the Driver side. The purpose of creating the SparkContext is to prepare the running environment of the Spark application: in Spark, the SparkContext is responsible for communicating with the ClusterManager, applying for resources, assigning tasks, and monitoring them.
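
A minimal sketch of what this looks like in application code, using the Scala API; the application name and master URL below are placeholders chosen for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RunningProcessDemo {
  def main(args: Array[String]): Unit = {
    // Configure the application; the name and master URL are illustrative placeholders.
    val conf = new SparkConf()
      .setAppName("running-process-demo")
      .setMaster("local[*]") // in a real cluster this would be e.g. a standalone master URL or "yarn"

    // Creating the SparkContext prepares the application's runtime environment:
    // it talks to the cluster manager, requests resources, and will later
    // assign tasks to Executors and monitor them.
    val sc = new SparkContext(conf)

    // ... build RDDs and run jobs here ...

    sc.stop()
  }
}
```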

The Driver is, as the name suggests, what drives the application: once the system has started, the Driver drives the execution of the whole application, and the user's jobs are decomposed and scheduled for execution through the Driver.
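
As a rough illustration of this, continuing with the `sc` from the sketch above (the input path is a placeholder): the transformations below do nothing by themselves; it is the action that causes the Driver to build the execution plan, break it into stages and tasks, and schedule them onto Executors.

```scala
// Transformations are lazy: the Driver only records the lineage of the RDDs here.
val lines  = sc.textFile("hdfs:///path/to/input")   // placeholder input path
val words  = lines.flatMap(_.split("\\s+"))
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

// The action is what triggers a job: the Driver turns the lineage into a DAG,
// splits it into stages and tasks, and schedules them onto the Executors.
val distinctWords = counts.count()
println(s"distinct words: $distinctWords")
```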

After resources have been applied for, Spark typically asks the resource manager to start its workers inside the allocated containers; these are the Executor processes. The launch command carries the URL of the Driver, so that each Executor can register itself with the Driver once it has started.
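
How many Executor containers are requested, and how large they are, is driven by configuration. A sketch of the kind of settings involved, with arbitrary example values (spark.executor.instances takes effect when a resource manager such as YARN is allocating the containers):

```scala
import org.apache.spark.SparkConf

// Sketch of Executor-related settings; the values are arbitrary examples.
val conf = new SparkConf()
  .setAppName("executor-config-demo")
  .set("spark.executor.instances", "4") // how many Executor containers to request
  .set("spark.executor.memory", "2g")   // heap size of each Executor JVM
  .set("spark.executor.cores", "2")     // concurrent task slots per Executor
```

The same settings can also be passed to spark-submit on the command line instead of being hard-coded in the application.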

Once the Executors have registered themselves with the Driver, the two sides know about each other, can communicate, and interact according to the protocol; at that point the whole distributed system is up and running.

The Driver and the Executors are directly connected to each other over RPC. Over its history, Spark has used two internal RPC implementations: one based on Akka actors and one based on Netty.

The Executor is the component that actually does the work: after an Executor receives its tasks, it runs them and then reports the results back to the Driver.
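
For example, again assuming the `sc` from earlier: each partition of the RDD below becomes one task, the tasks run inside Executor JVMs, and only the small per-partition results travel back to the Driver, where they are combined.

```scala
// 8 partitions mean 8 tasks, each executed inside some Executor's JVM.
val numbers = sc.parallelize(1L to 1000000L, 8)

// Every task computes a partial sum on its Executor; only the small partial
// results are sent back to the Driver, which combines them into the final answer.
val sum = numbers.reduce(_ + _)
println(s"sum = $sum")
```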

The Driver and the Executors each run in their own Java process, and those processes can be on the same machine or on different machines.

Spark resource management

As for the resource manager, there are several options: it can be Spark's own built-in manager (standalone mode), or a more general-purpose resource manager such as YARN or Mesos. This is why Spark can run independently on its own, or be integrated with and coordinated by a Hadoop cluster.
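
In practice, the choice of resource manager mostly shows up as the master URL the application is submitted with. A sketch of the common forms, where the host names and ports are placeholders:

```scala
import org.apache.spark.SparkConf

// The master URL selects the resource manager; hosts and ports below are placeholders.
val localMode  = new SparkConf().setMaster("local[*]")                  // no cluster: everything in one JVM
val standalone = new SparkConf().setMaster("spark://master-host:7077")  // Spark's own standalone manager
val onYarn     = new SparkConf().setMaster("yarn")                      // Hadoop YARN allocates the containers
val onMesos    = new SparkConf().setMaster("mesos://mesos-host:5050")   // Apache Mesos
```

In practice the master is usually supplied through spark-submit's --master flag rather than hard-coded in the application.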

I hope that after reading today's post, you have a clearer picture of Spark's running process. Spark is a core technology framework that must be mastered in big data, which means a firm grasp of its running principles and architecture design, as well as proficiency in using it.
