Data Analysis with Spark2x on MRS

Mert Goktas
Published in Huawei Developers
5 min read · Jul 27, 2023
Big Data Service MRS on Huawei Cloud

Introduction

Hello everyone. In this article, we will walk through a data analysis exercise on MRS. With MRS, we can create a cluster with one click and use many open-source tools such as Spark2x, Kafka, Flink, Flume, Hive, Hadoop, and ZooKeeper. The raw data describes vehicle owners' driving behavior, including abrupt acceleration, abrupt deceleration, neutral sliding, overspeed, and fatigued driving. Using the analysis capability of the Spark2x component, we will count each driver's violations of these types within a specified period. Prepare your coffee and enjoy the read. ☕
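To make the goal concrete, here is a minimal plain-Python sketch of the kind of aggregation described above: summing each violation type per driver within a time window. The column layout (driver ID, timestamp, five 0/1 flags) and the sample rows are assumptions made up for illustration. The real dataset's schema and the logic inside the Spark2x job may differ.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical column layout, assumed for illustration only;
# the real dataset's schema may differ.
VIOLATIONS = [
    "abrupt_acceleration",
    "abrupt_deceleration",
    "neutral_sliding",
    "overspeed",
    "fatigued_driving",
]
COLUMNS = ["driver_id", "timestamp"] + VIOLATIONS

# Made-up sample rows: one row per reported event, each flag is 0 or 1.
SAMPLE = """\
driverA,2017-01-01 08:00:00,1,0,0,1,0
driverA,2017-01-02 09:30:00,0,1,0,1,0
driverB,2017-01-01 10:15:00,0,0,1,0,0
driverB,2017-02-01 11:00:00,1,0,0,0,1
"""


def count_violations(lines, start, end):
    """Sum each violation flag per driver for rows with start <= timestamp < end."""
    totals = defaultdict(lambda: dict.fromkeys(VIOLATIONS, 0))
    for row in csv.DictReader(lines, fieldnames=COLUMNS):
        when = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        if start <= when < end:
            for name in VIOLATIONS:
                totals[row["driver_id"]][name] += int(row[name])
    return dict(totals)


result = count_violations(
    io.StringIO(SAMPLE),
    start=datetime(2017, 1, 1),
    end=datetime(2017, 2, 1),
)
for driver, counts in sorted(result.items()):
    print(driver, counts)
```

The last sample row falls outside the window, so it is not counted. On a cluster, Spark2x performs the same kind of grouping in parallel across much larger files.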

What Is MRS?

MapReduce Service (MRS) is a Huawei Cloud service for deploying and managing Hadoop-based components. With MRS, you can deploy a Hadoop cluster in a few clicks. Tenants take full control of their clusters and can easily run big data components such as Storm, Hadoop, Spark, HBase, and Kafka. MRS is fully compatible with open-source APIs and combines Huawei Cloud computing and storage with big data industry experience to provide a full-stack big data platform that offers high performance, low cost, flexibility, and ease of use. The platform can also be customized to service requirements, helping organizations quickly build a big data processing system and discover new value and business opportunities by analyzing massive amounts of real-time or non-real-time data. 💻

Creating an MRS Cluster and Analyzing Data Step by Step

1- Creating an OBS Parallel File System Bucket for Data Storage:

  • Step1 First, we need a parallel file system bucket to store our data. We can buy one as shown in the picture below.
Creating bucket Step1
  • Step2 Then create the Input and Program folders in the bucket.
Creating bucket Step2
Creating bucket Step3
Creating bucket Step4

Note: I had difficulty uploading data to the parallel file system through the console, so I uploaded it to my bucket with the “obsutil” tool on Ubuntu instead. My teammate Feyza's article explains obsutil in detail.

To read the article, click the link.
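As a rough illustration of the obsutil route, the sketch below only builds and prints the upload commands; it does not run them. The bucket name is borrowed from the example output path later in this article. The local paths, the lowercase folder names, and the choice of the `-r` (recursive) and `-f` (force) flags for the folder upload are assumptions, so check them against `obsutil help cp` for your version. It also assumes obsutil has already been configured with your AK, SK, and endpoint.

```python
import shlex

# Bucket name borrowed from the example output path in this article;
# replace it with your own.
BUCKET = "obs-demo-analysis-hwt4"


def upload_command(local_path, bucket, folder, recursive=False):
    """Build an `obsutil cp` command that uploads local_path under obs://bucket/folder/."""
    target = f"obs://{bucket}/{folder.strip('/')}/"
    command = ["obsutil", "cp", local_path, target]
    if recursive:
        # Assumed flags for uploading a whole folder without prompts.
        command += ["-r", "-f"]
    return command


# Hypothetical local paths, used only for illustration.
commands = [
    upload_command("./data", BUCKET, "input", recursive=True),
    upload_command("./driver_behavior.jar", BUCKET, "program"),
]
for command in commands:
    print(shlex.join(command))
```

Printing the commands first makes it easy to review the target paths before pasting them into a terminal.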

That’s all we need to do on the Bucket side.

2- Buying an MRS Cluster

  • Step1 Now we have to buy an MRS cluster.
Buying an MRS Cluster Step1

Click MRS Service in the service tree, then click Buy MRS Cluster.

Buying an MRS Cluster Step1
  • Step2 Then set the configuration parameters in the Quick Configuration section as shown in the picture below.

Region: AP-Singapore

Billing Mode: Pay-per-use

Cluster Name: mrs_Spark2x-demo

Version Type: Normal

Cluster Version: MRS 3.1.0

Cluster Type: Analysis Cluster (for offline data analysis)

AZ: AZ2

Enterprise Project: Default

Buying an MRS Cluster Step2
  • Step3 Continue to set parameters.

VPC: Select the VPC for your MRS nodes' network.

Subnet: Select a subnet in your VPC.

Master Node Count: 2

Username: Admin

Password: Set the password for logging in as root.

Confirm Password: Enter the same password again.

Buying an MRS Cluster Step3
  • Step4 Creating a Job for the Cluster

Next, we have to create a new job in the cluster we just created. Go to Clusters > Active Clusters and click the mrs_Spark2x-demo cluster. On the displayed page, find the Jobs section and click the button shown in the picture below.

Buying an MRS Cluster Step4
  • Step5 Set the job parameters as shown below.

Type: SparkSubmit

Name: driver_behavior_task

Program Path: Click OBS and select the driver_behavior.jar package uploaded to the Program folder.

Program Parameter: Select --class in Parameter, and enter com.huawei.bigdata.spark.examples.DriverBehavior in Value.

Parameters: Enter the following values separated by spaces: AK (access key), SK (secret key), 1, input path, and output path.

Note: Output path should be a directory that does not exist, for example, obs://obs-demo-analysis-hwt4/output/

Service Parameter: Default

Buying an MRS Cluster Step5
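To show how the job fields above fit together, here is a small sketch that assembles them into a spark-submit style argument list and masks the credentials before printing. The console builds and runs the real command for you, so this is only an illustration. The main class and the bucket name come from this article; the folder names, the jar location, the placeholder keys, and the assumption that both paths are OBS paths are mine.

```python
import shlex

BUCKET = "obs-demo-analysis-hwt4"  # from the example output path above
MAIN_CLASS = "com.huawei.bigdata.spark.examples.DriverBehavior"


def job_arguments(access_key, secret_key, input_path, output_path):
    """Return the five program arguments in the order the job form lists them."""
    for path in (input_path, output_path):
        # Assumption: both paths live in OBS, as in this walkthrough.
        if not path.startswith("obs://"):
            raise ValueError(f"expected an obs:// path, got {path!r}")
    return [access_key, secret_key, "1", input_path, output_path]


def masked(command, secrets):
    """Replace secret values with *** so the command is safe to print or log."""
    return ["***" if part in secrets else part for part in command]


args = job_arguments(
    access_key="YOUR_AK",  # placeholder, not a real key
    secret_key="YOUR_SK",  # placeholder, not a real key
    input_path=f"obs://{BUCKET}/input/",    # assumed folder name
    output_path=f"obs://{BUCKET}/output/",  # must not exist yet
)
command = [
    "spark-submit",
    "--class", MAIN_CLASS,
    f"obs://{BUCKET}/program/driver_behavior.jar",  # assumed jar location
    *args,
]
print(shlex.join(masked(command, {"YOUR_AK", "YOUR_SK"})))
```

Keeping the AK and SK out of printed output is a good habit, since job parameters often end up in logs and screenshots.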

3- Viewing the Job Execution Results

  • Step1 Go to the Jobs page to view the job execution status.
Viewing Results Step1
  • Step2 Wait 1 to 2 minutes and log in to the OBS console. Go to the output path in the file system you created to view the execution result. Click Download in the Operation column of the generated CSV file to download it to your local PC.
Viewing Results Step2
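Once the CSV file is on your PC, a few lines of Python are enough to inspect it. The sketch below ranks drivers by their total number of violations. The file layout (a driver ID followed by five counts, no header row) and the inline sample data are assumptions standing in for the real output, whose format may differ.

```python
import csv
import io

# Stand-in for the downloaded file. The layout (driver ID followed by
# five violation counts, no header) is assumed, not taken from the job.
RESULT_CSV = """\
driverA,3,1,0,5,2
driverB,0,2,4,1,0
driverC,1,1,1,1,1
"""


def rank_drivers(lines):
    """Return (driver, total violations) pairs, highest total first."""
    ranking = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        driver = row[0]
        counts = [int(value) for value in row[1:]]
        ranking.append((driver, sum(counts)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


ranking = rank_drivers(io.StringIO(RESULT_CSV))
for driver, total in ranking:
    print(f"{driver}: {total} violations")
```

To read the real file, pass `open("your_result.csv", newline="")` instead of the `io.StringIO` stand-in, where the file name is whatever you downloaded.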

Conclusion

In this article, we analyzed vehicle owners' driving behavior data using the MRS platform on Huawei Cloud. We created a bucket and a cluster, submitted a Spark2x job, and viewed the analysis results. Along the way, we saw how the MRS service works. MRS is a very effective service for big data solutions.
