Data Analysis With Spark2x on MRS

Published in Huawei Developers · 5 min read · Jul 27, 2023

Introduction

Hello everyone! In this article we will walk through a data analysis exercise on MRS. With MRS we can create a cluster in a few clicks and use many open-source tools such as Spark2x, Kafka, Flink, Flume, Hive, Hadoop, and ZooKeeper. The raw data is vehicle owners’ driving behavior information, including abrupt acceleration, abrupt deceleration, neutral sliding, overspeed, and fatigue driving. Using the analysis capability of the Spark2x component, we will count how many violations of each of these types every driver committed in a specified period. Prepare your coffee and enjoy reading. ☕

What Is MRS?

MapReduce Service (MRS) is a Huawei Cloud service for deploying and managing Hadoop-based components. With MRS, you can deploy a Hadoop cluster in a few clicks. Tenants have full control of their clusters and can easily run big data components such as Storm, Hadoop, Spark, HBase, and Kafka. MRS is fully compatible with open-source APIs and combines Huawei Cloud computing and storage with big data industry experience to provide a full-stack big data platform that is high-performance, low-cost, flexible, and easy to use. The platform can also be customized to service requirements, helping organizations quickly build a big data processing system and discover new value and business opportunities by analyzing massive amounts of real-time or non-real-time data. 💻

Step by Step: Creating an MRS Cluster and Analyzing Data

1- Creating an OBS Parallel File System to Store the Data

Step1 First, we need a parallel file system to store our data. We can buy one on the OBS console.

Step2 Then create two folders, Input and Program, in the bucket.

Step3 Download the driver_behavior.jar file from “https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/driver_behavior.jar” and upload it to the Program folder.

Step4 Download the detail-records.zip file from “https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/detail-records.zip” and upload it to the Input folder. The archive, named detail-records, contains the sample data for the Spark job.

Note: I had difficulty uploading the data to the parallel file system via the console, so I uploaded it to my bucket with obsutil on Ubuntu instead. My teammate Feyza’s article explains the obsutil approach in detail.

You can find her article in the References at the end.
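As a rough illustration of the obsutil route, the sketch below only assembles the `obsutil cp` command lines for the two uploads and prints them; it does not run them. It assumes obsutil is already installed and configured with your AK, SK, and endpoint, that the archive has been extracted to a local detail-records folder, and that the bucket and folder names (obs-demo-analysis-hwt4, program, input) match your own setup. All of these are placeholders, not values confirmed by the steps above.

```python
import shlex


def obsutil_cp_command(local_path, bucket, folder, recursive=False):
    """Build an `obsutil cp` command that uploads local_path to obs://bucket/folder/."""
    target = f"obs://{bucket}/{folder.strip('/')}/"
    cmd = ["obsutil", "cp", local_path, target]
    if recursive:
        # -r copies a folder recursively, -f skips the confirmation prompt
        cmd += ["-r", "-f"]
    return cmd


# Hypothetical bucket and folder names; replace them with your own.
bucket = "obs-demo-analysis-hwt4"
uploads = [
    obsutil_cp_command("driver_behavior.jar", bucket, "program"),
    obsutil_cp_command("detail-records", bucket, "input", recursive=True),
]

for cmd in uploads:
    # Print the command line so it can be reviewed before running it in a shell.
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the target paths before anything is uploaded.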

That’s all we need to do on the bucket side.

2- Buying an MRS Cluster

Step1 Now we have to buy an MRS cluster.

In the service list, click MRS, then click Buy MRS cluster.

Step2 Then set the following configuration parameters in the quick configuration section.

Region: AP-Singapore

Billing Mode: Pay-per-use

Cluster Name: mrs_Spark2x-demo

Version Type: Normal

Cluster Version: MRS 3.1.0

Cluster Type: Analysis Cluster (for offline data analysis)

AZ: AZ2

Enterprise Project: Default

Step3 Continue setting the parameters.

VPC: Select the VPC for your MRS nodes’ network.

Subnet: Select a subnet of your VPC.

Master Node Count: 2

Username: Admin

Password: Set the password used to log in as root.

Confirm Password: Enter the same password again.

Step4 Create a job for the cluster.

Now we have to create a new job in the cluster we just created. Go to Clusters > Active Clusters and click the mrs_Spark2x-demo cluster. On the displayed page, open the Jobs section and click the button that creates a job.

Step5 Set the job parameters as shown below.

Type: SparkSubmit

Name: driver_behavior_task

Program Path: Click OBS and select the driver_behavior.jar package uploaded to the Program folder.

Program Parameter: Select --class in Parameter, and enter com.huawei.bigdata.spark.examples.DriverBehavior in Value.

Parameters: Enter AK SK 1 Input path Output path, separated by spaces. AK and SK are your access key ID and secret access key, 1 is entered as-is, and the input and output paths are OBS paths.

Note: The output path must be a directory that does not exist yet, for example, obs://obs-demo-analysis-hwt4/output/

Service Parameter: Default
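To make the Parameters field less error-prone, here is a small sketch that assembles the five space-separated values and runs two basic checks on the paths. The `<AK>` and `<SK>` strings and the input path are placeholders, and the `spark-submit` line is only the conventional command-line equivalent of this kind of job; it is an assumption for illustration, not what the console is confirmed to run.

```python
MAIN_CLASS = "com.huawei.bigdata.spark.examples.DriverBehavior"


def build_job_arguments(ak, sk, input_path, output_path):
    """Return the job arguments in the order: AK SK 1 input-path output-path."""
    for path in (input_path, output_path):
        if not path.startswith("obs://"):
            raise ValueError(f"expected an obs:// path, got {path!r}")
    if input_path.rstrip("/") == output_path.rstrip("/"):
        raise ValueError("the output path must differ from the input path")
    return [ak, sk, "1", input_path, output_path]


# Placeholder credentials and a hypothetical input path; use your own values.
args = build_job_arguments(
    "<AK>",
    "<SK>",
    "obs://obs-demo-analysis-hwt4/input/",
    "obs://obs-demo-analysis-hwt4/output/",
)

# Text to paste into the Parameters field of the job dialog.
parameters_field = " ".join(args)
print(parameters_field)

# Conventional command-line form of the same submission (illustrative only).
spark_submit = ["spark-submit", "--class", MAIN_CLASS, "driver_behavior.jar", *args]
print(" ".join(spark_submit))
```

The sketch cannot check whether the output directory already exists in OBS, so that still needs to be verified in the console before the job is submitted.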

3- Viewing the Job Execution Results

Step1 Go to the Jobs page to view the job execution status.

Step2 Wait 1 to 2 minutes, then log in to the OBS console. Go to the output path in the parallel file system we created to view the execution result. Click Download in the Operation column of the generated CSV file to download it to your local PC.
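To give a feel for the kind of statistic the job produces, the sketch below counts violations per driver within a time window in plain Python. It is not the code inside driver_behavior.jar: the field names (driver_id, time, and the five violation flags) and the toy records are made up for illustration, and the real sample data and CSV layout may differ.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical flag names for the five violation types mentioned in the article.
VIOLATIONS = (
    "abrupt_acceleration",
    "abrupt_deceleration",
    "neutral_sliding",
    "overspeed",
    "fatigue_driving",
)


def count_violations(records, start, end):
    """Sum each violation flag per driver for records with start <= time < end."""
    totals = defaultdict(Counter)
    for rec in records:
        if start <= rec["time"] < end:
            for name in VIOLATIONS:
                totals[rec["driver_id"]][name] += rec.get(name, 0)
    return {driver: dict(counts) for driver, counts in totals.items()}


# Toy records invented for this example, not taken from detail-records.
records = [
    {"driver_id": "driver-A", "time": datetime(2023, 7, 1, 8, 0), "overspeed": 1},
    {"driver_id": "driver-A", "time": datetime(2023, 7, 1, 9, 0), "overspeed": 1, "abrupt_acceleration": 1},
    {"driver_id": "driver-B", "time": datetime(2023, 7, 2, 10, 0), "fatigue_driving": 1},
    {"driver_id": "driver-B", "time": datetime(2023, 8, 1, 10, 0), "overspeed": 1},  # outside the window
]

result = count_violations(records, datetime(2023, 7, 1), datetime(2023, 8, 1))
for driver, counts in sorted(result.items()):
    print(driver, counts)
```

On a cluster, Spark distributes the same group-and-sum idea across many records, which is why the job fits well on MRS.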

Conclusion

In this article, we analyzed vehicle owners' driving behavior data with MRS on Huawei Cloud. We created a parallel file system, bought a cluster, ran a Spark job, and viewed the analysis results, seeing along the way how MRS works in practice. MRS is an effective service for big data solutions.

References:

Function Overview, MapReduce Service, Huawei Cloud (support.huaweicloud.com)

How to upload a file to bucket with bash codes using Obsutil ❓ (medium.com)