Data Analysis with Spark2x on MRS
Introduction
Hello everyone! In this article we will walk through a data analysis exercise on MRS. With MRS, we can create a cluster in a few clicks and use many open-source tools such as Spark2x, Kafka, Flink, Flume, Hive, Hadoop, and ZooKeeper. The raw data is vehicle owners’ driving behavior information, including abrupt acceleration, abrupt deceleration, neutral sliding, overspeed, and fatigue driving. Using the Spark2x component, we will count how many violations of each type every driver committed in a specified period. Grab your coffee and enjoy the read. ☕
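To make the goal concrete, here is a minimal local sketch in plain Python of the kind of statistic described above: counting each driver's violations per type within a time window. It is not the code inside driver_behavior.jar. The record layout (driver ID, timestamp, five 0/1 flags), the sample values, and every name in it are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical violation types, named after the five behaviors in the article.
VIOLATIONS = ("abrupt_acceleration", "abrupt_deceleration",
              "neutral_sliding", "overspeed", "fatigue_driving")

# Hypothetical sample records: (driver_id, timestamp, one 0/1 flag per type).
RECORDS = [
    ("driver_a", "2017-01-01 08:00:00", 1, 0, 0, 1, 0),
    ("driver_a", "2017-01-02 09:30:00", 0, 1, 0, 1, 0),
    ("driver_b", "2017-01-01 10:15:00", 0, 0, 1, 0, 0),
    ("driver_b", "2017-02-01 11:00:00", 1, 0, 0, 0, 1),  # outside the window
]


def count_violations(records, start, end):
    """Return {driver_id: Counter(violation -> count)} for start <= time < end."""
    totals = defaultdict(Counter)
    for driver_id, timestamp, *flags in records:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        if start <= when < end:
            for name, flag in zip(VIOLATIONS, flags):
                totals[driver_id][name] += flag
    return dict(totals)


if __name__ == "__main__":
    window = (datetime(2017, 1, 1), datetime(2017, 2, 1))
    for driver, counts in sorted(count_violations(RECORDS, *window).items()):
        print(driver, dict(counts))
```

The Spark job does the same kind of grouping and counting, only distributed across the cluster so that it scales to the full data set.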
What Is MRS?
MapReduce Service (MRS) is a Huawei Cloud service for deploying and managing Hadoop-based components. With MRS, you can deploy a Hadoop cluster in a few clicks. Tenants have full control of their clusters and can easily run big data components such as Storm, Hadoop, Spark, HBase, and Kafka. MRS is fully compatible with open-source APIs and builds on Huawei Cloud computing and storage to provide a full-stack big data platform that offers high performance, low cost, flexibility, and ease of use. The platform can also be customized to service requirements, helping organizations quickly build a big data processing system and find new business opportunities by analyzing massive amounts of real-time or non-real-time data. 💻
Step by Step: Creating an MRS Cluster and Analyzing Data
1- Creating an OBS Parallel File System Bucket to Store the Data:
- Step1 First, we need a parallel file system bucket to store our data. We can buy one as shown in the picture below.
- Step2 Then create the Input and Program folders in the bucket.
- Step3 Download the driver_behavior.jar file from https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/driver_behavior.jar and upload it to the Program folder.
- Step4 Download the detail-records.zip file from https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/detail-records.zip and upload it to the Input folder. The detail-records archive contains the sample data for the Spark job.
Note: I had difficulty uploading data to the parallel file system through the console, so I uploaded it to my bucket with the “obsutil” tool on Ubuntu instead. My teammate Feyza’s article explains the obsutil approach in detail.
You can read the article at this Link.
That’s all we need to do on the Bucket side.
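If you also go the obsutil route, the uploads can be sketched as below. This is a hedged example, not the exact commands I ran: the script only builds and prints `obsutil cp` command lines for review. The bucket name is borrowed from the example output path in the job parameters. The lowercase folder names, the local file paths, and the assumption that the archive has been extracted to a local detail-records directory are all hypothetical. Check the obsutil documentation for your version before running anything.

```python
import shlex

# Assumptions: bucket name taken from the article's example output path;
# folder names and local paths are hypothetical.
BUCKET = "obs-demo-analysis-hwt4"


def upload_command(local_path, object_key, recursive=False):
    """Build an `obsutil cp` argument list that copies local_path to OBS."""
    command = ["obsutil", "cp", local_path, f"obs://{BUCKET}/{object_key}"]
    if recursive:
        # -r copies a folder recursively, -f runs in force mode (no prompt).
        command += ["-r", "-f"]
    return command


COMMANDS = [
    upload_command("driver_behavior.jar", "program/driver_behavior.jar"),
    upload_command("detail-records", "input/", recursive=True),
]

if __name__ == "__main__":
    # Print the commands for review; run them yourself (or pass each list to
    # subprocess.run) once obsutil is installed and configured with your keys.
    for command in COMMANDS:
        print(shlex.join(command))
```

Building the commands as lists rather than strings avoids shell-quoting problems if a path contains spaces.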
2- Buy an MRS Cluster
- Step1 Now we need to buy an MRS cluster.
Click MRS in the service list, then click Buy MRS cluster.
- Step2 Then set the configuration parameters in the Quick Configuration section as shown in the picture below.
Region: AP-Singapore
Billing Mode: Pay-per-use
Cluster Name: mrs_Spark2x-demo
Version Type: Normal
Cluster Version: MRS 3.1.0
Cluster Type: Analysis Cluster (for offline data analysis)
AZ: AZ2
Enterprise Project: Default
- Step3 Continue setting the parameters.
VPC: Select the VPC for your MRS nodes’ network.
Subnet: Select a subnet in your VPC.
Master Node Count: 2
Username: Admin
Password: Set the password used to log in as root.
Confirm Password: Enter the same password again.
- Step4 Creating a Job for the Cluster
We need to create a new job in the cluster we just created. Go to Clusters > Active Clusters and click the mrs_Spark2x-demo cluster name. On the displayed page, open the Jobs section and click the button for creating a job.
- Step5 Set the job parameters as shown below.
Type: SparkSubmit
Name: driver_behavior_task
Program Path: Click OBS and select the driver_behavior.jar package uploaded to the Program folder.
Program Parameter: Select --class in Parameter, and enter com.huawei.bigdata.spark.examples.DriverBehavior in Value.
Parameters: Enter five space-separated values in this order: your access key (AK), your secret key (SK), the literal 1, the input path, and the output path.
Note: The output path should be a directory that does not exist yet, for example, obs://obs-demo-analysis-hwt4/output/
Service Parameter: Default
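The Parameters value is easy to get wrong by hand, so here is a small sketch that assembles it. The order AK SK 1 input path output path comes from the step above. The obs:// prefix check is an assumption based on the example output path, and OBS_AK and OBS_SK are hypothetical environment variable names, not something MRS defines.

```python
import os


def build_job_parameters(access_key, secret_key, input_path, output_path):
    """Return the space-separated string for the job's Parameters field."""
    for label, path in (("input", input_path), ("output", output_path)):
        if not path.startswith("obs://"):
            raise ValueError(f"{label} path must start with obs://, got {path!r}")
    if input_path.rstrip("/") == output_path.rstrip("/"):
        raise ValueError("output path must differ from the input path")
    # Order taken from the article: AK SK 1 <input path> <output path>.
    return " ".join([access_key, secret_key, "1", input_path, output_path])


if __name__ == "__main__":
    # OBS_AK and OBS_SK are hypothetical environment variable names; reading
    # the keys from the environment keeps them out of source files.
    params = build_job_parameters(
        os.environ.get("OBS_AK", "<your-AK>"),
        os.environ.get("OBS_SK", "<your-SK>"),
        "obs://obs-demo-analysis-hwt4/input/",
        "obs://obs-demo-analysis-hwt4/output/",
    )
    print(params)
```

The function cannot check whether the output directory already exists in the bucket, so that still has to be verified in the OBS console.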
3- Viewing the Job Execution Results
- Step1 Go to the Jobs page to view the job execution status.
- Step2 Wait 1 to 2 minutes and log in to the OBS console. Go to the output path in the file system you created to view the execution result. Click Download in the Operation column of the generated CSV file to save it to your local PC.
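Once the CSV file is on your PC, a quick way to look at it is a few lines of standard-library Python. This is a generic sketch: it assumes only that the file is comma-separated UTF-8 text and makes no claim about which columns the job writes. The script and file names in the usage message are hypothetical.

```python
import csv
import sys
from itertools import islice


def preview_csv(path, limit=5):
    """Return up to `limit` rows of the CSV file at `path` as lists of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(islice(csv.reader(handle), limit))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Print the first rows of the file given on the command line.
        for row in preview_csv(sys.argv[1]):
            print(row)
    else:
        print("usage: python preview_csv.py <downloaded-file.csv>")
```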
Conclusion
In this article, we analyzed vehicle owners' driving behavior data with the MRS platform on Huawei Cloud. Along the way we saw how MRS works, from preparing the data in OBS and creating a cluster to submitting a Spark job, and then we viewed the analysis results. MRS is an effective service for big data solutions.