Using BigDL for distributed Deep Learning on Telefónica Open Cloud’s MRS Service

BigDL

Running BigDL jobs on Telefónica Open Cloud MRS Service

ssh -i xxxxxxx-key-xxx.pem linux@66.XXX.XXX.XXX
[linux@node-master2-jZbrd ~]$ sudo su - omm
Last login: Mon Feb 19 10:53:25 EST 2018
[omm@node-master2-jZbrd ~]$ source /opt/client/bigdata_env
[omm@node-master2-jZbrd ~]$
[omm@node-master2-jZbrd ~]$ spark-submit --master yarn --deploy-mode cluster --executor-cores 12 --num-executors 9 --class com.intel.analytics.bigdl.models.lenet.Train hdfs://hacluster/user/bigdl-libs/bigdl-SPARK_2.1-0.4.0-jar-with-dependencies.jar -f hdfs://hacluster/user/data/ -b 108 --checkpoint hdfs://hacluster/user/model/
.......
2018-02-19 11:21:50,044 | INFO | main | Application report for application_1513687505375_0045 (state: FINISHED) | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2018-02-19 11:21:50,044 | INFO | main |
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.13.0.242
ApplicationMaster RPC port: 0
queue: default
queue user: omm
start time: 1519056168591
final status: SUCCEEDED
tracking URL: http://node-master1-dhWYt:26000/proxy/application_1513687505375_0045/
user: omm | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2018-02-19 11:21:50,062 | INFO | pool-1-thread-1 | Shutdown hook called to kill application. | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2018-02-19 11:21:50,068 | INFO | pool-1-thread-1 | Killed application application_1513687505375_0045 | org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:415)
2018-02-19 11:21:50,069 | INFO | pool-1-thread-1 | Shutdown hook called | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2018-02-19 11:21:50,071 | INFO | pool-1-thread-1 | Deleting directory /tmp/spark-52f71634-c501-4a90-bacc-1190e729ed9c | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
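A note on the `-b 108` argument in the training command: it equals the total core count requested from YARN (12 executor cores × 9 executors). As far as I understand the BigDL LeNet example, the mini-batch size is expected to be a multiple of that product. The sketch below shows one way to derive such a value before calling `spark-submit`. The helper name `batch_size_for` and the desired size of 100 are illustrative assumptions, not part of BigDL or of the original session.

```shell
# Hypothetical helper (not part of BigDL): round a desired batch size up
# to the nearest multiple of the total number of cores.
# Usage: batch_size_for <executor-cores> <num-executors> <desired-batch-size>
batch_size_for() {
  total=$(( $1 * $2 ))                          # total cores across all executors
  echo $(( ( ($3 + total - 1) / total ) * total ))
}

EXECUTOR_CORES=12
NUM_EXECUTORS=9
# 100 is an arbitrary desired size chosen for illustration.
BATCH_SIZE=$(batch_size_for "$EXECUTOR_CORES" "$NUM_EXECUTORS" 100)
echo "$BATCH_SIZE"
```

The resulting value can then be passed as `-b "$BATCH_SIZE"` together with matching `--executor-cores` and `--num-executors` flags.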
[omm@node-master2-jZbrd ~]$ hdfs dfs -ls /user/model
Found 5 items
drwxr-xr-x - omm hadoop 0 2017-12-27 15:43 /user/model/20171227_152442
drwxr-xr-x - omm hadoop 0 2017-12-27 17:24 /user/model/20171227_155518
drwxr-xr-x - omm hadoop 0 2018-02-06 07:44 /user/model/20180206_060325
drwxr-xr-x - omm hadoop 0 2018-02-06 09:47 /user/model/20180206_090811
drwxr-xr-x - omm hadoop 0 2018-02-19 11:21 /user/model/20180219_110333
[omm@node-master2-jZbrd ~]$ hdfs dfs -ls /user/model/20180219_110333
Found 30 items
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:06 /user/model/20180219_110333/model.1113
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:07 /user/model/20180219_110333/model.1669
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:08 /user/model/20180219_110333/model.2225
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:09 /user/model/20180219_110333/model.2781
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:11 /user/model/20180219_110333/model.3337
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:12 /user/model/20180219_110333/model.3893
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:13 /user/model/20180219_110333/model.4449
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:14 /user/model/20180219_110333/model.5005
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:15 /user/model/20180219_110333/model.5561
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:04 /user/model/20180219_110333/model.557
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:17 /user/model/20180219_110333/model.6117
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:18 /user/model/20180219_110333/model.6673
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:19 /user/model/20180219_110333/model.7229
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:20 /user/model/20180219_110333/model.7785
-rw-rw-rw- 3 omm hadoop 186541 2018-02-19 11:21 /user/model/20180219_110333/model.8341
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:06 /user/model/20180219_110333/optimMethod.1113
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:07 /user/model/20180219_110333/optimMethod.1669
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:08 /user/model/20180219_110333/optimMethod.2225
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:09 /user/model/20180219_110333/optimMethod.2781
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:11 /user/model/20180219_110333/optimMethod.3337
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:12 /user/model/20180219_110333/optimMethod.3893
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:13 /user/model/20180219_110333/optimMethod.4449
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:14 /user/model/20180219_110333/optimMethod.5005
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:15 /user/model/20180219_110333/optimMethod.5561
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:04 /user/model/20180219_110333/optimMethod.557
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:17 /user/model/20180219_110333/optimMethod.6117
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:18 /user/model/20180219_110333/optimMethod.6673
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:19 /user/model/20180219_110333/optimMethod.7229
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:20 /user/model/20180219_110333/optimMethod.7785
-rw-rw-rw- 3 omm hadoop 1678 2018-02-19 11:21 /user/model/20180219_110333/optimMethod.8341
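The listing above is sorted lexicographically, so `model.557` appears between `model.5561` and `model.6117`, and the last `model.*` line is not guaranteed to be the newest snapshot in general. A minimal sketch for picking the snapshot with the highest iteration number from `hdfs dfs -ls` output follows. The function name `latest_checkpoint` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# path of the model.N file with the largest numeric suffix N.
latest_checkpoint() {
  awk '{ print $NF }' |                     # keep only the path column
    grep '/model\.[0-9][0-9]*$' |           # model snapshots only, skip optimMethod.*
    awk -F. '{ print $NF "\t" $0 }' |       # prefix each path with its iteration number
    sort -n |                               # numeric sort on the iteration
    tail -n 1 |                             # highest iteration
    cut -f2                                 # drop the numeric prefix again
}
```

It could be used as `hdfs dfs -ls /user/model/20180219_110333 | latest_checkpoint`, and the printed path passed to the `--model` option of the test job.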
[omm@node-master2-jZbrd ~]$ spark-submit --master yarn --deploy-mode cluster --executor-cores 12 --num-executors 15 --class com.intel.analytics.bigdl.models.lenet.Test hdfs://hacluster/user/bigdl-libs/bigdl-SPARK_2.1-0.4.0-jar-with-dependencies.jar -f hdfs://hacluster/user/data/ -b 180 --model hdfs://hacluster/user/model/20180219_110333/model.8341
[omm@node-master2-jZbrd ~]$ yarn logs --applicationId application_1513687505375_0049 | grep container_1513687505375_0049
18/02/20 03:47:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 40
Container: container_1513687505375_0049_01_000004 on node-core-BzsKm_26009
Container: container_1513687505375_0049_01_000001 on node-core-qefwB_26009
Container: container_1513687505375_0049_01_000002 on node-core-rbIww_26009
[omm@node-master2-jZbrd ~]$ yarn logs --applicationId application_1513687505375_0049 -containerId container_1513687505375_0049_01_000001 --nodeAddress node-core-qefwB_26009
18/02/20 03:47:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 40
LogType:gc.log.0.current
Log Upload Time:Tue Feb 20 03:00:37 -0500 2018
LogLength:3725
Log Contents:
....
End of LogType:stderr
LogType:stdout
Log Upload Time:Tue Feb 20 03:00:37 -0500 2018
LogLength:2699
Log Contents:
Top1Accuracy is Accuracy(correct: 9869, count: 10000, accuracy: 0.9869)
....
End of LogType:stdout
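Because the job ran in cluster mode, the accuracy is printed to a container's stdout log rather than to the submitting terminal. A small sketch for extracting just the accuracy value from such log text is shown here. The function name `top1_accuracy` is a hypothetical helper, and the pattern assumes the exact line format seen in the log above.

```shell
# Hypothetical helper: read log text on stdin and print the accuracy value
# from the first line of the form
#   Top1Accuracy is Accuracy(correct: N, count: M, accuracy: X)
top1_accuracy() {
  sed -n 's/.*Top1Accuracy is Accuracy(correct: [0-9]*, count: [0-9]*, accuracy: \([0-9.]*\)).*/\1/p' |
    head -n 1
}
```

For example, the output of the `yarn logs` command above can be piped into `top1_accuracy` to print only the number.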
[omm@node-master2-jZbrd ~]$ spark-submit --master local[*] --class com.intel.analytics.bigdl.models.lenet.Test ./TEST/bigdl-SPARK_2.1-0.4.0-jar-with-dependencies.jar -f ./data/ -b 8 --model ./TEST/model.8341
2018-02-19 12:31:09,466 | INFO | main | Set mkl threads to 1 on thread 1 | com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$setMKLThread$1$$anonfun$apply$1.apply$mcV$sp(ThreadPool.scala:79)
2018-02-19 12:31:11,841 | INFO | main | Auto detect executor number and executor cores number | com.intel.analytics.bigdl.utils.Engine$.init(Engine.scala:99)
2018-02-19 12:31:11,842 | INFO | main | Executor number is 1 and executor cores number is 8 | com.intel.analytics.bigdl.utils.Engine$.init(Engine.scala:101)
2018-02-19 12:31:11,846 | INFO | main | Find existing spark context. Checking the spark conf... | com.intel.analytics.bigdl.utils.Engine$.checkSparkContext(Engine.scala:292)
Top1Accuracy is Accuracy(correct: 9869, count: 10000, accuracy: 0.9869)
