Control Rescale Persistent Cluster with REST API

as Preparation for Design Optimization and Model-Based Design

Takahisa Shiratori
Sep 24, 2018

The Japanese version of this article is here

Background

Design optimization and model-based design (MBD) are increasingly used in the manufacturing industry. Cloud computing is an effective way to run the CAE analyses these methodologies require, because many analyses with different conditions can run simultaneously on scalable infrastructure. What matters here is running many short jobs and repeating trial and error many times within a short period.

What is a persistent cluster, and why should it be used?

To meet this requirement with Rescale, we have to pay attention to the time it takes to launch clusters. By default, a cluster is launched every time a job is submitted to Rescale. Launching the cluster takes a few minutes, and so does terminating it. When each job finishes in a few minutes and is repeated many times with varying conditions, launch and termination account for a significant share of the whole analysis flow.

To reduce the time spent on launch and termination, a persistent cluster can be used. With this feature, the cluster “persists” after a job is completed, so we can submit another job to the same cluster.

To use persistent clusters from optimization and MBD tools, we have to control them through an API. In this article, I use the Rescale REST API to launch a cluster, submit jobs, and shut the cluster down with a script written in Python.
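
The flow described here (launch a cluster, submit jobs to it, wait for them, shut it down) can be sketched as follows. This is a minimal illustration, not the sample script itself: the `client` object and its method names (`launch_cluster`, `create_job`, `submit_job`, `job_status`, `shutdown_cluster`) are hypothetical stand-ins for the underlying REST calls, and `FakeClient` exists only so the sketch runs without network access.

```python
import time


def run_on_persistent_cluster(client, cluster_config, job_configs,
                              poll_seconds=30, sleep=time.sleep):
    """Launch one cluster, run every job on it, then shut it down.

    `client` is a hypothetical wrapper around the REST calls; its method
    names are assumptions made for this sketch.
    """
    cluster_id = client.launch_cluster(cluster_config)
    try:
        job_ids = []
        for config in job_configs:
            job_id = client.create_job(config)
            client.submit_job(job_id, cluster_id)
            job_ids.append(job_id)
        # Poll until every job reports "Completed". A real script would
        # also need to handle failed jobs and timeouts.
        while True:
            statuses = [client.job_status(job_id) for job_id in job_ids]
            if all(status == "Completed" for status in statuses):
                break
            sleep(poll_seconds)
    finally:
        # Shut the cluster down even if something above raised, so it
        # does not keep running unattended.
        client.shutdown_cluster(cluster_id)
    return cluster_id, job_ids


class FakeClient:
    """In-memory stand-in so the sketch runs without network access."""

    def __init__(self):
        self.calls = []
        self._polls = {}

    def launch_cluster(self, config):
        self.calls.append("launch")
        return "cluster-1"

    def create_job(self, config):
        job_id = "job-{}".format(len(self._polls) + 1)
        self._polls[job_id] = 0
        self.calls.append("create " + job_id)
        return job_id

    def submit_job(self, job_id, cluster_id):
        self.calls.append("submit {} to {}".format(job_id, cluster_id))

    def job_status(self, job_id):
        # Report "Executing" on the first poll, "Completed" afterwards.
        self._polls[job_id] += 1
        return "Completed" if self._polls[job_id] > 1 else "Executing"

    def shutdown_cluster(self, cluster_id):
        self.calls.append("shutdown " + cluster_id)


if __name__ == "__main__":
    fake = FakeClient()
    cluster_id, job_ids = run_on_persistent_cluster(
        fake, {"name": "test"}, [{"name": "first"}, {"name": "second"}],
        sleep=lambda seconds: None)
    print(cluster_id, job_ids)
    print(fake.calls)
```

Putting the shutdown in a `finally` block is a design choice of this sketch: a persistent cluster keeps running until it is explicitly stopped, so the shutdown should happen even when a job submission fails.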

Try it with a sample script

A sample script and related files are available here (GitHub Gist). We’ll use the following four files.

persistentClusterSample.py : Main script

testCluster.json : Configuration of the cluster to be launched

firstJob.json : Configuration of the first job to be submitted

secondJob.json : Configuration of the second job to be submitted

Save these files in the same directory, and replace “your-api-token” in persistentClusterSample.py with your Rescale API key. Also change “platform.rescale.jp” on line 10 to the platform you use, for example “platform.rescale.com” or “eu.rescale.com”. Then run persistentClusterSample.py. The following is an example of the standard output when the script is run on Amazon Linux 2 (ami-04681a1dbd79675a5, us-east-1).
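
As an alternative to editing the script for each environment, the API key and platform host could be read from environment variables. The sketch below is an assumption about how such settings might be assembled, not the code in persistentClusterSample.py: the variable names `RESCALE_API_KEY` and `RESCALE_PLATFORM` are hypothetical, and the `Authorization: Token ...` header and the `/api/v2/` path are taken from Rescale's public API documentation as best understood here, so verify them against the documentation for your platform.

```python
import os


def rescale_settings(environ=os.environ):
    """Return the base URL and request headers for API calls."""
    # Hypothetical variable names; the defaults mirror the text above.
    platform = environ.get("RESCALE_PLATFORM", "platform.rescale.com").strip()
    api_key = environ.get("RESCALE_API_KEY", "your-api-token").strip()
    if api_key == "your-api-token":
        # Fail early instead of sending the placeholder to the server.
        raise ValueError("Set RESCALE_API_KEY to your Rescale API key")
    return {
        "base_url": "https://{}/api/v2/".format(platform),  # assumed path
        "headers": {"Authorization": "Token {}".format(api_key)},  # assumed scheme
    }


if __name__ == "__main__":
    settings = rescale_settings({"RESCALE_PLATFORM": "eu.rescale.com",
                                 "RESCALE_API_KEY": "example-key"})
    print(settings["base_url"])
```

The `.strip()` calls guard against stray spaces around the host name or key, which are easy to introduce when copying values.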

$ ls
firstJob.json persistentClusterSample.py secondJob.json testCluster.json
$ vim persistentClusterSample.py
$ python persistentClusterSample.py
Cluster ID: fWrPdb
Status date: 2018-09-23T14:37:11.198000Z, Status: Starting
Status date: 2018-09-23T14:37:07.983010Z, Status: Queued
Status date: 2018-09-23T14:37:07.608333Z, Status: Pending
Status date: 2018-09-23T14:37:06.513055Z, Status: Not Started
Job ID: wsGjp
Job ID: uaOxd
Job wsGjp is submitted to the cluster fWrPdb
Job uaOxd is submitted to the cluster fWrPdb
[2018/09/23 14:50:19] First Job: Completed, Second Job: Completed
Status date: 2018-09-23T14:50:20.684000Z, Status: Stopping
Done
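
The cluster status lines in this output are printed newest first. As a small illustration, assuming the `Status date: ..., Status: ...` format shown above, the following sketch parses such lines and orders them chronologically. The function name is hypothetical and the code is not part of the sample script.

```python
from datetime import datetime


def parse_status_lines(lines):
    """Return (timestamp, status) pairs sorted oldest first."""
    prefix = "Status date: "
    history = []
    for line in lines:
        if not line.startswith(prefix):
            continue  # skip job IDs and other output
        date_part, status = line.split(", Status: ", 1)
        stamp = datetime.strptime(date_part[len(prefix):],
                                  "%Y-%m-%dT%H:%M:%S.%fZ")
        history.append((stamp, status.strip()))
    return sorted(history)


if __name__ == "__main__":
    output = [
        "Cluster ID: fWrPdb",
        "Status date: 2018-09-23T14:37:11.198000Z, Status: Starting",
        "Status date: 2018-09-23T14:37:07.983010Z, Status: Queued",
        "Status date: 2018-09-23T14:37:07.608333Z, Status: Pending",
        "Status date: 2018-09-23T14:37:06.513055Z, Status: Not Started",
    ]
    for stamp, status in parse_status_lines(output):
        print(stamp.isoformat(), status)
```

Applied to the lines above, the chronological order is Not Started, Pending, Queued, Starting.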

In Rescale’s web UI, the cluster’s status is displayed as follows.

According to this view:

  • It took two minutes to launch the cluster. (Cluster — Starting)
  • It took five minutes to complete the first job. (First Job — Executing)
  • The second job started running just after the first job was completed.
  • It took five minutes to complete the second job. (Second Job — Executing)

This means that multiple jobs were executed without launching a new cluster for each job.
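
To make the saving concrete, here is a rough estimate using the timings listed above (two minutes to launch, five minutes per job). It assumes the jobs run one after another, as in an optimization loop where each run depends on the previous result, and it ignores termination time, which is described earlier only as a few minutes. The function names are illustrative.

```python
def minutes_with_cluster_per_job(n_jobs, launch_min=2, job_min=5):
    """Every job launches its own cluster before it runs."""
    return n_jobs * (launch_min + job_min)


def minutes_with_persistent_cluster(n_jobs, launch_min=2, job_min=5):
    """One cluster is launched once and reused by every job."""
    return launch_min + n_jobs * job_min


if __name__ == "__main__":
    for n_jobs in (2, 20):
        print(n_jobs,
              minutes_with_cluster_per_job(n_jobs),
              minutes_with_persistent_cluster(n_jobs))
```

Under these assumptions, the two jobs in this run take about 12 minutes instead of 14, and 20 sequential jobs would take about 102 minutes instead of 140. The gap grows by one launch time for every additional job, and further once termination time is counted.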


Takahisa Shiratori

Interested in cloud computing and fluid dynamics. Ph.D. in Engineering. My opinions are my own.