Model Selection Using Ray

Published in juniper-team · 6 min read · Jan 22, 2021

Authors: Pooja Ayanile, Divyank Garg, Ajit Patankar, Sabyasachi Mukhopadhyay, Subhabrata Banerjee

Introduction

Model selection involves training multiple models on the same data set, comparing their outputs, and selecting the best model based on criteria such as execution time and predictive performance. Generally, the model selection process is the same for both machine learning and deep learning models.

Model selection is an important stage in an ML pipeline because it drives better inferences and future strategies, but it is also one of the most time-consuming tasks. Because model training and testing are repetitive and each model is independent of the others, we can take advantage of parallel processing. In the previous blog we migrated legacy Python code to Ray by using Ray remote functions and actors. Similarly, Ray remote functions can be used to parallelize model selection.

Fig-1: Traditional serial approach used for training and testing different models

As shown in Fig-1, in serial execution each model starts only after the previous model has finished. Consequently all models run on a single core, and the time taken to select a model is the sum of the individual model processing times. Ray instead executes the training and testing tasks for each model in parallel on all available cores across the cluster, as depicted in Fig-2. Individual models are distributed to the cores of the Ray cluster and processed independently. Thus, given enough cores, the total time is roughly the time taken by the longest-running model.

Fig-2: Parallel model selection using Ray
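The sum-versus-longest-task argument can be checked with a little arithmetic. The sketch below is only an illustration: the durations are made-up numbers, and the scheduler is a simplifying assumption (each task goes, in order, to whichever core frees up first), not a description of how Ray actually schedules work.

```python
import heapq

def serial_time(durations):
    """Serial execution: every model waits for the previous one, so times add up."""
    return sum(durations)

def parallel_time(durations, cores):
    """Wall-clock time when each task is handed, in order, to the first free core."""
    finish_times = [0.0] * cores          # time at which each core becomes free
    heapq.heapify(finish_times)
    for d in durations:
        start = heapq.heappop(finish_times)      # earliest available core
        heapq.heappush(finish_times, start + d)  # that core is busy until start + d
    return max(finish_times)

# Hypothetical per-model training times in seconds (not measured results).
durations = [4.0, 30.0, 1.0, 12.0, 8.0, 2.0]

print("serial:", serial_time(durations))
print("parallel, 6 cores:", parallel_time(durations, cores=6))
print("parallel, 1 core:", parallel_time(durations, cores=1))
```

With one core per model the parallel figure collapses to the longest single task, and with a single core it falls back to the serial sum.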

Case Studies

In this section, we benchmark Ray on two use cases: one with traditional ML models and one with deep learning models.

1. Evaluation for Machine learning models.

As explained above, we use Ray to train multiple models and evaluate their overall performance. In our use case the training data set has approximately 200K points but is highly imbalanced (2% minority class). The data set is therefore pre-processed to correct for the imbalance, yielding a final training data set of approximately 30K rows and 10 features. In the model selection phase, the following machine learning models are evaluated: Logistic Regression, SVC, Gaussian Naive Bayes, Random Forest, AdaBoost, and Linear Discriminant Analysis (LDA).

Sample code for this implementation is shown below.

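from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Define a model dictionary with the model name as key and the model constructor, written as a string, as value.
# This approach prevents the constructors from executing when the dictionary is declared.
Model_dict = {
    "LogisticRegression": "LogisticRegression(random_state=0)",
    "SVC": "SVC(gamma='auto', random_state=0)",
    "GaussianNB": "GaussianNB()",
    "RandomForestClassifier": "RandomForestClassifier(random_state=0)",
    "AdaBoostClassifier": "AdaBoostClassifier(random_state=0)",
    "LinearDiscriminantAnalysis": "LinearDiscriminantAnalysis()",
}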
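Storing constructors as strings defers model creation until a model is actually requested. A similar deferral can be sketched with callables, which avoids evaluating strings. This is an alternative, not the approach used in this post, and `StandInModel`, `model_factories`, and `build_model` are hypothetical names introduced so the snippet runs without scikit-learn.

```python
from functools import partial

class StandInModel:
    """Hypothetical placeholder for an sklearn estimator; counts how often it is built."""
    instances = 0

    def __init__(self, **params):
        StandInModel.instances += 1
        self.params = params

# Each value is a callable; nothing is constructed while the dictionary is declared.
model_factories = {
    "ModelA": partial(StandInModel, random_state=0),
    "ModelB": partial(StandInModel, gamma="auto", random_state=0),
}

def build_model(name):
    """Construct the requested model only at call time."""
    if name not in model_factories:
        raise KeyError(f"{name!r} is not in model_factories")
    return model_factories[name]()

built_before = StandInModel.instances   # still zero: declaring the dict built nothing
model = build_model("ModelB")
print(built_before, StandInModel.instances, model.params)
```

In real use the `partial` objects would wrap the actual estimator classes, for example `partial(SVC, gamma="auto", random_state=0)`.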

Model Selection: Serial Approach

import datetime
from sklearn.metrics import accuracy_score, f1_score

def Train_Test_Model(name, X_train, y_train, X_cv, y_cv, X_test, y_test):
    """
    Takes a model name (a key of Model_dict) and the data, and trains the model.
    Returns a dictionary with the model name, accuracy and macro F1 scores,
    and the start and end times of the run.
    """
    st = datetime.datetime.now()
    if name not in Model_dict:
        raise Exception("Provided Model Name is not in Model_dict")
    # Build the estimator from its constructor string
    model = eval(Model_dict[name])

    # train model
    model.fit(X_train, y_train)

    # predict
    y_train_predict = model.predict(X_train)
    y_cv_predict = model.predict(X_cv)
    y_test_predict = model.predict(X_test)

    et_marker = None  # placeholder removed below; end time is taken after scoring

    # accuracy and macro f1 score on each split
    result_dict = dict()
    result_dict["model_name"] = name
    result_dict["train_accuracy"] = accuracy_score(y_train, y_train_predict)
    result_dict["cv_accuracy"] = accuracy_score(y_cv, y_cv_predict)
    result_dict["test_accuracy"] = accuracy_score(y_test, y_test_predict)
    result_dict["train_f1_score_macro"] = f1_score(y_train, y_train_predict, average='macro')
    result_dict["cv_f1_score_macro"] = f1_score(y_cv, y_cv_predict, average='macro')
    result_dict["test_f1_score_macro"] = f1_score(y_test, y_test_predict, average='macro')
    result_dict["start_time"] = st
    result_dict["end_time"] = datetime.datetime.now()
    print("check", st)
    return result_dict

The time taken by the entire stage is shown in Fig-3.

Fig-3: Gantt chart for serial execution without using Ray

Model Selection: Parallel Approach using Ray

import datetime
import ray
from sklearn.metrics import accuracy_score, f1_score

@ray.remote
def Train_Test_Model(name, X_train, y_train, X_cv, y_cv, X_test, y_test):
    """
    Takes a model name (a key of Model_dict) and the data, and trains the model.
    Returns a dictionary with the model name, accuracy and macro F1 scores,
    and the start and end times of the run.
    """
    st = datetime.datetime.now()

    if name not in Model_dict:
        raise Exception("Provided Model Name is not in Model_dict")

    # Build the estimator from its constructor string
    model = eval(Model_dict[name])

    # train model
    model.fit(X_train, y_train)

    # predict
    y_train_predict = model.predict(X_train)
    y_cv_predict = model.predict(X_cv)
    y_test_predict = model.predict(X_test)

    # accuracy
    train_accuracy = accuracy_score(y_train, y_train_predict)
    cv_accuracy = accuracy_score(y_cv, y_cv_predict)
    test_accuracy = accuracy_score(y_test, y_test_predict)

    # macro f1 score
    train_f1_score_macro = f1_score(y_train, y_train_predict, average='macro')
    cv_f1_score_macro = f1_score(y_cv, y_cv_predict, average='macro')
    test_f1_score_macro = f1_score(y_test, y_test_predict, average='macro')

    et = datetime.datetime.now()

    result_dict = dict()
    result_dict["model_name"] = name
    result_dict["train_accuracy"] = train_accuracy
    result_dict["cv_accuracy"] = cv_accuracy
    result_dict["test_accuracy"] = test_accuracy
    result_dict["train_f1_score_macro"] = train_f1_score_macro
    result_dict["cv_f1_score_macro"] = cv_f1_score_macro
    result_dict["test_f1_score_macro"] = test_f1_score_macro
    result_dict["start_time"] = st
    result_dict["end_time"] = et
    print("check", st)
    return result_dict

The execution time with Ray is shown in Fig-4.

Fig-4: Gantt chart for parallel execution using Ray

Analysis of Results

Ray enables us to parallelize tasks, which shortens the turnaround time for experiments in the model selection stage. Comparing Fig-3 and Fig-4 shows that with parallel execution all the model training tasks start at the same time (subject to the availability of cores), so a single long-running task does not delay the others. In serial execution the models are trained one after the other, and the total training time is the sum of the times taken to train the individual models.
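Once every task has returned its result dictionary, the final selection step can be sketched as below. The rule used here (highest cross-validation macro F1, ties broken by shorter run time) is an assumption chosen for illustration, `pick_best` is a hypothetical helper, and the scores and times are made-up values rather than results from this study.

```python
import datetime

def pick_best(results, metric="cv_f1_score_macro"):
    """Return the result with the highest metric; ties go to the shorter run."""
    def sort_key(result):
        elapsed = (result["end_time"] - result["start_time"]).total_seconds()
        return (-result[metric], elapsed)
    return min(results, key=sort_key)

t0 = datetime.datetime(2021, 1, 22, 12, 0, 0)

def seconds(n):
    return datetime.timedelta(seconds=n)

# Made-up result dictionaries in the shape returned by Train_Test_Model.
results = [
    {"model_name": "ModelA", "cv_f1_score_macro": 0.91, "start_time": t0, "end_time": t0 + seconds(40)},
    {"model_name": "ModelB", "cv_f1_score_macro": 0.91, "start_time": t0, "end_time": t0 + seconds(5)},
    {"model_name": "ModelC", "cv_f1_score_macro": 0.88, "start_time": t0, "end_time": t0 + seconds(2)},
]

best = pick_best(results)
print(best["model_name"])
```

Selecting on the cross-validation score rather than the test score keeps the test split available for a final, unbiased check of the chosen model.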

2. Evaluation for Deep learning models

In the previous section, multiple machine learning models were compared with and without Ray, and validation accuracy and total execution time were benchmarked. The same concept is used here to compare the validation loss and total execution time for deep learning models. Three DL models were considered: vanilla LSTM, Dense LSTM, and bidirectional LSTM. We used univariate time series data of approximately 100K points. This data was collected from an actual production network device but randomized to some extent.

As with the machine learning models, we need to find the best of these three models. For machine learning model selection, the training function was declared as a Ray remote function, and each call to it runs in the background as a task. Tasks are executed by workers, which are processes running on the cores of the Ray cluster, so the tasks are distributed across workers and run in parallel. For deep learning we used Ray actors to parallelize the models and reduce the processing time. An actor is the Ray mechanism for running an instance of a class remotely: when a class declared as a Ray actor is instantiated, Ray starts a remote instance of that class in the cluster. The actor can then execute remote method calls and maintain its own internal state, and each remote method call is executed as a task on the cluster. This lets us distribute the models across the Ray cluster, which reduced the total processing time and, in our experiments, came with a small increase in accuracy.

The code using Ray actors for model selection is shown below:

import ray

@ray.remote
class Network(object):
    def __init__(self, name, X_train, X_val, y_train, y_val):
        # dense_lstm, bi_directional_lstm and univariate_lstm are the
        # model-building functions defined elsewhere (not shown here)
        if name == 'dense_lstm':
            self.model = dense_lstm()
        elif name == 'bidirection_lstm':
            self.model = bi_directional_lstm()
        elif name == 'univariate_lstm':
            self.model = univariate_lstm()
        else:
            raise ValueError("Unknown model name: " + name)
        self.X_train = X_train
        self.X_val = X_val
        self.y_train = y_train
        self.y_val = y_val

    def train(self):
        history = self.model.fit(self.X_train, self.y_train, epochs=20, verbose=1,
                                 steps_per_epoch=10,
                                 validation_data=(self.X_val, self.y_val),
                                 validation_steps=5)
        # Return the per-epoch metrics as a plain dictionary
        return history.history

# ray_X_train, ray_X_val, ray_Y_train and ray_Y_val are prepared earlier (not shown here)
try_list = []
model_lstm_lst = ['dense_lstm', 'bidirection_lstm', 'univariate_lstm']
for name in model_lstm_lst:
    # One actor per model; train.remote() returns immediately with a future
    actor = Network.remote(name, ray_X_train, ray_X_val, ray_Y_train, ray_Y_val)
    try_list.append(actor.train.remote())
# Block until all three models have finished training
print(ray.get(try_list))

The total time comparison is shown in Fig-5.

Fig-5: Comparing times for training multiple deep learning models with and without Ray

The total time for training multiple DL models using Ray actors is lower than with plain Python functions, because each actor runs its model on a separate worker within the cluster. The models are therefore trained in parallel rather than one after another, as they would be in serial Python execution.

Conclusions:

Ray actors and remote functions parallelize the model selection process and can therefore be used to shorten experiments and reach a good model sooner. Once model selection is done, the chosen model is used to make predictions. In the next blog in this series we use Ray to optimize both machine learning and deep learning models, using Ray Backend and Ray SGD respectively.

Previous Blog Reference: https://juniper-team.medium.com/migrating-legacy-code-to-ray-213e1ee54a40