SIDEDISHES
WARM SALAD
That was a crazy COVID-19 summer, a crazy virus driving people crazy while a dizzy sun burned and sizzled. My “ReadBoMasDoc” project is on its way to delivery, so we can spend some more effort exploring the project “Gesture Recognition for Workmanship”. This time we are leveraging the free workspace of Azure ML Studio (classic) to see whether it is truly convenient and what we can learn from the collected data.
For certain reasons I won’t address the data collection process in detail. What we have are four data sets, named “Driver-1”, “RS-485–1”, “Wire-1” and “12V-1”. Each is composed of 128 columns representing variables either collected from, or calculated from, the signal of one IMU (Inertial Measurement Unit). Let’s drill down step by step through the visual workflow.
Take “12V-1” as an example: right-click the dataset, choose “dataset” then “Visualize”, and you get an overview of the data, such as the number of rows and columns and the descriptive statistics of each variable. Probably because of the free workspace, the Python script module allows only two dataset inputs, so I combined the four sets pairwise, one pair at a time.
# The script MUST contain a function named azureml_main
# which is the entry point for this module.

# Imports up here can be used to import modules.
import pandas as pd

# The entry point function can contain up to two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(DataFrame1, DataFrame2):
    # Execution logic goes here
    # print('Input pandas.DataFrame #1:\r\n\r\n{0}'.format(DataFrame1))
    # print('Input pandas.DataFrame #2:\r\n\r\n{0}'.format(DataFrame2))

    # If a zip file is connected to the third input port, it is unzipped
    # under ".\Script Bundle". This directory is added to sys.path, so if
    # your zip file contains a Python file mymodule.py you can import it:
    #   import mymodule

    # Return value must be a sequence of pandas.DataFrame
    combined_csv = pd.concat([DataFrame1, DataFrame2])
    print(combined_csv.describe())
    return combined_csv,
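Outside the Studio the same combination is one line of pandas; the toy frames below stand in for the four data sets (the real ones have 128 IMU columns each), just to show that locally pd.concat can stack all four at once instead of pairwise:

```python
import pandas as pd

# Toy stand-ins for "Driver-1", "RS-485-1", "Wire-1" and "12V-1";
# only two illustrative columns instead of the real 128.
driver1 = pd.DataFrame({"AccZ_0": [0.1, 0.2], "YAW_0": [1.0, 1.1]})
rs485_1 = pd.DataFrame({"AccZ_0": [0.3, 0.4], "YAW_0": [1.2, 1.3]})
wire1   = pd.DataFrame({"AccZ_0": [0.5, 0.6], "YAW_0": [1.4, 1.5]})
v12_1   = pd.DataFrame({"AccZ_0": [0.7, 0.8], "YAW_0": [1.6, 1.7]})

# The Studio module takes only two inputs, forcing pairwise merges there;
# plain pandas stacks any number of frames in one call.
combined = pd.concat([driver1, rs485_1, wire1, v12_1], ignore_index=True)
print(combined.shape)  # (8, 2)
```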
Basically the signals from the IMU are the accelerations in each direction plus rotational motion and orientation such as pitch, yaw, etc. So we have to do some data pre-treatment as well as dimensionality reduction. We use Principal Component Analysis to see whether we can transform the original vector space, via its eigenvectors, into a new space with fewer dimensions. One thing to keep in mind: because PCA is sensitive to the scale of each variable, the data should be normalized before running PCA. In fact I made that mistake during my analysis, though the values in the correlation matrix and the contribution of each eigenvalue didn’t vary. We leveraged the SAS PRINCOMP procedure for hands-on exploration to decide which data we would put into the model; one example is attached below:
/** Import the CSV file. **/
PROC IMPORT DATAFILE="/folders/myfolders/MyBigData/combined_csv12VOK3_0502.csv"
OUT=WORK.MYCSV
DBMS=CSV
REPLACE;
RUN;
/** Print the results. **/
PROC PRINT DATA=WORK.MYCSV; RUN;
PROC SQL;
/*CREATE TABLE work.query1 AS*/
SELECT ROM3, AccZ_0, YAW_0, AccZ_3 FROM work.mycsv
WHERE VAR1 BETWEEN 261 AND 280; /* keep only within-spec data */
QUIT;
ods noproctitle;
ods graphics / imagemap=on;
proc princomp data=WORK.MYCSV plots(only)=(scree score(ellipse ncomp=5));
var ROM3 AccZ_0 YAW_0 AccZ_3;
run;
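The normalization caveat above can be demonstrated with a small sketch (the two columns are made-up stand-ins): without z-scoring, a column on a large scale swallows the first component; after z-scoring, both columns contribute on equal footing.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two illustrative, independent signals on very different scales.
raw = np.column_stack([rng.normal(0, 1000, 500),   # e.g. raw acceleration counts
                       rng.normal(0, 1, 500)])     # e.g. a yaw angle in radians

# Without scaling, PC1 is almost entirely the large-scale column.
var_raw = PCA(n_components=2).fit(raw).explained_variance_ratio_
# After z-scoring, the variance splits roughly evenly.
var_std = PCA(n_components=2).fit(
    StandardScaler().fit_transform(raw)).explained_variance_ratio_
print(var_raw[0], var_std[0])
```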
We used the “Normalize Data” module (in the “Scale and Reduce” directory of the “Data Transformation” category), chose the “ZScores” method, and selected the columns of interest for the PCA analysis.
Because Azure ML Studio’s “Principal Component Analysis” module provides fewer analytic outputs than a tool like SAS, I cross-referenced the SAS PRINCOMP results, such as the scree plot and correlation matrix, to decide which variables to feed into PCA and how many components to keep in the model. In our case we had 47 variables and 1,346 rows in total, reduced to 8 components which explained around 90% of the variation, for model training and clustering.
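The “how many components” decision can be cross-checked in scikit-learn from the cumulative explained variance. The frame below is simulated (8 hidden motion factors mixed into 47 columns, matching the shapes above), so the exact curve is illustrative only:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Stand-in for the normalized 1,346 x 47 frame used in the experiment:
# 8 latent motion factors, linearly mixed, plus a little noise.
latent = rng.normal(size=(1346, 8))
mixing = rng.normal(size=(8, 47))
X = latent @ mixing + 0.3 * rng.normal(size=(1346, 47))

pca = PCA().fit(StandardScaler().fit_transform(X))
cum = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components reaching 90% of the variance.
n90 = int(np.searchsorted(cum, 0.90)) + 1
print(n90)
```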
SOMETHING HOT
Now it’s time to prepare for clustering. There is no way to download data locally from Azure ML Studio for additional analysis, which is a disadvantage of this hands-on tool during the exploratory data analysis phase. Both Python’s scikit-learn library and SAS offer K-means clustering, so we tried both and made a comparison.
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# combined_csv3: the normalized, combined frame; fit PCA with 8 components
pca = PCA(n_components=8)
pca.fit(combined_csv3)
pca.components_
pca.get_params()

# combined_csv4: the 8-component PCA scores used for clustering
kmeans = KMeans(n_clusters=7, init='k-means++', max_iter=300, n_init=10,
                random_state=20)
a = kmeans.fit(combined_csv4).labels_
print(a)
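The WCSS sweep behind that elbow judgment can be sketched as follows; the data here are synthetic stand-ins (seven separated blobs in an 8-dimensional space, echoing the PCA scores), since the real frame is not reproduced:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Synthetic stand-in: 7 well-separated blobs in the 8-component space.
centers = rng.normal(scale=10, size=(7, 8))
X = np.vstack([c + rng.normal(size=(100, 8)) for c in centers])

# inertia_ is scikit-learn's WCSS; plot it against k and look for the elbow.
wcss = [KMeans(n_clusters=k, init='k-means++', n_init=10,
               random_state=20).fit(X).inertia_ for k in range(1, 11)]
print([round(w) for w in wcss])
```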
You might wonder how we chose “n_clusters” as 7 in the code above: we computed the WCSS (Within-Cluster Sum of Squares) over a range of cluster counts, put it on an elbow plot, and judged visually. It may not be the most rigorous analytic method, but it is surely good enough. As mentioned before, we can also use SAS, and the code looks like this:
PROC IMPORT DATAFILE="/folders/myfolders/MyBigData/Clustered_PCA_0517.csv"
OUT=WORK.MYCSV
DBMS=CSV
REPLACE;
RUN;
ods noproctitle;
proc fastclus data=WORK.MYCSV maxclusters=7 cluster=Cluster1 out=work.Fastclus_scores;
var PCA0 PCA1 PCA2 PCA3 PCA4 PCA5 PCA6 PCA7;
run;

proc sql noprint;
create table work.combine as select a.*, b.* from WORK.MYCSV as a,
WORK.FASTCLUS_SCORES as b where a.Time=b.Time;
quit;
ods graphics / reset width=6.4in height=4.8in imagemap;
proc sort data=WORK.COMBINE out=_SeriesPlotTaskData;
by Time;
run;
proc sgplot data=_SeriesPlotTaskData;
title height=14pt "Clustered by SAS";
series x=Time y=Cluster1 /;
xaxis grid;
yaxis grid;
run;
ods graphics / reset;
/*proc datasets library=WORK noprint;
delete _SeriesPlotTaskData;
run;*/
proc sgplot data=_SeriesPlotTaskData;
title height=14pt "Clustered by Python";
series x=Time y=Cluster /;
xaxis grid;
yaxis grid;
run;
ods graphics / reset;
proc datasets library=WORK noprint;
delete _SeriesPlotTaskData;
run;
And now it’s time to go back to Azure ML Studio for the rest of the work. We initialize a “K-Means Clustering” model from the “Machine Learning” category, choose the same “K-Means++” initialization in the right-hand pane as attached, then use this model and the data set in “Train Clustering Model”, which completes the training part of this experiment. Before setting up the web service we have to execute “Run” to finish this part; once the web service is set up, the “Predictive experiment” becomes available automatically.
There are several ways to run the predictive experiment on the web service. You could run it in the web UI directly, as shown below, but that is quite tedious since there are so many variables to input. You can also query the model in code using the API key, but that requires more effort and knowledge. The Excel workbook that comes with the service is very helpful, especially for a high-dimensional problem like this one. I like to embed it in Microsoft Teams so it can be shared easily with my team, with a comprehensible UI for them to pick up. You simply paste the data into Excel and specify where the result should be placed (such as cell IW1 in the Excel below, since the results come with a header; this row of data has been classified as cluster “0”).
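For the API-key route, a call to a Studio (classic) request-response endpoint can be sketched as below. The URL, key, and column names are placeholders you would copy from the service’s API help page; the JSON shape follows the classic “ColumnNames”/“Values” convention.

```python
import json
import urllib.request

def build_request(column_names, values):
    """Payload shape used by an Azure ML Studio (classic) request-response service."""
    return {"Inputs": {"input1": {"ColumnNames": column_names,
                                  "Values": values}},
            "GlobalParameters": {}}

# Placeholders: take the real values from the service's API help page.
URL = "https://<region>.services.azureml.net/workspaces/<ws-id>/services/<svc-id>/execute?api-version=2.0"
API_KEY = "<your-api-key>"

# Two illustrative columns; the real service expects all model inputs.
body = json.dumps(build_request(["AccZ_0", "YAW_0"], [[0.12, 1.05]])).encode("utf-8")
req = urllib.request.Request(URL, body, headers={
    "Content-Type": "application/json",
    "Authorization": "Bearer " + API_KEY})
# Uncomment to actually call the service:
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```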
Now let’s get back to the results of each tool and see which clusters they produce. I exported the data from Excel and used Power BI to visualize it together with the SAS result, as attached (the blue line is SAS and the red line is Azure ML Studio). The tools differ and I didn’t change the number of clusters, yet you can see that in general the trends are similar and quite aligned. The waveform follows the motion and gesture in these four groups of data; for example, motion 1 (0~384 pt) is a stable holding position, while motions 2 (385 to 631 pt) and 4 (1085 to 1346 pt) are very similar except for the delicate relative positions of the fingers.
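“Quite aligned” can also be quantified: the adjusted Rand index scores agreement between two labelings while ignoring the arbitrary cluster numbering each tool assigns. The label sequences below are made up purely to illustrate the idea:

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical label sequences from the two tools; cluster IDs need not
# match, only the grouping matters (ARI is permutation-invariant).
sas_labels    = [0, 0, 1, 1, 2, 2, 3, 3]
python_labels = [5, 5, 2, 2, 0, 0, 1, 1]  # same partition, different IDs
print(adjusted_rand_score(sas_labels, python_labels))  # 1.0
```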
You can also visualize the clusters in PCA coordinates via the “Trained Clustering Model” as attached; however, that view is too simple to yield much insight, so I put the data into Power BI, where the dispersion is much clearer to distinguish.
A PIECE OF CAKE
So far it looks like a buffet without a main course, but sometimes even a light supper can serve and nourish after a day’s tiring load. Imagine how a fascinating crème brûlée can make the rest of the night feel like paradise, right? We also look forward to welcoming another new journey when the sun rises. Here I have my dessert, waiting to be enjoyed.
The mission of GLInB is to bring the most value by virtualizing your supply chain quality function, to fit the challenges of today’s business environment.
Please visit us at http://www.glinb.com for more information.