My first experiment with Azure ML Studio Part 2: Simulation

Sambhu Surya Mohan
3 min read · Apr 4, 2018

Azure ML Studio is really good. I was able to run a small experiment with it and still keep my work organized. I did need help from Python for some of the more complex tasks, though.

Continuing from my earlier post, I decided to run a simulation to compute the profit of a retail shop using statistics taken from the shop's original data. It's just a random experiment to see how close the simulation, driven by data distributions computed with statistical methods, comes to the original real-world data.

The figure shows my Azure ML Studio flow. After pre-processing the data (as discussed in Part 1), I computed the profit and profit percentage. As the figure shows, I could do this using parallel mathematical operations. I needed the profit on a per-day basis, and since I wasn't able to find a component for that, I wrote a small two-statement Python script to compute it. The next Python script used the statistics of the computed profit per customer in a day (I will call it PPCD) and the customer count per day (I will call this CCPD). Both PPCD and CCPD show an almost normal distribution curve (the central limit theorem kicks in ;) ).
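To make the per-day step concrete, here is a minimal sketch of how PPCD and CCPD could be derived from per-transaction rows. It is not the script from my flow: the `transactions` rows, the `summarize_by_day` name, and the idea that one row equals one customer are all assumptions for illustration, and inside Azure ML Studio the same grouping would more likely be done on the pandas data frame that the Python module receives.

```python
from collections import defaultdict

# Hypothetical per-transaction rows: (date, profit). Values are made up,
# and each row is assumed to be one customer.
transactions = [
    ("2017-01-01", 12.5),
    ("2017-01-01", 7.5),
    ("2017-01-02", 20.0),
    ("2017-01-02", 5.0),
    ("2017-01-02", 5.0),
]

def summarize_by_day(rows):
    """Return {date: (total_profit, customer_count, profit_per_customer)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, profit in rows:
        totals[day] += profit
        counts[day] += 1
    return {
        day: (totals[day], counts[day], totals[day] / counts[day])
        for day in totals
    }

daily = summarize_by_day(transactions)
for day, (total, ccpd, ppcd) in sorted(daily.items()):
    print(day, "profit:", total, "CCPD:", ccpd, "PPCD:", ppcd)
```

The third value of each tuple is the PPCD for that day and the second is the CCPD; the mean and standard deviation of those columns are what feed the simulation.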

For the simulation, I used the statistics computed from PPCD and CCPD to generate a new set of profit data (I call it computed profit, or CP for short) for 365 days. The values were drawn randomly from a normal distribution with the mean and standard deviation of PPCD, and from a Poisson distribution with the mean of CCPD.
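A rough sketch of this kind of simulation is below. It assumes that each day's CP is the product of one normal draw (profit per customer) and one Poisson draw (customer count), which is my reading of the setup rather than a copy of the original script. The numbers passed in are placeholders, not the shop's statistics, and `poisson_draw` and `simulate_daily_profit` are names invented for this example. The standard library has no Poisson sampler, so the sketch uses Knuth's multiplication method; with NumPy available, its random generator's `normal` and `poisson` methods would be the usual choice.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method.

    Suitable for moderate means; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_daily_profit(ppcd_mean, ppcd_std, ccpd_mean, days=365, seed=0):
    """Simulate daily profit as (profit per customer) * (customer count)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    simulated = []
    for _ in range(days):
        profit_per_customer = rng.gauss(ppcd_mean, ppcd_std)
        customers = poisson_draw(rng, ccpd_mean)
        simulated.append(profit_per_customer * customers)
    return simulated

# Placeholder statistics, NOT the values from the shop's data.
cp = simulate_daily_profit(ppcd_mean=50.0, ppcd_std=8.0, ccpd_mean=120.0)
print(len(cp), "simulated days, mean daily CP:", sum(cp) / len(cp))
```

Because the two draws are independent, the expected daily CP is simply the PPCD mean times the CCPD mean, which is why the simulated mean lands close to the original one.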

Finally, comparing the statistics of the original profit over 365 days with those of the computed profit for the same number of days shows very small differences. The means of Profit and CP are almost the same (a difference of plus 6). The standard deviation differs by about minus 1K. The corresponding minimum and maximum values differ by about 2K.
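This comparison can be sketched in a few lines. The two lists below are tiny made-up series standing in for the 365-day Profit and CP columns, and `describe` and `compare` are names chosen just for this example.

```python
import statistics

def describe(values):
    """Summary statistics used in the comparison."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def compare(original, simulated):
    """Return simulated minus original for each summary statistic."""
    a, b = describe(original), describe(simulated)
    return {name: b[name] - a[name] for name in a}

# Made-up values, only to show the shape of the comparison.
original_profit = [100.0, 120.0, 90.0, 110.0]
computed_profit = [105.0, 118.0, 95.0, 106.0]

diff = compare(original_profit, computed_profit)
for name, value in diff.items():
    print(name, round(value, 2))
```

A positive number means CP is higher than the original profit for that statistic, which matches the plus and minus signs used above.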

As a last analysis, I compared the total profit for the year using both the original profit and the computed profit. The difference came to around plus 20K. I think the simulation approximated the original data really well, since a 20K difference is less than 0.5% of the actual profit amount.
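The percentage claim is simple arithmetic, sketched here. The actual yearly total is not quoted in this post, so `hypothetical_total` is a stand-in; the only thing that follows from the numbers above is that a 20K gap stays under 0.5% whenever the yearly profit is above 4 million.

```python
def relative_difference_pct(original_total, simulated_total):
    """Signed difference of the simulated total, as a percent of the original."""
    return (simulated_total - original_total) / original_total * 100.0

# Smallest yearly profit for which a 20K gap is still under 0.5%.
min_total = 20_000 / 0.005

# Hypothetical yearly total, NOT the shop's real figure.
hypothetical_total = 4_500_000
pct = relative_difference_pct(hypothetical_total, hypothetical_total + 20_000)

print("minimum total for the claim:", round(min_total))
print("relative difference (%):", round(pct, 3))
```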

Since I used the mean and standard deviation of parts of the original data, it's no wonder that the result is more or less the same. But there was no direct relation between the original profit and the computed profit. The simulation result was not way off from the original data: the error was less than 0.5% on the data I had. Maybe next time I can take the statistics of a sub-sample of the data, use those in the experiment, and verify the result.

Azure ML is a really good tool to get you started. I think it needs more functions so that, hopefully, writing Python scripts can be avoided. It is a bit slow on the Python execution side (maybe because I was using the free version), but it's fast with its own components. Parallel computations are also supported with the subscription I was using. Since I was using it for the first time, it took me quite some time to figure out the components, maybe around 3–4 hours for the whole task. I had to switch between Python and components multiple times. Most of the time I wrote a Python script and then found out that a combination of components could do the whole trick. Hopefully in coming upgrades Azure will support complex statements too.

Part 1 of the article will give you an overview of the pre-processing stage.
