In every IoT project there is a preliminary phase, usually called “Data Analysis”, in which you must work with the raw data already collected by existing logging systems to demonstrate the real value that an Industrial IoT solution can extract from semi-structured data. No coding time is allowed: only data exploration and data manipulation can be used.

Azure Data Explorer (ADX) and semi-structured IoT data

A real step-by-step example of data mining on a real dataset, to demonstrate the power of Kusto.

The Initial context

The most difficult thing in a plant digitization project is convincing the customer that digitizing machine data can actually bring considerable value and reveal things that cannot be seen with empirical tools or with “traditional” industrial data management.
What can trigger interest is to immediately show the customer what is hiding in the data that the production lines already generate, data that is perhaps only cataloged as “raw data” but never really used.

This type of approach is based on the following points:
1. Machine data arrives and is readable
2. The data can be transported outside the plant to be cleaned, re-aggregated, filtered, using cloud services
3. The data has something to say (this is fundamental, as the customer often DOESN’T KNOW what to look for in their data).

We are talking about a project phase called “presale / pre-analysis / data POC”, based on pure data exploration (imagine you have a USB stick containing some sample files and you have to highlight the value of the data).
The Azure service to use in this kind of situation is certainly Azure Data Explorer (ADX), because it is a single tool that integrates database storage, a data manipulation engine, and data visualization through dashboarding elements.
In these cases the data engineer has a difficult task: performing a miracle starting from a file that, at first glance, is absolutely not interesting.
But what is the correct approach, if there is one, to carry out this type of project?
A real method does not exist; instead, there are many tools and tutorials on how, for example, to manipulate data with Kusto (KQL) inside ADX.

The following is NOT a school tutorial on how to use Kusto, but a step-by-step “practical test”, deliberately based on a non-encouraging dataset: a single log file into which the different machines that make up a line have written their logs, with formats and standards that are not always the same, and certainly not already encapsulated in a table model with defined columns. The data does not contain precise time references, only a cyclic hourly time, and there are no pre-defined event codes: everything has to be discovered.
The article also contains the link to the dataset that you can download to personally test the queries indicated. Remember: this dataset, and therefore this experience, IS REAL.

File_test
| take 1000

Nothing interesting... but you can see there is a Source field that reveals the source machine. So first, get confident with the parse operator in order to extract other information.

File_test
| where Source contains "/RT02/Silver/M2/"
| parse kind = regex flags = Ui Text with * "VeloForceFree: " VeloForceFree ',' *
| parse kind = regex flags = Ui Source with * "/RT02/Silver/" MachineName '/' *

OK, that looks better. So it’s time to find out which machines are present.

File_test
| parse kind = regex flags = Ui Source with "AT/" MachineName '/' *
| distinct MachineName
| order by MachineName

First surprise: there are messages coming from another communication level, not from a specific machine. It’s time to go deeper, using “!hasprefix” to filter out unimportant rows and to build a real datetime (look at the process: first calculate the time increments, then add them to now()).

File_test
| parse kind = regex flags = Ui Source with "AT/" MachineName '/' *
| where MachineName == ''
| where Text !hasprefix "E10"
| where Text !hasprefix "User"
| serialize rn = row_number()
| extend _temp_timeslots = split(translate('.', ':', CreationTime), ':')
| serialize _temp_millis = (_temp_timeslots[0]*60)+(_temp_timeslots[1])
| extend prev_millis = prev(_temp_millis)
| extend _temp_millis_offset = iff( (prev(_temp_millis) > 0) and (_temp_millis >= 0) and (prev(_temp_millis) < _temp_millis), _temp_millis - prev(_temp_millis), (3600 - prev(_temp_millis)) + _temp_millis)
| extend _progressive_offset = row_cumsum(_temp_millis_offset)
| extend relativedatetime = datetime_add('second', _progressive_offset, now())
| project-away MachineName, rn, _temp_millis, _temp_timeslots, prev_millis, _temp_millis_offset, _progressive_offset

I discovered that there had been two recipe changes and one verification. HINT: from here you could work out the TIME SLOTS of each RECIPE VARIATION, and then search the rest of the data to see how pieces, rejects and problems evolved inside each slot (see the sketch below).
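
To illustrate the HINT, here is a minimal sketch that is not part of the original walkthrough: it assumes the result of the query above (including the relativedatetime column) has been stored in a let variable named lineEvents, and that recipe-change rows can be recognized by a hypothetical "Recipe" prefix in Text. Each change is paired with the following one to obtain a time slot per recipe.

// Hypothetical sketch: "lineEvents" and the "Recipe" prefix are assumptions, not taken from the dataset
lineEvents
| where Text hasprefix "Recipe"
| order by relativedatetime asc
| extend SlotStart = relativedatetime, SlotEnd = next(relativedatetime)
| project Text, SlotStart, SlotEnd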

Now what?

I stopped for a minute and explored the data of a single machine to understand how different the message formats were. Reading the data carefully, I noticed that each message represented a state, so for each one there was an exactly matching message that “closed” the previous one. I understood this because half of the messages contained the same description, but with the prefix “Cleared - ”.
This led me to think of looking for phenomena with a duration (doors opened for maintenance, duration of automation malfunctions, alarms switching on and off).
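
As a minimal illustration of this pairing idea (a sketch, not a step from the original walkthrough): the parse below extracts, for rows starting with "Cleared - ", the text of the event they close, which is exactly the column used later to join start and stop events.

// Sketch: Cleared stays empty for rows that do not start with "Cleared - "
File_test
| parse Text with "Cleared - " Cleared:string
| where isnotempty(Cleared)
| take 10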

I took one machine and, after cleaning out the glitches (events lasting less than one second) and the notification messages (logs with INFO level), I began to understand the data, finding, for example, the door opening and closing actions (a typical event for maintenance, or for fixing a problem in the produced pieces).

File_test
| parse kind = regex flags = Ui Source with "AT/" MachineName '/' *
| where MachineName == "RT02"
| where Text !hasprefix "Source(500)" and Text !hasprefix "Cleared - Source(500)"
| where Text !hasprefix "Enable Axis" and Text !hasprefix "Cleared - Enable Axis"
| where Text !hasprefix "Next MODULE" and Text !hasprefix "Cleared - Next MODULE"
| where Text !hasprefix "Warning, Wafer" and Text !hasprefix "Cleared - Warning, Wafer"
| where Text !hasprefix "Buffer" and Text !hasprefix "Cleared - Buffer"
| where Text !hasprefix "Previous" and Text !hasprefix "Cleared - Previous"
| where Text !hasprefix "Next" and Text !hasprefix "Cleared - Next"

I further cleaned the data by removing the “communication / ack” messages that refer to the hand-over of products between the previous and the next machine in the line.

I began to see the value of the data.
However, before exploring further I needed to address some recurring aspects, namely:
1. parameterize the severity level of the unstructured log
2. clean the data of prev/next glitches, automation notifications and product passages

It was also necessary to build a datetime starting from the cyclic TIME format. Obviously, this normalization operation relies on two hypotheses:
1. if T2 < T1, then the hour has implicitly rolled over (e.g. T2 = 11th minute while T1 = 59th minute) (see the micro-example below)
2. the event ordering must be done per single device
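
A micro-example of hypothesis 1, with hypothetical values expressed in seconds within the hour: the offset between two consecutive events is the plain difference when the clock has not wrapped, and (3600 - prev) + current when it has.

// Hypothetical values: prev = 3540s (59th minute), current = 660s (11th minute) -> offset = 720s
print prev = 3540, current = 660
| extend offset = iff(prev < current, current - prev, (3600 - prev) + current)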

So I decided to store this base query so that it could be reused, eventually as a function. As a first step I composed the result using let variables, hard-coding the machine ID and the base datetime.

let startdate = now();
let data = File_test
| where Severity != "Info"
| where Text !hasprefix "AmatPMon"
| where Text !hasprefix "Source(500)" and Text !hasprefix "Cleared - Source(500)"
| where Text !hasprefix "Enable Axis" and Text !hasprefix "Cleared - Enable Axis"
| where Text !hasprefix "Next MODULE" and Text !hasprefix "Cleared - Next MODULE"
| where Text !hasprefix "Warning, Wafer" and Text !hasprefix "Cleared - Warning, Wafer"
| where Text !hasprefix "Buffer" and Text !hasprefix "Cleared - Buffer"
| where Text !hasprefix "Previous" and Text !hasprefix "Cleared - Previous"
| where Text !hasprefix "Next" and Text !hasprefix "Cleared - Next"
| parse kind = regex flags = Ui Source with "AT/" MachineName '/' *
| parse Text with "Cleared - " Cleared:string
| where MachineName == "RT02"
| serialize rn = row_number()
| extend _temp_timeslots = split(translate('.', ':', CreationTime), ':')
| serialize _temp_millis = (_temp_timeslots[0]*60)+(_temp_timeslots[1])
| extend prev_millis = prev(_temp_millis)
| extend _temp_millis_offset = iff( (prev(_temp_millis) > 0) and (_temp_millis >= 0) and (prev(_temp_millis) < _temp_millis), _temp_millis - prev(_temp_millis), (3600 - prev(_temp_millis)) + _temp_millis)
| extend _progressive_offset = row_cumsum(_temp_millis_offset)
| extend relativedatetime = datetime_add('second', _progressive_offset, startdate)
| project-away _temp_timeslots, _temp_millis, prev_millis, _temp_millis_offset, _progressive_offset;
let pairedevents = data
| project EventID, Severity, Source, Text, Cleared, Start=relativedatetime, relativedatetime
| join kind=innerunique (
    data
    | project EventID, Severity, Source, Text, Cleared, Stop=relativedatetime, relativedatetime
) on $left.Text == $right.Cleared
| where datetime_diff('second', Stop, Start) > 0
| project EventID, Severity, Source, Text, Start, Stop
| order by Start asc, Stop asc;
let pairedevents2 = pairedevents
| summarize make_list(Stop) by EventID, Severity, Source, Start, Text
| project EventID, Severity, Source, Text, Start, Stop = todatetime(list_Stop[0])
| extend Duration = Stop - Start;
pairedevents2

Some notes on this difficult query:

  1. You must LEARN to make an INNER JOIN of a table with itself
  2. Without a correlation ID to couple the Start and Stop events, I used a “quick & dirty” trick: I built a list of the candidate Stop events and picked the first element as the closest match (the rows are ordered by Stop ascending).
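
A minimal sketch of the second trick, on hypothetical values: because the rows are ordered by Start and Stop before the summarize, the first element of the list is the closest Stop for each Start.

// Hypothetical data: two candidate Stop events for the same Start; list_Stop[0] is the closest one
datatable(EventID:string, Start:datetime, Stop:datetime)
[
    "E1", datetime(2021-01-01 10:00:00), datetime(2021-01-01 10:05:00),
    "E1", datetime(2021-01-01 10:00:00), datetime(2021-01-01 10:20:00)
]
| order by Start asc, Stop asc
| summarize make_list(Stop) by EventID, Start
| project EventID, Start, Stop = todatetime(list_Stop[0])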

And NOW.. a Function?

Now it’s time for a function, in order to reuse the base query, parameterizing the datetime and the machine ID:

.create-or-alter function with (docstring = "BaseQuery", folder = "Query") getDataByMachineId(mid:string, startdate:datetime) {
let data = File_test
| where Severity != "Info"
| where Text !hasprefix "AmatPMon"
| where Text !hasprefix "Source(500)" and Text !hasprefix "Cleared - Source(500)"
| where Text !hasprefix "Enable Axis" and Text !hasprefix "Cleared - Enable Axis"
| where Text !hasprefix "Next MODULE" and Text !hasprefix "Cleared - Next MODULE"
| where Text !hasprefix "Warning, Wafer" and Text !hasprefix "Cleared - Warning, Wafer"
| where Text !hasprefix "Buffer" and Text !hasprefix "Cleared - Buffer"
| where Text !hasprefix "Previous" and Text !hasprefix "Cleared - Previous"
| where Text !hasprefix "Next" and Text !hasprefix "Cleared - Next"
| parse kind = regex flags = Ui Source with "AT/" MachineName '/' *
| parse Text with "Cleared - " Cleared:string
| where MachineName == mid
| serialize rn = row_number()
| extend _temp_timeslots = split(translate('.', ':', CreationTime), ':')
| serialize _temp_millis = (_temp_timeslots[0]*60)+(_temp_timeslots[1])
| extend prev_millis = prev(_temp_millis)
| extend _temp_millis_offset = iff( (prev(_temp_millis) > 0) and (_temp_millis >= 0) and (prev(_temp_millis) < _temp_millis), _temp_millis - prev(_temp_millis), (3600 - prev(_temp_millis)) + _temp_millis)
| extend _progressive_offset = row_cumsum(_temp_millis_offset)
| extend relativedatetime = datetime_add('second', _progressive_offset, startdate)
| project-away _temp_timeslots, _temp_millis, prev_millis, _temp_millis_offset, _progressive_offset;
let pairedevents = data
| project EventID, Severity, Source, Text, Cleared, Start=relativedatetime, relativedatetime
| join kind=innerunique (
    data
    | project EventID, Severity, Source, Text, Cleared, Stop=relativedatetime, relativedatetime
) on $left.Text == $right.Cleared
| where datetime_diff('second', Stop, Start) > 0
| project EventID, Severity, Source, Text, Start, Stop
| order by Start asc, Stop asc;
let pairedevents2 = pairedevents
| summarize make_list(Stop) by EventID, Severity, Source, Start, Text
| project EventID, Severity, Source, Text, Start, Stop = todatetime(list_Stop[0])
| extend Duration = Stop - Start;
pairedevents2
}

Now my life was simpler: getDataByMachineId("RT02", now())

The PATTERN STATEMENT

Once the base function was built, I listed the parameterless middle-tier operations that an external application might need to perform.
Using the PATTERN statement I then built the corresponding queries, making them easily retrievable.

declare pattern app = (applicationId:string)[eventType:string]
{
    ("RT02").["Doors"] = {
        database("deltalake_example").getDataByMachineId("RT02", now())
        | where Text startswith "Doors"
    };
    ("RT02").["GenericErrors"] = {
        database("deltalake_example").getDataByMachineId("RT02", now())
        | where Text startswith "Error"
    };
    ("RT02").["OperatorAlarms"] = {
        database("deltalake_example").getDataByMachineId("RT02", now())
        | where Text startswith "Operator alarm"
        | extend Source2 = strcat(Source, "/A/")
        | parse kind = regex flags = Ui Source2 with "AT/" MachineCode:string "/Silver/" MachineId:string "/Sides/" Side:string "/OpAlarms/" Alarm:string "/A/" *
        | extend description = strcat(Alarm, replace("Operator alarm", "", Text))
        | extend replaced = replace(@'Operator alarm', @'', Text)
        | project EventID, Severity, Source, Text, Start, Stop, Duration, MachineCode, MachineId, Side, Alarm, description
    };
    ("RT02").["ProductTransitionErrors"] = {
        database("deltalake_example").getDataByMachineId("RT02", now())
        | where Text startswith "wafer transition"
    };
    ("RT02").["AxisErrors"] = {
        database("deltalake_example").getDataByMachineId("RT02", now())
        | where Text hasprefix "Axis" and Text hasprefix "Error"
    };
};

And now I was able to retrieve useful insights from the data, for example the operator alarm with the highest incidence (more than 36k rejects caused by a wrong alignment):

app("RT02").["OperatorAlarms"]
| summarize AlarmAmount = sum(Duration) by Alarm, description
| order by AlarmAmount desc
| render columnchart

Next steps

The next steps could be:
1. categorize the event types by assigning a CODE, based on a reference table, in order to simplify searching and query composition in the pattern declaration phase (see the sketch after this list)
2. produce a multi-line-chart dashboard where a user can select machine, event type, time slices and alarm codes
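
A hedged sketch of the first next step, reusing the function defined earlier; the code values and the prefix-to-code mapping are illustrative assumptions, since in a real project they would come from the customer's reference table.

// Illustrative mapping only: real codes would come from a customer-provided reference table
getDataByMachineId("RT02", now())
| extend Code = case(
    Text startswith "Doors", "DOORS",
    Text startswith "Operator alarm", "OP_ALARM",
    Text startswith "Error", "GEN_ERR",
    Text startswith "wafer transition", "PROD_TRANS",
    "OTHER")
| summarize TotalDuration = sum(Duration), Events = count() by Code
| order by TotalDuration desc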

The Result

The general result is that you can surface valuable information starting from nothing more than a log file.
With this approach it is easy to add more value with little effort: for example, you can extend the number of KPIs by discussing the data shift by shift with the customer in an interactive, data-driven way (remember that the customer knows the process, so they can associate “cause” and “effect”).

Recap

In this post I explained how to approach a data analysis using Kusto and the first capability of Azure Data Explorer: exploring data. I found this approach a quick win when your customer wants to digitalize a plant without knowing what is worth searching for in the data.

Note: Here is the dataset link to test all the queries.
