Snowball effect in Industry 4.0

As members of Unit8, a Swiss-based startup, we solve customers’ challenges across various domains. One of them is manufacturing. Without deep domain knowledge, over the course of three months, we developed a solution that improved overall equipment effectiveness (OEE) by 15% and reaction time to an incident by 50%. In plain English: hundreds of components at the factory can now increase production by 15%, and factory operators can react to any incident much faster, which had a substantial financial impact. Here we present the key lessons we learned from the project, which can be applied to many other use cases.

Industry 4.0

Before we begin, let’s explain what the term Industry 4.0 stands for. Industry 4.0, also known as the fourth industrial revolution, is a current trend in manufacturing. After mechanisation (1st), the introduction of assembly lines (2nd) and automation (3rd), it is time to leverage the Internet of Things, massive data collection and cloud computing.

History of industrial revolutions


The environment we face in this case is a chemicals factory. Plenty of manufacturers already have production set up, with most of the job done by machines that usually just repeat a given set of steps, and yet the setup is often not optimal!

If we take a closer look, we can see that many factories share very similar characteristics. They all consist of separate workstations that each do a specific task and are organised into assembly lines. Every step on a line is associated with one of those workstations executing a task that brings the product closer to its final state. A lot of things can happen during this process, which is why the producers of those machines equip them with a ton of sensors that gather real-time metrics. At the same time, we need to assure the quality of products, so we get another set of measurements of object properties such as size, shape, weight and consistency.

A closer look at the factory

As you can see, we have quite a lot of data coming out that is not easy to interpret at first sight. Different producers publish different metrics, hundreds of thousands of produced objects vary, and it is very difficult to get a grasp of what is happening out there and to see whether we are underperforming. This can lead to stalls on the production line and huge losses! This is why we decided to do some number crunching for our customer.

Ideal state

All metrics are stored in some kind of SQL database. Its hundreds of tables are optimal from a developer’s point of view: they were designed to make predefined logic easy to query. Unfortunately, if you would like to get some additional insights into the factory, it becomes more troublesome. We would like to create an interface that is more readable and more concrete. So we asked line operators, who are the customers of our interface, what kind of insights they are interested in and what such an interface should include. This is what we heard.

Using the new manufacturing interface, I would like to optimize our lines’ production

There are many definitions of optimal. Which one are we talking about? Is it resource allocation, costs, speed?
 This is the beginning of the story of how we managed to improve production speed by 15%, which positively impacted the Overall Equipment Effectiveness.

Day 1

We come to the factory hoping that interviews with users will shed some light on the topic. After spending the first hour with users, we reach the first obvious conclusion.

Excel — It is everywhere

Excel is their main tool. There is a gigantic stack of spreadsheets containing data extracts, custom analyses and some charts. Please don’t get us wrong, we think Excel is a great tool with endless possibilities, but it might not be an appropriate solution for certain problems. Data extracts become outdated quickly, and custom analyses contain the author’s own definitions instead of consistent, agreed metrics. Printed charts put up in the operations room become nothing more than a nice picture on the wall within a week.

All right, we can see some potential for improvement, but we would like to identify the biggest current concern for line operators. Both sides agree that we should focus on checking the performance of single machines. Let’s get to work.

First iteration

After a week or so, we have already found some interesting things worth sharing.
 When you look at the effectiveness of producing one item, nothing seems concerning. There are no stops, and items are delivered before the deadline. But when you take a single machine, doing exactly one thing, over a period of one year, you can see that performance degrades over time. Machines are utilised in various ways, and some of them need to be checked more often rather than all sharing the same fixed maintenance schedule.
 Moreover, to speed up development and improve maintainability, we switched from Excel to R.
 The first step was in the right direction. After a few more days of work, we come up with 6 metrics describing different dimensions of performance. Each of them was reviewed and agreed upon by a group of operators, so everyone is on the same page and they can now compare results with each other. We calculate the metrics for more than 200 machines in a matter of seconds, so users can get fresh data any time they want with a single click.
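To make the idea of performance degrading over time concrete, here is a minimal sketch of how such a trend could be detected for each machine. It is written in Python purely for illustration (the project itself used R), and the machine IDs, the throughput figures, the `max_drop_per_sample` threshold and the function names are all hypothetical. It fits a least-squares line to each machine's history and flags machines whose metric falls faster than the allowed rate.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def degrading_machines(history, max_drop_per_sample=0.5):
    """Return sorted IDs of machines whose metric falls faster than allowed."""
    return sorted(
        machine_id
        for machine_id, values in history.items()
        if trend_slope(values) < -max_drop_per_sample
    )


# Hypothetical monthly throughput (items per hour) for two machines.
history = {
    "mixer-01": [100, 100, 101, 99, 100, 100],
    "mixer-02": [100, 98, 97, 95, 93, 92],
}
print(degrading_machines(history))  # ['mixer-02']
```

A list like this could feed a maintenance schedule that checks the flagged machines more often than the rest, instead of treating all of them the same way.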
 Now it is time to ask users for feedback. Proud of our accomplishment, we enter the room. We are not disappointed: operators are happy with the results, but…

Where is the issue?

If we do some math, 6 metrics multiplied by more than 200 machines gives us more than 1200 values with every single calculation. That is way too many values to monitor. Also, the operator needs some luck to be looking at exactly the moment a problem arises in order to act immediately. Now we have something to focus on: making the tool more useful.

Second iteration

After a few weeks of prototyping, user interviews and brainstorming, we come up with a solution.
 A value seen on its own is just a number. The only insight you can derive from it is whether it is bigger or smaller than another one. But when you put it in context, you can tell whether the number is good or bad. You can start comparing values from different perspectives, such as where a value sits relative to the trend of its series.
 Let’s assign a level of importance, or badness, to every single value to get a heatmap.
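One possible way to put a value in context is to compare it with the recent history of the same metric on the same machine. The sketch below is an illustration in Python, not the scoring logic used in the project: the colour names, the `warn_z` and `alert_z` thresholds and the sample numbers are assumptions. It maps each value to a colour based on how many standard deviations it sits from its own history.

```python
from statistics import mean, stdev


def severity(value, history, warn_z=2.0, alert_z=3.0):
    """Map a value to a colour by how far it sits from its own history."""
    if len(history) < 2:
        return "grey"  # not enough context to judge
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any deviation at all is suspicious.
        return "green" if value == history[0] else "red"
    z = abs(value - mean(history)) / spread
    if z >= alert_z:
        return "red"
    if z >= warn_z:
        return "yellow"
    return "green"


# Hypothetical recent readings of one metric on one machine.
recent = [50, 52, 49, 51, 50, 48, 50]
for reading in (50.5, 53, 58):
    print(reading, severity(reading, recent))
```

Applying such a function to every metric of every machine gives one colour per cell, which is all a heatmap needs.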

Splash some colours!

With colours, it is easier to spot the item the operator needs to look at first.
Speaking of looking at issues, we would also like to enable collaboration between operators and improve visibility and understanding of what has happened across the factory. For every issue, there should be a corrective action.

Every issue should correspond to a corrective action

The result of the second iteration materialised as an alerting system with a case management tool on top of it.

Here you can see the architecture that we came up with during the second iteration. A few things are worth mentioning:

  • An ingestion engine is introduced to avoid putting extra load directly on the production database. Additionally, some data are mutable, and keeping a history of value changes required us to re-create and store those changes on our side
  • Kafka is used to run calculations in parallel
  • Every single metric is located in time, so a time-series database was the natural choice, and InfluxDB was chosen
  • On top of that, we are able to detect anomalies such as a delay between production steps
  • The interface to the end user depends on the use case: it can be email, dashboards or a ticket list
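As an illustration of the anomaly mentioned in the list above, a delay between production steps, here is a minimal Python sketch. The event format, the step names, the timestamps and the fixed `max_gap` threshold are all assumptions made for the example; the detection logic in the real system may well differ. It groups step events by item, orders them in time and reports every transition that took longer than allowed.

```python
from datetime import datetime, timedelta


def delayed_transitions(events, max_gap):
    """Return (item_id, from_step, to_step, gap) for gaps longer than max_gap.

    events: iterable of (item_id, step_name, timestamp) tuples, in any order.
    """
    by_item = {}
    for item_id, step, ts in events:
        by_item.setdefault(item_id, []).append((ts, step))

    delays = []
    for item_id, steps in by_item.items():
        steps.sort()  # order each item's steps by timestamp
        for (t0, s0), (t1, s1) in zip(steps, steps[1:]):
            gap = t1 - t0
            if gap > max_gap:
                delays.append((item_id, s0, s1, gap))
    return delays


# Hypothetical step events for two batches.
events = [
    ("batch-1", "mixing", datetime(2024, 1, 1, 8, 0)),
    ("batch-1", "filling", datetime(2024, 1, 1, 8, 10)),
    ("batch-1", "packing", datetime(2024, 1, 1, 9, 5)),
    ("batch-2", "mixing", datetime(2024, 1, 1, 8, 30)),
    ("batch-2", "filling", datetime(2024, 1, 1, 8, 42)),
]
for delay in delayed_transitions(events, max_gap=timedelta(minutes=15)):
    print(delay)
```

Each reported delay could then be turned into an alert or a ticket, depending on the interface the end user prefers.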


So what has changed for the customer after deploying our solution?
 Line managers, equipped with our solution, start the day by looking at the overall dashboard. It gives a general idea of the condition the factory is in.
 If the dashboard is redder than usual, line managers can concentrate on the problematic line or machine. To get more context, they can dive into a specific metric on a component and find out exactly when the problem started. Using a time-series visualisation, users can identify how other components contributed to the issue by putting additional metrics on a timeline.
 If an issue is broader or more complex, line managers can use the collaboration feature and, thanks to the unified metric logic, start a discussion on how to mitigate any potential loss.

Takeaways

So what have we learned from those iterations?

First, try to agree on well-defined, scoped tasks with end users. We came into an unknown environment and were overwhelmed by the amount of domain information. Focus on one thing, do it well, and then move on to a bigger topic or the next one.
 Second, results matter the most. We could have designed a complex, scalable application up front, but we would not have known whether the end user would find it useful. Make changes to your architecture only when they are required, and always keep an eye on business results.
 Lastly, deliver small, useful deltas. Take small steps towards the end goal. With such an approach, a wrong step hurts less, and smaller steps are easier to adjust.
 After three months we started observing a snowball effect. Users realised the power of standardised, automated calculations and started asking us to incorporate more metrics and onboard more users. Finally, they had a one-stop shop for a unified view of factory performance instead of a pile of stale spreadsheets. Most importantly, with this one-stop shop, they managed to reach a production speedup of 15%.

Unit8 is an AI & Big Data solutions provider located in Switzerland. We focus on helping companies solve some of their key problems using a mixture of data science and software engineering. If you would like to find out more about us, visit our website and follow our blog for more interesting stories.

Special thanks to Marcin, Marek, Krzysiek and Weronika!