Use Case: Design of Experiments (DoE) for mixtures and analysis of the collected data using Kodemetrics.

Francesca Giorgolo
Kode
6 min read · Jan 9, 2024

Introduction

Kodemetrics was born out of our laboratory experience here at Kode. We realised that the workflow was quite structured and that, although the experiments concerned different areas, they had many commonalities.

In such a context, a tool that could speed up the various work steps would save significant time and would help our domain-expert colleagues, who often lacked programming skills and generally perceived programming as an obstacle to their work.

Therefore, we have developed Kodemetrics, a tool designed for use in the chemical field but also applicable in other contexts. Its purpose is to assist users in conducting experiments and carrying out statistical analyses without requiring knowledge of a specific programming language.

This Use Case aims to demonstrate the operational potential and to show the various applications of the software, highlighting the benefits that can be achieved in terms of time and ease of use.

To do so, we retrace the steps of a complete laboratory experiment: identifying meaningful and statistically analysable experimental trials for studying the properties of new products (whether in the chemical, industrial, pharmacological, biotechnological or any other field).

Objective of the experimentation

As part of a chemical process study, a specific property of a mixture consisting of three different components is being investigated. The components are not subject to any particular constraints, other than the requirement that the sum of their quantities must equal 100% of the product.
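
As a compact restatement of this constraint, writing the proportion of component i as x_i (a symbol introduced here for illustration, not taken from the software), the three components satisfy:

```latex
x_1 + x_2 + x_3 = 1, \qquad 0 \le x_i \le 1 \quad (i = 1, 2, 3)
```

Only two of the three proportions can be chosen freely, so the experimental region is a triangle (a simplex) rather than a cube.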

In this context, an experimental design will be created to collect suitable data, in order to assess which of the three components has the greatest effect on the property of interest.

Description of the Kodemetrics modules used

To conduct this type of analysis, we will be using Kodemetrics software, which allows us to generate the experiment and then analyse the results quickly and easily. The relevant sections for this use case are:

  1. DoE generation: dedicated to creating experimental designs, which are sets of highly informative tests used to statistically study the sources of variability of a phenomenon, determining the effect of one or more measured quantities on the phenomenon of interest.
  2. Data upload: this allows loading of the collected data and an initial visualisation in tabular form, providing high-level information such as the number of rows and columns that make up the dataset, allowing the handling of missing values and an assessment of the variability of the values in each column.
  3. Predictive Analysis -> Model for mixtures: dedicated to the analysis of data relating to mixtures, meaning data sets in which the predictors are bound by the constraint that the sum per row of the values contained in the columns relating to the components of the mixture is 100%. Here the user can specify the columns to be used and the pre-processing to be performed. Then the application will provide the model summary, the graph of actual vs. predicted values and two typical graphs for this type of analysis: the contour plot (which shows the trend of the variable under study within the experimental region) and the effect plot, which is useful for identifying any significant effects of the components on the response variable.

Description of the proceedings: DoE

We start by opening the Kodemetrics link and logging in (or registering via “Sign up” in the left-hand panel, if we don’t have an account yet).

Kodemetrics — Login page

From the sidebar, we select the DoE Generation module among the available modules (which depend on the subscribed plan); this opens the dedicated section.

Kodemetrics — DoE Generation Module

From the drop-down menu we select the type of design among those available; in our case we select ‘Mixtures’, meaning a design for mixtures.

Selecting the experimental design will reveal additional drop-down menus and checkboxes to customize the DoE based on the study’s requirements. In our case we specify three experimental factors and allow for the fitting of at least a quadratic model to the collected data.

The ‘Add internal points’ flag is also selected to investigate the response surface within the experimental region after recording the values of the variable under study.
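
For readers curious about what such a design looks like, here is a minimal sketch of how a three-component mixture design could be enumerated. It is not Kodemetrics' actual algorithm. It builds a {3, 2} simplex-lattice (the classical design that supports a quadratic Scheffé model) and then adds interior points. Which interior points the ‘Add internal points’ option adds is not stated in this article, so the overall centroid plus three axial check blends is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(n_components, degree):
    """All blends whose proportions are multiples of 1/degree and sum to 1."""
    levels = range(degree + 1)
    points = [
        tuple(Fraction(k, degree) for k in combo)
        for combo in product(levels, repeat=n_components)
        if sum(combo) == degree
    ]
    # Reverse sort so the pure first component comes first.
    return sorted(points, reverse=True)


def interior_points(n_components):
    """Overall centroid plus one axial check blend per component.

    Assumed augmentation: each axial blend lies halfway between a
    vertex and the centroid.
    """
    q = n_components
    centroid = tuple(Fraction(1, q) for _ in range(q))
    axial = [
        tuple(
            (1 + centroid[j]) / 2 if j == i else centroid[j] / 2
            for j in range(q)
        )
        for i in range(q)
    ]
    return [centroid] + axial


design = simplex_lattice(3, 2) + interior_points(3)

for run, blend in enumerate(design, start=1):
    percentages = ", ".join(f"{float(x) * 100:5.1f}%" for x in blend)
    print(f"run {run:2d}: {percentages}")
```

Exact fractions are used so that every row sums to exactly 1; the printout converts them to percentages only for display.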

Kodemetrics — DoE customization

By clicking on the ‘Create design’ button, two boxes appear:

  • The first box contains a table with the experimental trials, which can be downloaded in .csv format by pressing the ‘Download’ button.
  • The second box displays a graph, if the specified parameters allow it, to show the experimental region identified with the trials.
Table of the experimental trials (left); Graph of the experimental region identified with the trials (right)

Now that the tests have been downloaded, we can carry out the experiments offline and record the values of the variable under consideration for each configuration. As an example, in the first test, we may use a mixture composed entirely of substance 1; in the second test, we may use a mixture composed of 50% substance 1 and 50% substance 2, and so on.

Description of the proceedings: Predictive Analysis

After completing data collection, we return to Kodemetrics, open the ‘Data Upload’ section and use the ‘Browse’ button to locate and upload the file containing the experimental data to analyse.

Kodemetrics — Experimental Data Upload

The dataset used in this case does not contain any missing values, so the default option, ‘Drop rows with missing’, has no effect.

The box below contains two tabs:

  • Overview: displays a table of the loaded data, its dimensions (rows/columns), and both a graphical and a tabular representation of any missing values in the dataset.
  • Variables Pattern: displays how the value of each variable changes across observations, so that columns can be compared in terms of scale and variability.
Table overview of the loaded data (left); Variables Pattern, that displays how the value of each variable changes for each observation (right).
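
The same sanity checks can be done by hand before uploading a file. The sketch below uses only the Python standard library; it drops rows with empty cells and verifies that the component columns of each remaining row sum to 100%. The file contents, the Response column name and the response values are placeholders invented for illustration, not data from this experiment, and the component column names Cmp1, Cmp2 and Cmp3 are assumed.

```python
import csv
import io
import math

# Hypothetical file contents: placeholder values, not real measurements.
RAW = """Cmp1,Cmp2,Cmp3,Response
100,0,0,12.0
50,50,0,11.1
50,0,50,
0,100,0,9.5
"""

COMPONENTS = ["Cmp1", "Cmp2", "Cmp3"]


def load_complete_rows(text):
    """Parse CSV text; return (all rows, rows without empty cells)."""
    rows = list(csv.DictReader(io.StringIO(text.strip())))
    complete = [
        row for row in rows
        if all((value or "").strip() != "" for value in row.values())
    ]
    return rows, complete


def rows_violating_constraint(rows, total=100.0):
    """Indices of rows whose component percentages do not sum to `total`."""
    return [
        index
        for index, row in enumerate(rows)
        if not math.isclose(
            sum(float(row[name]) for name in COMPONENTS), total, abs_tol=1e-6
        )
    ]


rows, complete = load_complete_rows(RAW)
print(f"{len(rows)} rows read, {len(rows) - len(complete)} dropped as incomplete")
print("rows breaking the 100% constraint:", rows_violating_constraint(complete))
```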

To perform the actual analysis of the data, we click ‘Predictive Analysis’ in the sidebar and then ‘Model for Mixtures’.

Kodemetrics — Predictive Analysis

The default model fitted is a linear Scheffé model, with all variables in the dataset acting as predictors. To model a specific response variable, we move the candidate response columns to the appropriate box and then pick the one to model from the drop-down menu.
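
For reference, the linear Scheffé model for three components has the standard textbook form below, where x_i denotes the proportion of component i and the notation is ours rather than the software's:

```latex
\hat{y} = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3,
\qquad x_1 + x_2 + x_3 = 1
```

There is no separate intercept: because the proportions sum to 1, a constant term \beta_0 could be rewritten as \beta_0 (x_1 + x_2 + x_3) and absorbed into the three coefficients. Each \beta_i can therefore be read as the predicted response for the pure component i.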

The analysis produces a result which is divided into several sections, including:

  • the model summary,
  • a graph displaying actual and predicted values,
  • the effect size of each component, computed along the Piepel (default option) or Cox direction,
  • the R² value corrected for the constraint that the components must sum to 100% of the mixture,
  • the contour plot, that can be used to display the trend of the variable under investigation within the experimental region, provided that the dimensionality of the region allows for it.
Kodemetrics — Linear Scheffé Model prediction
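
To give an intuition for the effect along the Cox direction listed above, here is a small sketch under stated assumptions. Along the Cox direction, one component is varied while the others keep the same ratios to each other as in a reference blend. The coefficients below are hypothetical values chosen for illustration, not the ones estimated in this experiment; the centroid is assumed as the reference blend, and the Piepel variant is not covered.

```python
def cox_blend(reference, i, new_value):
    """Set component i to new_value along the Cox direction.

    The remaining components are rescaled so that their ratios to each
    other stay as in `reference`. Requires reference[i] < 1.
    """
    scale = (1.0 - new_value) / (1.0 - reference[i])
    return [
        new_value if j == i else share * scale
        for j, share in enumerate(reference)
    ]


def linear_scheffe(beta, blend):
    """Prediction of a linear Scheffé model (no intercept)."""
    return sum(b * x for b, x in zip(beta, blend))


def cox_effect(beta, reference, i):
    """Predicted change when component i goes from 0% to 100%."""
    low = linear_scheffe(beta, cox_blend(reference, i, 0.0))
    high = linear_scheffe(beta, cox_blend(reference, i, 1.0))
    return high - low


# Hypothetical coefficients for Cmp1, Cmp2 and Cmp3 (illustration only).
beta = [10.0, 8.0, 15.0]
centroid = [1 / 3, 1 / 3, 1 / 3]

effects = [cox_effect(beta, centroid, i) for i in range(3)]
for name, effect in zip(["Cmp1", "Cmp2", "Cmp3"], effects):
    print(f"{name}: {effect:+.2f}")
```

A positive value means the predicted response rises as the share of that component increases at the expense of the others; a negative value means it falls.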

By selecting a different model type above, other specifications of the Scheffé model can be investigated. The following are the results for a quadratic and special cubic model.

Kodemetrics — Quadratic Scheffé Model prediction
Kodemetrics — Special Cubic Scheffé Model prediction
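
The three model types differ only in the terms they include: the quadratic model adds the pairwise products of the proportions to the linear terms, and the special cubic model further adds the triple product. The sketch below fits all three by ordinary least squares using only the standard library, and compares their R² computed against the mean of the response, a common convention for no-intercept mixture models. This is not Kodemetrics' implementation: the ten-run design, the "true" coefficients and the resulting responses are synthetic, generated purely to show the mechanics.

```python
from itertools import combinations


def scheffe_terms(x, order="linear"):
    """Expand one blend (proportions summing to 1) into model terms."""
    terms = list(x)
    if order in ("quadratic", "special_cubic"):
        terms += [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    if order == "special_cubic":
        terms += [x[i] * x[j] * x[k]
                  for i, j, k in combinations(range(len(x)), 3)]
    return terms


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def fit_scheffe(blends, y, order="linear"):
    """Least-squares fit without intercept via the normal equations."""
    X = [scheffe_terms(x, order) for x in blends]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * t for b, t in zip(beta, row)) for row in X]
    return beta, fitted


def r_squared(y, fitted):
    """R² measured against the mean of y."""
    mean = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean) ** 2 for yi in y)
    return 1 - sse / sst


# Synthetic ten-run design: vertices, edge midpoints, centroid, axial blends.
blends = [
    (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5),
    (1 / 3, 1 / 3, 1 / 3),
    (2 / 3, 1 / 6, 1 / 6), (1 / 6, 2 / 3, 1 / 6), (1 / 6, 1 / 6, 2 / 3),
]
# Synthetic quadratic "truth": x1, x2, x3, x1*x2, x1*x3, x2*x3.
true_beta = [10.0, 8.0, 15.0, 4.0, -6.0, 0.0]
y = [sum(b * t for b, t in zip(true_beta, scheffe_terms(x, "quadratic")))
     for x in blends]

r2 = {}
for order in ("linear", "quadratic", "special_cubic"):
    beta, fitted = fit_scheffe(blends, y, order)
    r2[order] = r_squared(y, fitted)
    print(f"{order:13s} terms={len(beta)}  R2={r2[order]:.4f}")
```

On data generated from a quadratic model, the special cubic fit cannot improve on the quadratic one, which is the situation in which the simpler model is the natural choice.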

Conclusions

Kodemetrics enables users to create experiments and perform advanced statistical analyses quickly and easily, without the need for coding. The interface asks only for the necessary inputs, allowing users to focus on the experiment’s salient features and peculiarities. The results are returned promptly, so users can concentrate on interpreting them.

In this case, it is evident that the quadratic model and the special cubic model fit the data better than the linear model. Following Occam’s Razor, we choose the quadratic model as it is simpler and performs equally well. Regarding the effects, we can conclude that substance number 3 (Cmp3) has the strongest effect on the response and is the only one with a positive effect.

Francesca Giorgolo
Kode
Statistician and Data Scientist @Kode s.r.l.