Topics in Statistics/Data Science

What is Design of Experiments?

In Multivariate Analysis

Bahuguna
Geek Culture


https://images.app.goo.gl/WRmBfzGHAsBTN223A

Design of Experiments (DoE) is a method for studying how a particular independent variable relates to a dependent variable when several independent variables affect that dependent variable at the same time. I will explain this definition and the concept through two examples.

Suppose you own a piece of land and you grew a vegetable on it. Let us call this vegetable VegA. You are not satisfied with the total amount of VegA you got from this batch and you want to increase the output. How will you do it?

You decide to ask me for help. I listen to your problem and decide to use DoE to solve it. First of all, I will note down all the factors that contribute to the output of VegA. I find that soil fertility, weather, availability of water and pests affect the total yield. (There may be more factors; we are considering only four to keep the example simple.) Here, these four factors are the independent variables and total yield is the dependent variable, because total yield depends on these factors.

Now I want to study the relationship between total yield and availability of water, i.e., how the amount of water given to the crop affects the total yield. To find out, I will mark out three boxes on my land and plant the same vegetable seeds in each of them. I will vary only the amount of water and keep everything else (the amount of fertilizer, the amount of pesticide) the same in each box. In the first box, I will supply the same amount of water I supplied in the previous batch. In the second box, I will supply more water, and in the third box, less water.

Next, I will compare the yields from these boxes. The comparison will show whether water supply and total yield have a positive correlation (yield rises as the water supply rises and falls as it falls), a negative correlation (yield rises as the water supply falls and falls as it rises), or no correlation (total yield does not depend on the amount of water used). In this way, I will learn how the two variables are related and can use this information to my benefit.
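The comparison described above can be sketched in a few lines of Python. This is only an illustration: the water amounts and yields are made-up numbers, and the helper names `pearson_r` and `describe_relationship` as well as the cut-off of 0.3 are my own assumptions, not part of any standard DoE procedure.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one variable never changes, so no linear relationship shows up
    return cov / (var_x * var_y) ** 0.5


def describe_relationship(xs, ys, threshold=0.3):
    """Label the relationship; the threshold is an arbitrary illustrative cut-off."""
    r = pearson_r(xs, ys)
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "no clear"


# Hypothetical measurements from the three boxes (less, same, more water).
water_litres = [5, 10, 15]
yield_kg = [8.0, 12.0, 15.0]

print(describe_relationship(water_litres, yield_kg), "correlation")
```

With only three boxes the result is a rough indication at best. In practice I would repeat each water level in several boxes before trusting the sign of the correlation.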

Now I will create three more boxes and repeat the same process to determine the correlation (type of relationship) between soil fertility and total yield. This time I will vary the amount of fertilizer instead of water and collect the required information. Similarly, I will check the effect of pesticides on total yield.

To check the effect of weather on total yield, I will grow the vegetable in different seasons and see whether the yield changes. If it does, I will know which season is best for growing VegA.

After all this is done, I will put the information together and choose the best-performing level of each independent variable in order to maximize the dependent variable, i.e., total yield. Note that this approach assumes each factor affects the yield independently of the others.

Here I just designed appropriate experiments to study the effects of multiple independent variables on one dependent variable. This whole process is called “Design of Experiments”.

DoE is not limited to studying one independent variable at a time. We can also design experiments to study the effect of a combination of two or more independent variables on a dependent variable. In this same example, we can study the combined effect of soil fertility and availability of water on total yield.

To do this, I will keep the other two factors constant and use a unique combination of fertilizer and water amounts in each box. For example, in the first box, I will use more water and less fertilizer. In the second box, I will use less water and more fertilizer. Continuing in this way with three levels of water and three levels of fertilizer, I will create nine boxes, each with a unique combination. I will then be able to determine which combination is most suitable for increasing the total yield of VegA.
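The nine boxes can be listed mechanically. The sketch below assumes that the nine combinations come from three levels of water crossed with three levels of fertilizer; the level names are placeholders I chose for illustration.

```python
from itertools import product

# Assumed levels: three for each factor, giving 3 x 3 = 9 combinations.
water_levels = ["less", "same", "more"]
fertilizer_levels = ["less", "same", "more"]

# Every water level is paired with every fertilizer level exactly once.
design = [
    {"box": number, "water": water, "fertilizer": fertilizer}
    for number, (water, fertilizer) in enumerate(
        product(water_levels, fertilizer_levels), start=1
    )
]

for run in design:
    print(run)
```

Because every combination appears once, the yields from these boxes show not only the effect of each factor but also whether the effect of water changes with the amount of fertilizer.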

Hence, DoE can be used to study the relationships between the dependent variable and the independent variables, whether the independent variables are taken one at a time or in combination. The example given above creates only a very basic understanding of the methodology. Usually, we design more detailed experiments so that the data we collect is sufficient and appropriate for further statistical analysis. Let me give you another example in which we design a more detailed experiment.

Suppose you have created a homemade pill and you claim that this pill is effective in treating anxiety. How do you test this claim rigorously? Again, you come to me for help, and again I decide to use DoE to solve the problem. First of all, I will recruit a group of people who have anxiety issues. Let us suppose I recruit 60 people. I will then divide them at random into three groups of 20 each: GroupA, GroupB and GroupC.

I will give the pills made by you to GroupA. I will give another set of pills to GroupB, but these pills are just sugar candies made to look like medicinal pills. They have no medicinal effect. I will not give any pills to GroupC. Over a period of time (however long you think the pills take to start showing results), GroupA and GroupB will continue to take the pills given to them, in the dosage recommended by you. GroupC will not take any kind of pill during this time.
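One way to form the three groups is to shuffle the participant list and cut it into equal parts, so that nobody chooses who receives which treatment. The sketch below is a minimal illustration: the function name `assign_groups`, the participant IDs and the fixed seed are my own assumptions.

```python
import random


def assign_groups(participants, group_names, seed=None):
    """Shuffle participants and split them into equally sized groups."""
    if len(participants) % len(group_names) != 0:
        raise ValueError("participants cannot be split into equal groups")
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: shuffled[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }


# Hypothetical participant IDs P01 ... P60.
participants = [f"P{n:02d}" for n in range(1, 61)]
groups = assign_groups(participants, ["GroupA", "GroupB", "GroupC"], seed=42)

# Treatments as described in the example above.
treatments = {"GroupA": "your pill", "GroupB": "sugar pill", "GroupC": "no pill"}

for name, members in groups.items():
    print(name, treatments[name], len(members))
```

Random assignment matters because it spreads differences between people (age, severity of anxiety and so on) roughly evenly across the groups, so a difference in outcomes is less likely to come from how the groups were formed.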

After the time period is over, I will collect data from everyone on whether their anxiety has improved. In the end, I will have sufficient data to perform further statistical analysis. Through this analysis, I will be able to determine whether the pills made by you actually help or whether it is the placebo effect that is relieving the anxiety. It may also be that anxiety goes away on its own after a certain amount of time and nothing external is required to solve the problem.
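As one possible form of this analysis, the outcomes can be arranged in a table of groups against results and checked with a chi-square test of independence. The sketch below computes the statistic by hand. The counts are invented purely for illustration, and the choice of this particular test is my assumption; the example above does not say which analysis to use.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a table of observed counts (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and outcome were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts per group: [improved, not improved], 20 people each.
outcomes = {
    "GroupA (pill)": [14, 6],
    "GroupB (sugar pill)": [8, 12],
    "GroupC (no pill)": [5, 15],
}

statistic = chi_square_statistic(list(outcomes.values()))

# Critical value of the chi-square distribution at the 5% level with
# (3 - 1) * (2 - 1) = 2 degrees of freedom.
CRITICAL_VALUE = 5.991

if statistic > CRITICAL_VALUE:
    print("Improvement rates differ between the groups.")
else:
    print("No evidence that improvement rates differ between the groups.")
```

A result above the critical value says only that the groups differ somewhere. To separate the pill from the placebo effect, I would then compare GroupA with GroupB directly, and compare GroupB with GroupC to see how much the placebo alone contributes.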

https://images.app.goo.gl/v4Kh8ejLGS79MH1x8

In this way, I designed an appropriate experiment which helped me gather enough data to test my hypothesis. This is what DoE is all about. To give an analogy: “DoE is to Statistics as Experimental Physics is to Physics”.

Through this article, I explained what the term “Design of Experiments” means in layman's language. There are many factors involved in designing experiments that we have not discussed here. In the future, I will try to bring these factors to light. For now, I will end this article with this quote:

Developmental scientists like me explore the basic science of learning by designing controlled experiments. — Alison Gopnik
