Variables — What are they?

Riteshpratap A. Singh
Published in Analytics Vidhya
7 min read · Jun 6, 2020
Photo by Paolo Nicolello on Unsplash

A variable is a quantity that may vary from object to object. For example, suppose we measure the heights of 50 mango trees in a selected plot and arrange the results in a table. Here, the quantity that varies between objects (trees) is height. Height, therefore, is the only variable in this example. The table containing the collection of values of our variable is called a ‘dataset’ or sample.

Independent vs dependent variables

Let us consider an example. Algal net primary productivity (mass of carbon per unit area per year, g C m^-2 yr^-1) is measured under various temperature and light-intensity settings. In this experiment, three variables are involved: primary productivity, temperature and light intensity. However, of the three, only one variable (primary productivity) is measured; the other two are controlled in the experimental set-up. Variables whose variation does not depend on other variables are called independent variables. In this example, both temperature and light intensity are independent variables, as the variation in their values does not depend on other variables. Neither temperature nor light intensity is dependent on primary productivity. However, primary productivity is dependent on both temperature and light intensity. Primary productivity in this example is a dependent variable: a variable whose values depend upon other variables.

To test whether a variable is independent or dependent, a useful tactic is to substitute the suspected variables into this sentence and see whether the statement makes sense: (Independent Variable) causes a change in (Dependent Variable), and it is not possible that (Dependent Variable) could cause a change in (Independent Variable). For example, let us consider two variables, ‘time spent studying’ and ‘test score’. (Time Spent Studying) causes a change in (Test Score), and it isn’t possible that (Test Score) could cause a change in (Time Spent Studying). We see that “Time Spent Studying” must be the independent variable and “Test Score” must be the dependent variable, because the sentence does not make sense the other way around. Note that if ‘time’ (or a related concept such as ‘age’) is taken as a variable in the experiment, it would always be an independent variable. A more formal way to test whether two variables are associated with each other is a test of correlation (also called covariation). For quantitative data, Pearson’s correlation test can be used, while for categorical data Pearson’s χ² test of independence can be used. However, correlation does not reveal which variable is dependent and which is independent. Beware of the problem of statistical confounding discussed in module 2. Correlation will be discussed at length in a later module.
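
As a rough illustration of the correlation test mentioned above, the sketch below computes Pearson’s correlation coefficient by hand. The function name `pearson_r` and the study-hours and test-score values are assumptions invented for this example, not data from any experiment. It only computes the coefficient; a full test would also need a p-value.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the two means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data (invented for illustration only)
hours_studied = [1, 2, 3, 4, 5, 6]
test_scores = [52, 55, 61, 68, 70, 79]

r_xy = pearson_r(hours_studied, test_scores)
r_yx = pearson_r(test_scores, hours_studied)
print(round(r_xy, 3), round(r_yx, 3))
```

Swapping the two arguments gives the same coefficient, which is one way to see that correlation alone cannot say which variable is the dependent one.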

In scientific experiments, often only the dependent variables are measured. Dependent variables are therefore also known as outcome variables or response variables, as they represent the outcome of the experiment. Values of these outcome variables are in turn dependent on (and determined by) the independent variables. As the experimenter often controls the values of the independent variables as part of the experimental design, these variables are also known as treatment variables. A factor is an independent treatment variable whose settings (values) are controlled and varied by the experimenter. Each intensity setting of a factor is a level. Levels may be quantitative numbers or, in many cases, simply “present” or “not present” (“1” or “0”). For example, to find the effect of temperature on resistors, resistance was measured before and after placing the resistors in 3 ovens set at different temperatures. Here, the dependent outcome variable is resistance and the independent treatment variable is temperature. The different temperature settings are the “levels”, of which there are 3 here. In another example, to find the effect of temperature and heating time on resistors, resistance was measured before and after placing the resistors in 3 ovens set at three different temperatures for 3 different periods. In this case, both “Temperature” and “Time” are factors.
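
The two-factor resistor example can be sketched as a small design table. The specific temperatures and heating times below are hypothetical (the text gives only the number of levels), and the sketch assumes every temperature is combined with every heating time, which the text does not state explicitly.

```python
from itertools import product

# Hypothetical levels for the two factors (values invented for illustration)
temperature_levels_c = [50, 100, 150]  # three oven temperatures, in °C
time_levels_min = [10, 20, 30]         # three heating times, in minutes

# Assuming a full crossing of the factors, each pairing of one
# temperature level with one time level is a single treatment.
treatments = list(product(temperature_levels_c, time_levels_min))

for temperature, minutes in treatments:
    print(f"{temperature} °C for {minutes} min")

print(len(treatments), "treatment combinations")
```

Resistance, the dependent variable, would then be measured once or more for each of these treatments.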

Qualitative vs. Quantitative Variables

Variables can also be grouped based on whether they can be expressed in numbers. Variables that cannot be expressed in numbers (for example, level of happiness, beauty, ethics, love, etc.) are called qualitative variables. Some qualitative variables can be grouped into different labels or categories (for example, gender, nationality, etc.). Such variables are known as categorical variables or attribute variables. Variables that can be expressed in numbers, such as height, weight, molarity, photon flux density, etc., are called quantitative variables.

Discrete vs. Continuous Variables

A discrete or categorical variable is a variable that can take only a countable (either finite or countably infinite) number of values. Examples are the number of children a woman has given birth to, the number of atoms in a bar of soap, or the number of cycles passing through a traffic signal. The value is presented as such; no ‘rounding-off’ is involved with discrete variables. In other words, discrete variables can be counted. However, the values of a discrete variable need not be integers. For example, the average number of girls in a class could be 17.5, the probability of rolling an even number on a die is 0.5, and the cost of lunch could be INR 45.50; all of these values have fractional parts. Other examples of discrete variables are:

  • The number of phone calls arriving at a call center per minute.
  • The number of goals in sports involving two competing teams.
  • The number of deaths per year in a given age group.
  • The number of jumps in a stock price in a given time interval.
  • Under an assumption of homogeneity, the number of times a web server is accessed per minute.
  • The number of cells in 100 microlitres of a cell suspension.
  • The number of mutations in a given stretch of DNA after a certain amount of radiation.

On the other hand, continuous variables can take on (effectively) uncountably many values over their range. Examples are height and weight. Height is frequently reported only to the nearest whole centimetre. When a person is reported as being 178 cm tall, that person could, for example, have a height of 177.514512312… cm, depending on the precision of the measurement method. While a routine ruler could measure length to the nearest millimetre (for example, 3.4 cm), a Vernier calliper could increase the precision further (3.4412 cm).

Precision vs. Accuracy

Precision is the total number of decimal places in a measurement. In our previous example, measurement with a ruler gave a value of 3.4 cm, which has only one decimal place, while measurement with a Vernier calliper gave a value of 3.4412 cm, which has four decimal places. One could say that the ruler had a precision of 1, while the Vernier calliper had a precision of 4 (and therefore a higher precision than the ruler). However, note that high precision does not mean the measurement is accurate. Accuracy is the absolute difference between the measurement and the “real value”, that is, the true value of the variable. For example, a standard silver coin that has a certified weight of 1 g measures 1.9975 g on a high-precision jewellery-grade weighing machine. The precision is 4 (as there are four decimal places in the measurement). However, the measured value is highly inaccurate. The accuracy of this measurement is 1.9975 - 1.0000 = 0.9975 g. A value of zero is the highest possible accuracy level; deviations from zero indicate inaccuracy in the measurement.
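
The two definitions above can be checked with a short sketch. The helper names `decimal_places` and `absolute_error` are made up for this example, and readings are passed as text (an assumption of the sketch) so that trailing zeros such as those in 1.0000 are not lost to floating-point conversion.

```python
from decimal import Decimal

def decimal_places(reading: str) -> int:
    """Precision as defined above: digits after the decimal point."""
    exponent = Decimal(reading).as_tuple().exponent
    return max(0, -exponent)

def absolute_error(measured: str, true_value: str) -> Decimal:
    """Accuracy as defined above: |measured - true value| (0 is best)."""
    return abs(Decimal(measured) - Decimal(true_value))

ruler_precision = decimal_places("3.4")         # ruler reading, in cm
calliper_precision = decimal_places("3.4412")   # Vernier calliper reading, in cm
coin_error = absolute_error("1.9975", "1.0000") # silver coin, in g

print(ruler_precision, calliper_precision, coin_error)
```

With the values from the text, this gives a precision of 1 for the ruler, 4 for the calliper, and an absolute error of 0.9975 g for the coin.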

Limits and Ranges

When a group of numbers is reported, the nominal limits of the group are the lowest and highest reported values (in other words, the minimum and maximum, respectively). For example, three persons weigh 43 kg, 55 kg and 67 kg. The lower and upper nominal limits are 43 kg and 67 kg, respectively. On the other hand, the lower real limit is the lowest value that the lower nominal limit could have been rounded up from. For example, our lower nominal limit of 43 kg could have been rounded up from 42.5 kg. Conversely, the upper real limit is the highest value (non-inclusive) that the upper nominal limit could have been rounded down from. For example, our upper nominal limit of 67 kg could have been rounded down from just under 67.5 kg (note: 67.5 kg itself is not included). These real limits depend on the precision of the measurement. The difference between the nominal limits is known as the exclusive range. In our example, exclusive range = 67 - 43 = 24 kg. The difference between the real limits is known as the inclusive range. In our example, inclusive range = 67.5 - 42.5 = 25 kg. If values are rounded to the nearest integer, the inclusive range is the exclusive range + 1.
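
The arithmetic in this section can be reproduced with a minimal sketch. The function name `limits_and_ranges` and its `unit` parameter (the rounding unit of the reported values, 1 for the nearest integer) are assumptions introduced for this example.

```python
def limits_and_ranges(values, unit=1):
    """Nominal and real limits, plus exclusive and inclusive ranges.

    `unit` is the rounding unit of the reported values
    (1 means values were rounded to the nearest integer).
    """
    lower_nominal, upper_nominal = min(values), max(values)
    half_unit = unit / 2
    # Real limits extend half a rounding unit beyond the nominal limits
    lower_real = lower_nominal - half_unit
    upper_real = upper_nominal + half_unit
    return {
        "nominal_limits": (lower_nominal, upper_nominal),
        "real_limits": (lower_real, upper_real),
        "exclusive_range": upper_nominal - lower_nominal,
        "inclusive_range": upper_real - lower_real,
    }

weights_kg = [43, 55, 67]  # the three weights from the example above
result = limits_and_ranges(weights_kg)
print(result)
```

For weights reported to one decimal place, calling the function with `unit=0.1` would move the real limits only 0.05 kg beyond the nominal ones.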

Summary

  • A variable is a quantity that may vary from object to object. A collection of values of a variable is called a sample, or dataset.
  • Variables whose variation does not depend on other variables are called independent (treatment) variables, also called factors. Variables whose values depend upon other variables are called dependent (outcome) variables.
  • In scientific experiments, often only the dependent variables are measured. Levels of independent variables are set as part of the experimental design.
  • To find an association between two variables, correlation tests can be employed. However, correlation does not reveal which variable is dependent and which is independent.
  • Variables that cannot be expressed in numbers are called qualitative variables, while those that can be expressed numerically are called quantitative variables.
  • A discrete or categorical variable is a variable that can take only a countable (finite or countably infinite) number of values. Continuous variables can take on (effectively) uncountably many values over their range. Continuous variables are often rounded to the nearest integer, and their measurement depends on both precision and accuracy.
  • Precision is the total number of decimal places in a measurement, while accuracy is the absolute difference between the measurement and the “real value”.
  • While nominal limits are synonymous with minimum and maximum, real limits are either the lowest value that the lower nominal limit could have been rounded up from (real lower limit), or the highest non-inclusive value that the upper nominal limit could have been rounded down from (real upper limit). The difference between the real limits is known as the inclusive range, while that between the nominal limits is known as the exclusive range.

Riteshpratap A. Singh
AI Researcher | Data Scientist | Computer Vision Engineer | Subject Matter Expert - AI/ML/DL | Bioinformatician | Geneticist