Predictive Theories and the Primrose Path of Scientism, Part 1: Variable Theories [Rough Draft]
There are two main kinds of predictive theories: variable and process. Social science researchers, in their quest to appear scientific, over-emphasize variable theories, resulting in research that is largely irrelevant to the real world.
There are two kinds of theories: (1) variable and (2) process. In a variable theory you have a set of measurable factors that predict some outcome. Barometric pressure predicting rain is an example of a variable theory. In a process theory you have a set of inputs, and a procedure for transforming these inputs into some outcome. Darwin’s evolution is a classic example of a process theory (and yes, there are also those X-Studies theories, but we’ll ignore those because we’re discussing scientific theories).
Sciences of the Artificial, such as computer science or management science, need a mix of variable and process theories. Unfortunately, in many social science fields like management, there is a bias towards variable theories since they appear more scientific. Thus, management researchers publish tons of papers that look scientific, but which have practically no value to actual managers, who depend largely on procedures in their day-to-day activities.
This kind of research for the sake of looking scientific is derogatorily termed scientism, and in this multi-part essay I argue for more process theories in social science.
Before I do so, let’s give variable theories a fair treatment.
1. VARIABLE THEORIES
Variable Theories claim that some measurable outcome Y can be predicted by one or more measurable X’s.
For example, take the claim: “Good grades (Y) are determined by how much effort (X) students put into studying”. This claim is technically a theory, but it’s unproven so it’s more properly called a hypothesis.
The problem with this claim is that neither “good grades” nor “effort” is a number, so you can’t run statistics to prove it. The statement needs to be operationalized. One way to operationalize it is to use exam scores to measure good grades, and hours studied to measure effort.
Once operationalized, the general procedure for proving a variable theory is:
1. Collect data
2. Run a statistical operation (examples include: regressions, analysis of variance, and path analysis)
3. See if the results are significant
4. Declare your hypothesis is a theory, if the results are significant.
Let’s see how an actual variable theory is proven.
2. AN EXAMPLE OF PROVING A VARIABLE THEORY
The most recent presidential election — where Donald Trump beat Hillary Clinton in terms of electoral college votes but lost the popular vote — had many people wondering just how electoral votes were assigned to states.
Suppose you read on the internet that the number of electoral votes a state received was based on its population. In other words: electoral votes (Y) were determined by population (X).
You go on Twitter or Facebook, and post this “fact”. Of course, being social media, someone challenges you to prove it. Here’s how you do it.
Step 1. Collect the Data
Electoral votes and population are already measurable; you just have to find the data, and hope it’s reliable. Fortunately, Census.gov has the population data and Archives.gov has the number of electoral votes.
Step 2. Run a Statistical Operation
If you chart population versus electoral votes, you get the following figure, which suggests using a regression as the statistical operation. A regression finds the best line or curve that fits the data.
The chart below shows the results of the regression. In this case a line fit the data best. This particular regression was done in Microsoft Excel using Data > Data Analysis > Regression, on the table above.
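For readers without Excel, here is a minimal sketch of what a straight-line regression computes. It uses the standard ordinary least squares formulas, and it is not a reproduction of the Excel run. The function name fit_line and the four data points are assumptions made up for illustration; the points are synthetic and lie exactly on y = 2x + 1, so they are not the census data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b. Returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariation of x with y, and variation of x around its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # y-intercept
    return m, b


# Synthetic points on the line y = 2x + 1 (illustrative, not real data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

To apply it to the election example, xs would hold the state populations and ys the electoral votes.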
Step 3. See if the Results are Significant
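Statistical operations often include a measure of how well the result holds up. For regressions, a commonly reported one is R-squared, which ranges from 0 (the line explains none of the variation in the data) to 1 (the line explains all of it). Strictly speaking, R-squared measures goodness of fit rather than statistical significance, which is usually judged with a p-value, but for this example it serves the same purpose. In the chart above R-squared is .9991, and since that is close to 1, we treat the result as significant.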
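As a rough sketch of where the R-squared number comes from, it can be computed as one minus the ratio of the leftover (residual) variation to the total variation in Y. The function name r_squared and the data below are assumptions for illustration; the points are synthetic, not the election data.

```python
def r_squared(xs, ys, slope, intercept):
    """R-squared of the line y = slope*x + intercept against the data."""
    mean_y = sum(ys) / len(ys)
    # Variation left over after using the line to predict each y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic points on the line y = 2x + 1 (illustrative, not real data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect_fit = r_squared(xs, ys, slope=2.0, intercept=1.0)
flat_line = r_squared(xs, ys, slope=0.0, intercept=4.0)
print(perfect_fit, flat_line)
```

The line that passes through every point scores 1, while a flat line at the average of Y scores 0, which is why a value like .9991 reads as a very close fit.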
Step 4. Declare your Hypothesis is a Theory, if Significant
Since R-squared is .9991, and this is a significant value, you can proudly post on social media that you’ve proven your theory:
“population determines number of electoral votes.”
And no one can argue with you, because your theories are backed up with reliable data and appropriate statistics.
An Aside on Predictive Power
What’s even better for you is that your theory is predictive. Let me explain.
When you run a regression, you also get the values you need to reconstruct the equation for the curve. For a line, this is an equation of the form:
y=mx+b
If you remember your high school algebra, m is the slope and b is the y-intercept.
You can partly see this equation in the chart above: y=1E-06x+1.9602. I say partly because 1E-06 is really 1.41874E-06. The Excel regression gives the exact values in a table:
Plugging these values in, you get the equation:
ELECTORAL VOTES = 1.41871/1,000,000 * POPULATION + 1.96
In plain English, take a state’s population, divide by 1 million, multiply by 1.42, add 1.96, and round up.
CHECK
California’s Population: 37,253,956
Divide by 1,000,000 = 37.253956
Multiply by 1.42=52.90
Add 1.96=54.86
Round up: 55 ← CORRECT
Wyoming’s Population: 563,626
Divide by 1,000,000: .563626
Multiply by 1.42=0.80
Add 1.96=2.76
Round up: 3 ← CORRECT
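The same check can be sketched in a few lines of code. This assumes the rounded coefficients used in the plain-English recipe (1.42 and 1.96) and the two populations quoted in the check; the function and constant names are made up for illustration.

```python
import math

# Rounded regression coefficients from the plain-English recipe.
SLOPE_PER_MILLION = 1.42
INTERCEPT = 1.96


def predicted_electoral_votes(population):
    """Apply the recipe: population in millions * 1.42 + 1.96, rounded up."""
    raw = SLOPE_PER_MILLION * population / 1_000_000 + INTERCEPT
    return math.ceil(raw)


california = predicted_electoral_votes(37_253_956)
wyoming = predicted_electoral_votes(563_626)
print(california, wyoming)  # prints: 55 3
```

Only these two states are checked here, so this shows the arithmetic rather than verifying the equation for all fifty.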
So the regression yields an equation that gives the correct values, but what does this equation actually mean? According to Archives.gov:
Electoral votes are allocated among the states based on the Census. Every state is allocated a number of votes equal to the number of senators and representatives in its U.S. Congressional delegation — two votes for its senators in the U.S. Senate plus a number of votes equal to the number of its members in the U. S. House of Representatives.
The 1.42 times the population in millions approximates the number of representatives a state has. The 1.96 approximates the 2 senators. It’s quite astonishing that a simple statistical operation was able to discover precisely what that statement means.
So, as you can see, variable theories are quite useful for certain problems.
In the next essay we’ll look at process theories, then discuss why they are more relevant for management and other sciences of the artificial.
Bibliography
Variable vs Process Theories
- Mohr, L. B. (1982). Explaining organizational behavior. CA: Jossey-Bass.
Sciences of the Artificial
- Simon, H. A. (1996). The sciences of the artificial. MA: MIT press.
Regressions
- Draper, N. R., & Smith, H. (2014). Applied regression analysis. NJ: John Wiley & Sons.