A wrong example of the Prediction recipe
Republican or Democrat?
Let’s suppose that you want to explore an election prediction model for the next US election, using economic and political indicators instead of polling data.
You upload an Excel or CSV file with your data to Alpha+Omega.
For this example, we used data from the following sources:
- GDP per capita in election year from US Bureau of Economic Analysis & US Census Bureau
- Historical presidential approval ratings (highest and lowest for each president) from Wikipedia
First of all, make sure that your file has a column named after what you are looking for. If you are looking for the ‘winning party’ of the 2020 election, you must upload a file with a column named “winning party” that holds the related data. For the prediction to work, the result you are searching for must correspond to a single cell in the ‘winning party’ column.
Note: We want to predict the election result for the year 2020. The file you upload must contain no cell or data for the year 2020, because this is exactly what we ask the algorithm to do: to examine the data from earlier years in order to predict the year 2020.
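The two requirements above (a “winning party” column, and no row for the year being predicted) can be checked before uploading. The following is a minimal sketch in Python, not part of Alpha+Omega: the function name check_training_file, the column name "year", and the numbers in the sample are assumptions made for illustration only.

```python
import csv
import io

TARGET_COLUMN = "winning party"  # column name used in this article
YEAR_COLUMN = "year"             # assumed name of the year column


def check_training_file(csv_text, predict_year):
    """Return a list of problems found in the data (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    problems = []
    if TARGET_COLUMN not in columns:
        problems.append(f"missing column: {TARGET_COLUMN}")
    if YEAR_COLUMN not in columns:
        problems.append(f"missing column: {YEAR_COLUMN}")
    if problems:
        return problems
    # The year we want to predict must not already be in the file.
    for row in reader:
        if row[YEAR_COLUMN].strip() == str(predict_year):
            problems.append(f"file already contains a row for {predict_year}")
    return problems


# Numeric values below are placeholders for illustration, not real figures.
sample = """year,gdp per capita,highest approval,lowest approval,winning party
2012,100,60,40,Democrat
2016,110,55,35,Republican
"""

print(check_training_file(sample, 2020))  # prints []
```

An empty list means the file passes both checks; it says nothing about whether the file holds enough rows for a reliable prediction.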
You need to find or estimate the rest of the data in the 2020 row, for example, what the GDP per capita will be in the year 2020.
Now we have a prediction model and we are ready to predict. As you can see, there are some tabs that need to be filled in. Under ‘year’ we put 2020, but we also need the GDP per capita and the highest and lowest approval. If we do not have the actual values, we need to enter an estimate.
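Before pressing the predict button, it can help to list which inputs still need a value or an estimate. This is a small sketch under assumed names: REQUIRED_FIELDS and missing_fields are hypothetical and only mirror the inputs named in this article, not the actual Alpha+Omega form.

```python
# Inputs named in this article; the exact labels in the tool may differ.
REQUIRED_FIELDS = ["year", "gdp per capita", "highest approval", "lowest approval"]


def missing_fields(inputs):
    """Return the required fields that are absent or left empty."""
    return [name for name in REQUIRED_FIELDS if inputs.get(name) in (None, "")]


# Only the year is known so far; the rest must be found or estimated.
inputs = {"year": 2020, "gdp per capita": None}

print(missing_fields(inputs))
# prints ['gdp per capita', 'highest approval', 'lowest approval']
```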
As soon as we have what we need to predict the winning party for the year 2020, we press “Predict ‘winning party’”.
And the winner is…
You got your result, but...
To have a reliable prediction model, you need an Excel or CSV file with a lot of data, that is to say, at least 100 rows of information. Why? Imagine the algorithm of the prediction recipe as a cook:
When you give good ingredients (data) to the cook (the algorithm of the A+Ω prediction recipe) in order to get a delicious dish (a good prediction result), a prerequisite is that the cook has cooked many times in his life. These are the rows in your file: each row is a dish that was once made, together with how tasty it was. Each dish needs some ingredients, and in the end it is up to whoever tastes the dish to say how delicious it was. Now, if you give the same ingredients to two cooks (two prediction recipes), and the first has cooked 100 dishes in his life while the second has cooked only ten, who is more likely to prepare the better dish?
The A+Ω recipe can still predict with less data; it is just that the result is not safe. The journalist will need to evaluate the result.
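One rough way to see why fewer rows are less safe is to look at how uncertain a measured accuracy is when it rests on only a few cases. The sketch below is an illustration, not how the A+Ω recipe evaluates itself: it uses the standard normal approximation for a proportion (which is itself crude for very small samples), and both the function name accuracy_margin and the accuracy of 0.7 are assumptions chosen for the example.

```python
import math


def accuracy_margin(accuracy, n_rows, z=1.96):
    """Approximate 95% margin of error for an accuracy measured on n_rows cases."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n_rows)


# Hypothetical accuracy of 0.7, measured on 10 rows versus 100 rows.
for n in (10, 100):
    print(n, round(accuracy_margin(0.7, n), 2))
# prints:
# 10 0.28
# 100 0.09
```

With ten rows, a measured accuracy of 0.7 could plausibly sit anywhere from roughly 0.42 to 0.98; with 100 rows the range narrows to roughly 0.61 to 0.79.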
In the above example of the election of the American president, we would ideally need data for 100 elections (not 100 presidents): the GDP and the lowest and highest approval for each of the last 100 elections (not just the last 100 years).
If we had the above data, the prediction result would have been safer. Moreover, the modeling of the data is very important. The columns need to hold data that are interconnected and related to what we are looking to predict (the Alpha+Omega team can help you with this task).
A journalist running the prediction recipe needs to understand that a reliable prediction requires as much data as possible. But again, even if the journalist finds more data and runs the recipe again, the recipe will not necessarily work better. Strange? No! It is exactly the same as when a journalist has dedicated a lot of time to an investigation and has NOT found the desired evidence or results. The journalist then needs to decide whether to look for more sources or to stop. Searching for more evidence does not guarantee finding something. But not searching further guarantees finding nothing.
It is exactly the same thing with the prediction recipe!
Note: This article was modified on Dec. 21, 2020