Reference Towns in Turkish Elections
Which towns’ results give clues about overall results…

After each election, we are faced with different maps and numbers on TV. Agencies and institutions deliberately share the results of selected cities or towns to shape public opinion and social tension.
Despite the high frequency of elections and the growing availability of data analysis & visualization technologies, no TV channel shares projections based on previous elections’ results and regional populations.
The most straightforward approach for such a projection is multiplying town populations by the early results of the counted ballot boxes in these towns. Even this basic method is not applied in the so-called “Election Displays” of TV channels.
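To make the population-weighted idea concrete, here is a minimal sketch. The function name, the town labels, and all numbers are made up for illustration; it simply weights each town's early ratio by its population and normalizes by the population covered so far.

```python
def project_overall(town_populations, early_ratios):
    """Population-weighted projection of nationwide vote ratios.

    town_populations: {town: population}
    early_ratios: {town: {party: ratio from the ballot boxes counted so far}}
    Only towns that already report early results contribute.
    """
    weighted_votes = {}
    covered_population = 0
    for town, ratios in early_ratios.items():
        population = town_populations[town]
        covered_population += population
        for party, ratio in ratios.items():
            weighted_votes[party] = weighted_votes.get(party, 0.0) + population * ratio
    return {party: votes / covered_population for party, votes in weighted_votes.items()}


# Hypothetical towns and ratios, for illustration only.
populations = {"TOWN_A": 100_000, "TOWN_B": 300_000}
early = {
    "TOWN_A": {"PARTY_X": 0.60, "PARTY_Y": 0.40},
    "TOWN_B": {"PARTY_X": 0.40, "PARTY_Y": 0.60},
}
projection = project_overall(populations, early)
print(projection)
```

In this toy case the larger town dominates, so the projection leans toward its result rather than toward the plain average of the two towns.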
In this study, I would like to elaborate on a more sophisticated approach: finding reference towns whose results point to the overall results. If such towns exist, they could be used to project the overall results in the early hours of counting.
The performance metric of the study is the absolute error between parties’ predicted and actual vote ratios. As Turkish voters have witnessed elections decided by a difference of less than 1% between candidates, I aim for an error of less than 1%.
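As a small illustration of this metric, the sketch below computes the per-party absolute error in percentage points and checks it against the 1% target. The party labels and numbers are invented for the example and are not results from the study.

```python
def absolute_errors(predicted, actual):
    """Per-party absolute error between predicted and actual ratios (percentage points)."""
    return {party: abs(predicted[party] - actual[party]) for party in actual}


# Hypothetical predicted and actual vote ratios, in percent.
predicted = {"PARTY_A": 44.0, "PARTY_B": 29.5}
actual = {"PARTY_A": 42.5, "PARTY_B": 30.0}

errors = absolute_errors(predicted, actual)
within_target = {party: error < 1.0 for party, error in errors.items()}
print(errors, within_target)
```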
The study consists of four parts:
- Collecting & Cleaning Data
- Descriptive Analysis of Data
- Clustering Town Results & Country Results to find similarities
- Checking Out Regression Possibilities
All the code is shared on my GitHub repository:
Data Collection & Cleaning
First, I would like to thank the people who shared the results of previous elections:
https://github.com/eoner/YSKSecimSonuclari2015
https://github.com/mertnuhoglu/secim_verileri
https://github.com/cilekagaci/secimverisi
https://github.com/berkorbay/secimler
https://github.com/oztalha/YSK
https://github.com/arikan/adilsecim-verileri
https://github.com/mkozturk/secim2018
After collecting the data from these repositories, I went through a complicated cleaning process, as you can see in the screenshot. I wish the Supreme Election Council had shared all the data in a combined and clean format.

I’ve reformatted three files (yerel2014.csv, genel25_26_referandum2017_27meclis_cb2018.csv, 31mart2019yerelsecimilcesonuclari11.csv) to get data frames with the following columns:
- “Town” as the CITY_TOWN tuple
- “Party-1”, “Party-2”, “Party-3”… as each party’s ratio of the overall votes in that town
For 2019 results, I have summed AKP & MHP columns as CUMHUR and CHP & IYI columns as MILLET, to represent alliances.
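The reshaping described above could look roughly like the following pandas sketch. The raw column names (CITY, TOWN, and the party columns) and the vote counts are made up for illustration, and the sketch assumes the raw files hold vote counts per town; the actual files in the repositories may be laid out differently.

```python
import pandas as pd

# Hypothetical raw 2019 rows: one row per town, vote counts per party.
raw = pd.DataFrame({
    "CITY": ["ISTANBUL", "ISTANBUL"],
    "TOWN": ["PENDIK", "KADIKOY"],
    "AKP": [500, 200],
    "MHP": [100, 50],
    "CHP": [300, 600],
    "IYI": [100, 150],
})

parties = ["AKP", "MHP", "CHP", "IYI"]

# Convert counts to ratios of the town total (assumed here to be the sum of listed parties).
ratios = raw[parties].div(raw[parties].sum(axis=1), axis=0)

# Build the CITY_TOWN key used as the "Town" column.
ratios.insert(0, "Town", raw["CITY"] + "_" + raw["TOWN"])

# Sum party columns into alliance columns.
ratios["CUMHUR"] = ratios["AKP"] + ratios["MHP"]
ratios["MILLET"] = ratios["CHP"] + ratios["IYI"]

alliances = ratios[["Town", "CUMHUR", "MILLET"]]
print(alliances)
```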
Descriptive Analysis
Histograms for the parties’ vote ratios in this dataset are shown below.

These histograms show that AKP ratios range between 0% and 90% and are roughly normally distributed until 2018. Afterward, the distribution becomes left-skewed. In the last election, the alliance with MHP leads to ratios close to 100% in many towns.
CHP and MHP have vote ratios of almost 0% in a significant number of towns in all elections. Only in 2018 does MHP manage to have a proportionally lower number of zero-ratio towns, probably with the help of the alliance with AKP.
Clustering Results
After combining the results of the 2014, June 2015, November 2015, 2017 (referendum), 2018, and 2019 elections, I measured the proximity of each town’s results to other towns’ results and to the overall results by applying Agglomerative Clustering. The results are shown in the dendrogram below.
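A minimal sketch of this clustering step is given below, using SciPy's hierarchical clustering. The feature matrix is invented (one row per town plus one "OVERALL" row, one column per party-election ratio), and Ward linkage is an assumption, since the linkage method is not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical vote ratios; the first row stands for the nationwide result.
labels = ["OVERALL", "TOWN_A", "TOWN_B", "TOWN_C"]
X = np.array([
    [0.45, 0.25, 0.10],
    [0.46, 0.24, 0.11],
    [0.70, 0.05, 0.02],
    [0.20, 0.10, 0.60],
])

# Agglomerative (bottom-up) clustering; Ward linkage is an assumed choice.
Z = linkage(X, method="ward")

# no_plot=True returns the leaf ordering without needing matplotlib.
tree = dendrogram(Z, labels=labels, no_plot=True)

# The first merge joins the two closest rows.
first_merge = {labels[int(Z[0, 0])], labels[int(Z[0, 1])]}

# Direct distance of every town to the overall row.
distances = np.linalg.norm(X[1:] - X[0], axis=1)
closest_town = labels[1 + int(np.argmin(distances))]
print(tree["ivl"], first_merge, closest_town)
```

Towns that merge with the overall row early in the tree are the candidates for "reference towns"; the direct distance check is a simpler way to rank them.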

As expected, the results of many neighboring towns are similar to each other. On the other hand, it is worth mentioning that the towns of Istanbul are widely spread across this dendrogram, pointing to the highly diverse socioeconomic structure of these towns.
On closer inspection, the residents of Körfez, Kestel, Derince, Gölbaşı (Adıyaman), Ümraniye, Pendik, Kağıthane, and Gaziosmanpaşa appear to resemble Turkish voters the most. However, considering the percentage deviations in the table below, no single town’s results closely resemble the overall results.

Another approach might be clustering town results based on single parties’ ratios. That’s why I’ve created dendrograms for parties.




As seen in the tables above, AKP ratios in Ahmetli, Usak, and Yenisehir somewhat resemble the overall AKP ratios. However, the deviations of CHP ratios are much larger, since CHP had 22%-30% vote ratios in the analyzed elections.
Considering the clustering results, we can conclude that there is no reference town for Turkish elections. However, there might be some towns whose results change in line with the overall results. That possibility leads us to check regression models.
Regression
Sets, Model & Metrics
Since MHP’s position toward the government changed over these elections and IYI was founded only recently, I have included only AKP, CHP, and HDP results in the regression. I have removed the referendum results from the training set, as they do not map directly onto parties.
Since we have more input features than data samples, I have chosen Least Angle Regression as the regression model*. With the results of six elections at hand, I’ve used the 2019 elections as the test set while keeping the other five elections in the training set:
- Training Set: 2014, 2015–1, 2015–2, 2018-Party, 2018-President Elections
- Test Set: 2019 Elections
Since we have only one element in the test set, averaging metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R² score are not needed to calculate test performance. Similarly, the model will fit the training set well, as it contains only five observations. Therefore, the absolute error is used to measure the performance of the model.
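The setup above can be sketched with scikit-learn's `Lars` estimator. Everything in the data below is synthetic: rows stand for elections, columns for towns, and the target is a stand-in for the nationwide ratio of one party. Capping the model at four active towns (`n_nonzero_coefs=4`) is a choice made for this sketch, not something taken from the study.

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)
n_elections, n_towns = 6, 20

# Synthetic town-level ratios: a per-election trend plus town-level noise.
trend = rng.uniform(0.3, 0.5, size=(n_elections, 1))
town_ratios = np.clip(trend + rng.normal(0, 0.1, size=(n_elections, n_towns)), 0, 1)

# Stand-in for the nationwide ratio (a plain average, for illustration only).
overall = town_ratios.mean(axis=1)

# Five elections for training, the last one (playing the role of 2019) for testing.
X_train, y_train = town_ratios[:5], overall[:5]
X_test, y_test = town_ratios[5:], overall[5:]

# Least Angle Regression handles more features (towns) than samples (elections).
model = Lars(n_nonzero_coefs=4)
model.fit(X_train, y_train)

prediction = model.predict(X_test)
abs_error = abs(prediction[0] - y_test[0])
selected_towns = np.flatnonzero(model.coef_)
print(abs_error, selected_towns)
```

With a single test election, the absolute error of that one prediction is the whole test metric, which is why no averaged score is computed.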
PCA
First, I fitted the model using all 955 towns. However, the basic idea of this study was to find a few reference towns for the overall results. That’s why I’ve used PCA to reduce the dimensions and see whether a small set of linear combinations of town results can project the overall results. Four components explain 100% of the variance in the AKP, CHP, and HDP datasets, and predictions saturate at four components, as shown in the figures below.
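The saturation at four components is what one would expect with five training elections: after centering, five observations span at most four directions, so four principal components capture all of the variance regardless of the number of towns. The sketch below illustrates this on synthetic data (50 made-up towns instead of 955) and chains PCA with Least Angle Regression; the pipeline layout is an assumption about how the two steps could be combined.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lars
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n_train, n_towns = 5, 50

# Synthetic training data: five elections, fifty towns.
X_train = rng.uniform(0.0, 0.9, size=(n_train, n_towns))
y_train = X_train.mean(axis=1)  # stand-in for the nationwide ratio
X_test = rng.uniform(0.0, 0.9, size=(1, n_towns))

# Four components are enough to explain all variance of five centered samples.
pca = PCA(n_components=4).fit(X_train)
explained = pca.explained_variance_ratio_.sum()

# Regress the overall ratio on the four components.
model = make_pipeline(PCA(n_components=4), Lars())
model.fit(X_train, y_train)
prediction = model.predict(X_test)
print(explained, prediction)
```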

Results
By keeping four components with PCA, predictions for the three parties are obtained with a 3% absolute error in the AKP ratio and a 4% absolute error in the CHP ratio:

As these errors are well above the 1% target, selecting a few towns to regress the overall results does not appear realistic.
Conclusion
I started this study intending to find reference ballot boxes for town results, which might have made a more interesting and comprehensive study. However, ballot box results for all elections in the 2010s are not available on the Internet. That’s why I’ve tried to find reference towns for the overall results instead.
It was hard to handle the changes in participating parties, their alliances, and their shifts in political position. It would be much easier to analyze US elections, which mostly feature competition between the same two political parties.
It was nice to visualize similarities among towns with dendrograms. No direct similarity was found between town results and country results. Moreover, the results show that it is not possible to project the overall results by considering only a few towns’ ratios.
I hope this study arouses interest in better data analysis & visualization methods for Turkish elections, and that more transparent & dynamic post-election programs are televised.