Q#100: Allocating votes from pooled elections
Note: I believe it should be free and readily available for everyone to gain value from data, hence I pledge to keep this series free regardless of how large it grows.
Suppose you’re running a school election, and are looking to tally up the results. The election is structured as follows:
- Groups of students are split into voting pools, each with unique IDs. The voting pools are assigned a number of votes to allocate based on how many students are in the given pool
- A voting pool can vote multiple times, but if they do their votes will be split evenly amongst the candidates
The voting information is stored in two tables (shown below): ‘election_data’ and ‘voter_pools’:
- election_data contains each voting pool’s votes for the given candidates. You’ll notice many pools voted for multiple candidates
- voter_pools contains the number of votes allocated to each unique pool
Given this information, write a SQL query to return the winner of the election, along with their associated votes.
Table: election_data
Table: voter_pools
TRY IT YOURSELF
ANSWER
Running a school election involves processing voting data efficiently to determine the winner. And as Data Scientist we are called to pull data and summarize it using SQL.
SQL Query:
WITH CandidateVotes AS (
SELECT
ed.candidate,
vp.voting_pool_id,
vp.num_votes / COUNT(DISTINCT ed.candidate) AS votes_per_candidate
FROM
election_data ed
JOIN voter_pools vp ON ed.voting_pool_id = vp.voting_pool_id
GROUP BY
ed.candidate, vp.voting_pool_id, vp.num_votes
)
SELECT
cv.voting_pool_id,
cv.candidate,
SUM(cv.votes_per_candidate) AS total_votes
FROM
CandidateVotes cv
GROUP BY
cv.voting_pool_id, cv.candidate
ORDER BY
cv.voting_pool_id, total_votes DESC
LIMIT 1;
Explanation:
- We create a Common Table Expression (CTE) named 'CandidateVotes' to calculate the number of votes per candidate per voting pool. The votes are evenly distributed among the candidates in each pool.
- The main query selects the voting pool, candidate, and the total votes for each candidate by summing up the votes_per_candidate within each pool.
- The results are grouped by the voting pool and candidate, and the rows are ordered by the voting pool ID and total votes in descending order.
- Finally, we use the 'LIMIT 1' clause to retrieve only the top row, which represents the winner with the highest total votes.
Plug: Checkout all my digital products on Gumroad here. Please purchase ONLY if you have the means to do so. Use code: MEDSUB to get a 10% discount!