Image for post
Image for post

The top GitHub projects per country

Let’s find the most popular GitHub projects for each country. A simple query can give us that result: Take every star given on GitHub and look for the country of origin of users who have chosen to show their country on their GitHub profile. But we are going to go deeper than that.

Felipe Hoffa
Apr 5, 2017 · 9 min read

Step 1: The most popular GitHub project per country — naive count

Image for post
Image for post
The most starred project per country on GitHub during 2016 — naive count.

Step 2: What about the 2nd most popular project for each country?

Image for post
Image for post
The 2nd most starred project per country on GitHub during 2016 — naive count.

Let’s skip the top 75 worldwide GitHub projects, and only then choose the most popular project per country:

Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
The most starred project per country on GitHub during 2016 — taking out the top 75 GitHub worldwide projects.

A list of the most interesting GitHub projects for each country:

Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post

Queries

Datasets

Most popular GitHub project per country, naive count:

#standardSQL
SELECT REGEXP_EXTRACT(a.name, r'/(.*)'), stars, iso3
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn=1
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars DESC

Most popular GitHub projects per country, removing the top 75, special rank:

#standardSQL
SELECT REGEXP_EXTRACT(a.name, r'/(.*)'), stars, iso3
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND repo.id NOT IN (SELECT id FROM (
SELECT repo.id, COUNT(*) c
FROM `githubarchive.year.2016` a
WHERE a.type='WatchEvent'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 75
))
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn=1
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars DESC

Most popular GitHub projects per country, removing the top 75:

#standardSQL
SELECT country, a.name, stars, description, stars_country, other_countries_sc
FROM (
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn2
FROM (
SELECT *, ROUND(SUM(1/(1+ROUND(rn/20,0))) OVER(PARTITION BY repo_id),1) other_countries_sc
FROM (
SELECT * FROM (
SELECT country_code, name, stars, ROW_NUMBER() OVER(PARTITION BY country_code ORDER BY stars DESC) rn, SUM(stars) OVER(PARTITION BY country_code) stars_country, repo_id
FROM (
SELECT country_code, repo.id repo_id, COUNT(DISTINCT actor.login) stars, APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value name
FROM `githubarchive.year.2016` a
JOIN `ghtorrent-bq.ght_2017_01_19.users` b
ON a.actor.login=b.login
WHERE country_code IS NOT null
AND repo.id NOT IN (SELECT id FROM (
SELECT repo.id, COUNT(*) c
FROM `githubarchive.year.2016` a
WHERE a.type='WatchEvent'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 75
))
AND a.type='WatchEvent'
GROUP BY 1, 2
HAVING stars>5
)
)
WHERE rn<300
)
)
WHERE other_countries_sc<3
)
WHERE rn2<11
) a
LEFT JOIN (SELECT url, ANY_VALUE(description) description FROM `ghtorrent-bq.ght_2017_01_19.projects` GROUP BY 1) b
ON a.name=REPLACE(url, 'https://api.github.com/repos/', '')
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
ORDER BY stars_country DESC, country_code, stars DESC

Want more?

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store