Naming conventions in Python import statements: a BigQuery adventure using the GitHub DB dump

In June, Felipe Hoffa published a post describing the partnership between Google and GitHub that lets you query all of the public code hosted on GitHub.

Reading it, I noticed that nobody had written about Python yet.

So I wrote a query, and 6 seconds and 32 GB of processed data later, I had a list of every import statement along with how often it appears.

In total, there are over 300,000 distinct import statements, the most common being import os and import sys.

What I was more interested in was how people rename these modules in Python.

Take, for example, import numpy as np:

206K projects use this naming convention.

Around 2,000 projects use other aliases, such as import numpy as N.
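Counting aliases by matching raw text lines is fragile (comma-separated imports, comments, odd spacing). A minimal sketch, assuming you already have each file's contents as a Python string (the alias_counts helper and the sample inputs are hypothetical), uses the standard library's ast module to tally which alias each file picks for a given module:

```python
import ast
from collections import Counter


def alias_counts(sources, module):
    """Count aliases used in `import <module> as <alias>` across sources.

    Each file contributes a given alias at most once, so the result
    approximates "number of files using this alias" (not projects).
    """
    counts = Counter()
    for src in sources:
        try:
            tree = ast.parse(src)
        except SyntaxError:
            continue  # skip files that don't parse, e.g. Python 2-only code
        seen = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for name in node.names:
                    # name.asname is None for a plain `import numpy`
                    if name.name == module and name.asname:
                        seen.add(name.asname)
        counts.update(seen)
    return counts


files = [
    "import numpy as np\nimport os\n",
    "import numpy as np\n",
    "import numpy as N\nimport sys\n",
    "import numpy\n",
]
print(alias_counts(files, "numpy").most_common())  # [('np', 2), ('N', 1)]
```

Because it counts files and silently skips anything the running interpreter cannot parse, its numbers would not match the figures above exactly.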

If we dig deeper into the cases where users import only part of the numpy module, we see that there is no naming convention developers agree on.

The same thing can be observed for the pylab module.

This becomes evident when you look at how PyLab advertises itself on its site:

from pylab import *
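A wildcard import binds no alias at all, which is one reason it never shows up in an alias count. Assuming the same kind of source strings as input, a small sketch (the star_import_modules helper is hypothetical) can flag which modules a file star-imports:

```python
import ast


def star_import_modules(source):
    """Return the modules pulled in with `from <module> import *`."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        # `from X import ...` parses to ast.ImportFrom; a wildcard shows up
        # as an alias whose name is "*".
        if isinstance(node, ast.ImportFrom) and any(a.name == "*" for a in node.names):
            # node.level counts leading dots in relative imports;
            # node.module is None when there is no module name after them.
            modules.append("." * node.level + (node.module or ""))
    return modules


src = "from pylab import *\nfrom os import path\nimport numpy as np\n"
print(star_import_modules(src))  # ['pylab']
```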

If we look at another Python module, pandas, we see that its documentation uses a consistent naming scheme:

import pandas as pd

And this is reflected in the data: almost everybody uses this convention, except 50 projects that use pa, 42 that use pand and 40 that use pds.

Another thing we can see is that the alias sp means different things in different projects:

In 3,290 projects, sp refers to the scipy module.

In 1,998 projects, sp refers to the scipy.sparse module.

In 1,272 projects, sp refers to the subprocess module.

In 292 projects, sp refers to the sympy module.
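To see how one alias maps onto several modules, you can invert the tally: for each alias, count the dotted names it gets bound to. This is a sketch under the same assumptions (the modules_by_alias helper and sample inputs are hypothetical). It counts import statements rather than projects, and for from X import Y as Z it records X.Y, which may be a function or class rather than a module:

```python
import ast
from collections import Counter, defaultdict


def modules_by_alias(sources):
    """Map each alias to a Counter of the dotted names bound to it."""
    result = defaultdict(Counter)
    for src in sources:
        try:
            tree = ast.parse(src)
        except SyntaxError:
            continue  # skip files the current interpreter cannot parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for a in node.names:
                    if a.asname:
                        result[a.asname][a.name] += 1
            elif isinstance(node, ast.ImportFrom) and node.module:
                for a in node.names:
                    if a.asname:
                        # `from scipy import sparse as sp` -> "scipy.sparse"
                        result[a.asname][f"{node.module}.{a.name}"] += 1
    return result


files = [
    "import scipy as sp\n",
    "import scipy.sparse as sp\n",
    "from scipy import sparse as sp\n",
    "import subprocess as sp\n",
    "import sympy as sp\n",
]
print(modules_by_alias(files)["sp"].most_common(1))  # [('scipy.sparse', 2)]
```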

I wasn't able to embed a table here, so I'm sharing the top 10,000 results in this Google spreadsheet.

Here is the query I used (BigQuery legacy SQL):

SELECT line, COUNT(*) AS n
FROM (
  SELECT SPLIT(content, '\n') AS line
  FROM [fh-bigquery:github_extracts.contents_py]
  HAVING LEFT(line, 7) = 'import '
)
GROUP BY line
ORDER BY n DESC
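For experimenting on a local checkout rather than in BigQuery, the query's logic can be approximated in plain Python. This sketch assumes contents is a list of file contents (a hypothetical input, and count_import_lines is a hypothetical helper). Like the query, it only matches lines that begin exactly with "import ", so indented imports and from ... import ... lines are not counted:

```python
from collections import Counter


def count_import_lines(contents):
    """Approximate the query: split each file on newlines, keep lines whose
    first 7 characters are "import ", and count identical lines."""
    counts = Counter()
    for content in contents:
        for line in content.split("\n"):
            if line[:7] == "import ":  # same test as LEFT(line, 7) = 'import '
                counts[line] += 1
    return counts


contents = [
    "import os\nimport sys\n\ndef f():\n    import os\n",
    "import os\nfrom pylab import *\n",
]
print(count_import_lines(contents).most_common())
# [('import os', 2), ('import sys', 1)]
```

Note that files with Windows line endings keep a trailing "\r" on each line here, so identical imports from such files would be counted as separate entries.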

You can do much more: look at how functions (def) and classes are named, and so on.
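As a starting point for the def and class idea, here is a sketch that classifies function and class names in a source string. The regular expressions are my own rough approximation of the styles PEP 8 describes (CapWords for classes, lowercase with underscores for functions, mixedCase as the discouraged alternative), and the naming_styles helper is hypothetical:

```python
import ast
import re
from collections import Counter

# Rough heuristics (assumptions), allowing leading underscores:
SNAKE = re.compile(r"^_*[a-z][a-z0-9_]*$")        # read_file, __init__
CAPWORDS = re.compile(r"^_*[A-Z][a-zA-Z0-9]*$")   # Parser, HTTPServer
MIXED = re.compile(r"^_*[a-z][a-z0-9]*[A-Z][a-zA-Z0-9]*$")  # parseLine


def naming_styles(source):
    """Count (kind, style) pairs for every def and class in one file."""
    styles = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "def"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        if SNAKE.match(node.name):
            style = "snake_case"
        elif CAPWORDS.match(node.name):
            style = "CapWords"
        elif MIXED.match(node.name):
            style = "mixedCase"
        else:
            style = "other"
        styles[(kind, style)] += 1
    return styles


src = """
class Parser:
    def __init__(self):
        pass

    def parseLine(self, line):
        return line.split()

def read_file(path):
    pass

class my_helper:
    pass
"""
print(naming_styles(src)[("def", "snake_case")])  # 2
```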

I encourage you to play with the data.

You can find me online on Medium (Florin Badita), AngelList, Twitter, LinkedIn, OpenStreetMap, GitHub, Quora and Facebook.

Sometimes I write on my blog.