It doesn’t seem to have all the code, or at least the proportions in the graph are not accurate. For example, the most numerous file extensions on the surface of the first 9 million projects on GitHub are:
- .js with 187 million files
- .png with 128 million files
- .php with 104 million files
We have these values because we keep a private archive of the public downloads from GitHub, which we can query. So it seems strange to see such a mismatch.
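
For anyone who wants to check counts like these on their own copy of the data, here is a rough sketch of how a per-extension tally could be produced. It is not the query we run on our archive; it assumes the projects are simply unpacked under one local directory, and the function name `count_extensions` is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count regular files under `root` by lowercased extension.

    Hypothetical helper: assumes the projects are unpacked on local disk.
    Files without an extension are counted under the empty string.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Print the three most common extensions under the current directory.
    for ext, n in count_extensions(".").most_common(3):
        print(ext or "(no extension)", n)
```

Lowercasing the extension means `.JS` and `.js` land in the same bucket, which matters when comparing totals against a graph that may or may not do the same.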