Yes, the samples are of a similar type. Forks are not indexed. We also don’t index alternative branches, only the master branch, and we download only the HEAD version of each repository. Information about stars, revisions, or other details is not indexed. We only care about storing binary files and knowing where they originated.
It won’t be easy for us to look into BigQuery ourselves. We are a startup, and everyone on our team is busy fixing other things or working on sales.
What I could do is set up our crawler to go through the data and output a text file (CSV, JSON, etc.) that you can import into BigQuery. Just provide me with a template of how the data should look for a few sample projects, and I can get it done.
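To make that concrete, here is a rough sketch of what such an export could look like, assuming newline-delimited JSON (one object per line), which is a format BigQuery can load. The field names (`repository`, `branch`, `file_path`, `sha256`) and the sample values are hypothetical placeholders, not an agreed schema; your template would replace them.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


# Hypothetical sample record: every field name and value here is a placeholder.
sample_records = [
    {
        "repository": "example-user/example-project",
        "branch": "master",
        "file_path": "bin/tool.exe",
        "sha256": "<hash of the binary file>",
    },
]

if __name__ == "__main__":
    # Print the export; in practice this would be written to a file for import.
    print(to_ndjson(sample_records), end="")
```

If CSV is easier on your side, the same fields could be written as columns instead.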
My email address is email@example.com