Summer Research

During the summer we worked on scraping the Chromium Google Code project for bugs. We were able to extract and store 337,373 bugs. These items contain multiple fields and interesting features that can be analyzed.

The bugs schema contains the following fields: bug_id, title, stars, status, reporter, opened, closed, modified, owner_email, owner_uri, content. Each bug entry also contains a set of text comments from the developers who worked on it.
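As a rough illustration, one bug record could be modeled as below. The field names come from the schema; the types, the optional fields, and the `comments` list are assumptions made for this sketch, not the actual storage layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bug:
    """One scraped bug record. Field names follow the schema; types are assumed."""
    bug_id: int
    title: str
    stars: int
    status: str
    reporter: str
    opened: str    # timestamps kept as plain strings here (assumed format)
    modified: str
    closed: Optional[str] = None       # assumed to be empty while the bug is open
    owner_email: Optional[str] = None  # assumed to be empty when no owner is set
    owner_uri: Optional[str] = None
    content: str = ""
    comments: List[str] = field(default_factory=list)  # developer comments


# Made-up example values, for illustration only.
example = Bug(bug_id=1, title="Example crash", stars=3, status="Fixed",
              reporter="user@example.com", opened="2012-06-01",
              modified="2012-06-03")
example.comments.append("Reproduced on the latest build.")
```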

This information is linked to the Git log commit database using the bug_id field. With these relations we can identify each file that has been involved in a bug fix.
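A minimal sketch of that link, assuming a relational layout: the table names `bugs`, `commits`, and `commit_files`, their columns other than bug_id, and the demo rows are all hypothetical and only illustrate a join on bug_id.

```python
import sqlite3


def files_touched_by_bug_fixes(conn):
    """Map each bug_id to the sorted file paths changed by commits that reference it."""
    rows = conn.execute(
        """
        SELECT DISTINCT b.bug_id, f.path
        FROM bugs AS b
        JOIN commits AS c ON c.bug_id = b.bug_id
        JOIN commit_files AS f ON f.commit_hash = c.commit_hash
        ORDER BY b.bug_id, f.path
        """
    ).fetchall()
    result = {}
    for bug_id, path in rows:
        result.setdefault(bug_id, []).append(path)
    return result


def build_demo_db():
    """Create an in-memory database with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE commits (commit_hash TEXT PRIMARY KEY, bug_id INTEGER);
        CREATE TABLE commit_files (commit_hash TEXT, path TEXT);
        """
    )
    conn.executemany("INSERT INTO bugs VALUES (?, ?)",
                     [(101, "Crash on tab close"), (102, "Slow startup")])
    # Commit c3 references no bug, so it should not appear in the result.
    conn.executemany("INSERT INTO commits VALUES (?, ?)",
                     [("a1", 101), ("b2", 101), ("c3", None)])
    conn.executemany("INSERT INTO commit_files VALUES (?, ?)",
                     [("a1", "src/tab.cc"), ("b2", "src/tab.cc"),
                      ("b2", "src/tab.h"), ("c3", "README")])
    return conn


linked = files_touched_by_bug_fixes(build_demo_db())
```

Bugs with no linked commit (102 in the demo data) drop out of the inner join, so the result only lists bugs that have at least one fixing commit.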

More information on the link between commits and vulnerabilities is coming in future posts.
