Google Summer of Code with “Open Chemistry”

Computational Chemistry Web repository : My GSoC 2017 project

Nitish Garg
Aug 26, 2017

I was fortunate to work with Open Chemistry during Google Summer of Code 2017. It was a great learning experience for me and has motivated me to keep contributing to open source.

Here, I summarize my project, the work that was completed during the GSoC period, and the tasks that are to be taken up in the future.

Project deliverable

Abstract on GSoC 2017 Project list page

This was a completely new project. Its aim was to develop a framework that chemistry groups can use to set up their own data repository of log files in computational chemistry formats and to deploy a (public) server providing a REST API and a web interface. Through these, users can browse and view the documents in the repository, search and filter on the available parsed attributes, add new files to the database, download a document’s data, and instantly parse a log file with cclib just by uploading it in the browser.

The project is a Flask and MongoDB based framework that makes use of cclib, 3Dmol.js and Open Babel.

Links

Viewing my contribution

I started this project from scratch. My contribution is covered by the commits made between Mar 11, 2017 and Aug 19, 2017, listed on the commits page of the project repository.

What work was done

These are the main functionalities completed during the GSoC period:

  • Parse the files in a directory with cclib, derive a few more useful attributes, store the parsed data in a MongoDB database, and generate SVGs
  • REST APIs to:
    * Fetch molecular formulas
    * Fetch documents corresponding to a molecular formula
    * Get details of a particular document
    * Parse a log file and return results
    * Add a log file to the data repository
  • A web front-end consisting of:
    * Page to instantly view parsed results by uploading a file
    * Page to add a log file to the data repository
    * Browsing of the repository by molecular formula
    * Search page supporting filtering by various attributes of log files
    * Displaying details of a document selected from the browsing menu or search results
    * Option to download document/file data
    * 3D rendering of molecule using XYZ data
  • Dockerization to allow setting up an instance easily
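
To make the "derive a few more useful attributes" and "XYZ data" items more concrete, here is a minimal sketch of the idea. It is not the project's actual code. It assumes the parser yields a list of atomic numbers and one set of Cartesian coordinates per molecule (cclib exposes such data as `atomnos` and `atomcoords`). The helper names `molecular_formula` and `to_xyz`, the `document` layout and the partial `SYMBOLS` table are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Partial symbol table for illustration only; a real setup would need the
# whole periodic table.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 16: "S", 17: "Cl"}


def molecular_formula(atomnos):
    """Return a Hill-order formula: C first, then H, then the rest A-Z.

    Without carbon, all elements (including H) are sorted alphabetically.
    """
    counts = Counter(SYMBOLS[z] for z in atomnos)
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


def to_xyz(atomnos, coords, comment=""):
    """Build an XYZ-format string: atom count, comment line, one line per atom."""
    lines = [str(len(atomnos)), comment]
    for number, (x, y, z) in zip(atomnos, coords):
        lines.append(f"{SYMBOLS[number]} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Water: one oxygen and two hydrogens, coordinates in angstroms.
water_atomnos = [8, 1, 1]
water_coords = [(0.0, 0.0, 0.117), (0.0, 0.757, -0.469), (0.0, -0.757, -0.469)]

# Hypothetical shape of a record that could be stored and later served.
document = {
    "formula": molecular_formula(water_atomnos),
    "xyz": to_xyz(water_atomnos, water_coords, comment="water"),
}
```

A record like this carries both a key to browse by (the formula) and a plain-text geometry that a viewer such as 3Dmol.js can load as XYZ.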

Future work

At the time of writing (Aug 2017), the following work is planned to develop the project further:

  • Determining the IUPAC and common names using InChI or other information
  • Finding and storing data/properties about the molecule using PubChemPy
  • Providing support for searching by a molecule’s InChI/InChIKey
  • Providing support for searching by a molecule’s name (IUPAC and common)
  • Adding authentication for APIs and users
  • Testing the scalability of the database
  • Writing tests for the APIs and the database setup script
  • Handling invalid files and reporting unsupported formats
  • Developing a regex-like molecular formula matcher
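
The syntax of the planned formula matcher is not decided here, so the following is only one possible reading of "regex-like", sketched under stated assumptions. In this hypothetical syntax a pattern looks like a formula, but an element count may be replaced by `*` to mean "any count of one or more". Elements missing from the pattern must be absent from the formula. The function names `parse` and `matches` are made up for this example.

```python
import re

# One element symbol followed by an optional count or a '*' wildcard.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+|\*)?")
VALID = re.compile(r"(?:[A-Z][a-z]?(?:\d+|\*)?)+")


def parse(text):
    """Map each element symbol to its count, or to None for a '*' wildcard.

    Assumes each element appears only once, as in 'C2H6O'.
    """
    if not VALID.fullmatch(text):
        raise ValueError(f"not a formula or pattern: {text!r}")
    parsed = {}
    for symbol, count in TOKEN.findall(text):
        parsed[symbol] = None if count == "*" else int(count or 1)
    return parsed


def matches(pattern, formula):
    """Return True if the formula has exactly the pattern's elements and counts."""
    wanted = parse(pattern)
    actual = parse(formula)
    if wanted.keys() != actual.keys():
        return False
    return all(n is None or n == actual[s] for s, n in wanted.items())


# Keep only the hydrocarbons with six carbon atoms.
formulas = ["C6H6", "C6H12", "C6H12O6", "C5H12", "H2O"]
hits = [f for f in formulas if matches("C6H*", f)]
```

With these inputs, `hits` contains `C6H6` and `C6H12`. A real implementation would also have to decide how such patterns translate into database queries, which this sketch leaves open.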

Acknowledgements

I am very thankful to my mentor, Prof. Geoffrey Hutchison, who guided me through the work. I am also glad to have received guidance from other mentors of the organization, especially Adam Tenderholt and Karol Langner.

I found the Open Chemistry community very supportive, and I look forward to other people joining this project and growing it further.

About Me

Name : Nitish Garg
Nick : nitish6174
GitHub : github.com/nitish6174
Website : nitish6174.com
LinkedIn : linkedin.com/in/nitish6174

B.Tech - CSE @ IIT Guwahati | Coding | Astronomy | Piano | Table tennis | http://nitish6174.com