Google Summer of Code with “Open Chemistry”
Computational Chemistry Web repository : My GSoC 2017 project
I was fortunate to work with Open Chemistry during Google Summer of Code 2017. It was a great learning experience for me and has motivated me to keep contributing to open source.
Here, I summarize my project, the work completed during the GSoC period, and the tasks to be taken up in the future.
Project deliverable
Abstract on GSoC 2017 Project list page
This was a completely new project aiming to develop a framework that chemistry groups can use to set up their own data repository of computational chemistry log files and deploy a (public) server providing a REST API and web interface. The interface lets users browse and view the documents in the repository, search/filter using the parsed attributes available, add new files to the database, download a document’s data, and instantly parse a log file with cclib just by uploading the file in the browser.
The project makes use of cclib, 3Dmol.js, and openbabel, and is a flask-mongodb based framework.
Links
- The project is available at: https://github.com/nitish6174/openchemvault
- Project wiki is given at: https://github.com/nitish6174/openchemvault/wiki
- Here is the TO-DO for the project: https://github.com/nitish6174/openchemvault/issues/1
Viewing my contribution
I started this project from scratch, and my contribution is covered by the commits between Mar 11, 2017 and Aug 19, 2017 on the commits page of the project repository.
What work was done
These are the main functionalities completed during the GSoC period:
- Parse files in a directory with cclib, determine a few more useful attributes, store the parsed data in a MongoDB database, and generate SVGs
- REST APIs to:
* Fetch molecular formulas
* Fetch documents corresponding to a molecular formula
* Get details of a particular document
* Parse a log file and return results
* Add a log file to data repository
- Similarly, a web front-end consisting of:
* Page to instantly view parsed results by uploading a file
* Page to add log file to data repository
* Browsing of repository by molecular formula
* Search page supporting filtering with various attributes of logfiles
* Displaying details of a document selected from browsing menu or search results
* Option to download document/file data
* 3D rendering of molecule using XYZ data
- Dockerization to allow setting up an instance easily
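As an illustration of the "few more useful attributes" step above, here is a minimal sketch of deriving a molecular formula from the atomic numbers that cclib exposes on parsed data as `atomnos`. The function name `hill_formula`, the partial `SYMBOLS` table, and the choice of Hill ordering are assumptions made for this example and may differ from what the project actually does.

```python
from collections import Counter

# Partial element table, for illustration only (assumption: a real
# setup would cover the whole periodic table).
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 15: "P", 16: "S", 17: "Cl"}


def hill_formula(atomnos):
    """Build a Hill-order formula string from a sequence of atomic numbers.

    Hill order: carbon first, then hydrogen, then the remaining elements
    alphabetically; without carbon, every element is sorted alphabetically.
    """
    counts = Counter(SYMBOLS[int(z)] for z in atomnos)
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    # A count of 1 is left implicit, as in conventional formulas.
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


# Ethanol: two carbons, six hydrogens, one oxygen.
print(hill_formula([6, 6, 1, 1, 1, 1, 1, 1, 8]))
```

A string like this can serve as the key for grouping documents, which is what the browse-by-formula page and the formula APIs need.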
Future work
At the time of writing (Aug 2017), the following work is planned to develop the project further:
- Determining IUPAC and common name using InChI/other info
- Finding and storing data/properties about molecule using PubChemPy
- Providing support for searching by molecule’s InChI/InChIKey
- Providing support for searching by molecule’s name (IUPAC and common)
- Adding authentication for APIs and users
- Testing scalability of database
- Writing tests for APIs and database setup script
- Handling invalid files and reporting unsupported format
- Developing a regex-like molecular formula matcher
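To make the last item more concrete, here is one possible shape for a regex-like formula matcher. This is only a sketch: the pattern syntax (an element followed by a number, or by `*` for "any count of at least one") and the names `parse_formula` and `matches` are hypothetical and not part of the project.

```python
import re

# One pattern token: an element symbol, optionally followed by a count or '*'.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+|\*)?")


def parse_formula(formula):
    """Split a formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def matches(pattern, formula):
    """Return True if formula satisfies pattern.

    '*' after an element accepts any count >= 1; a missing count means 1.
    Elements in the formula that the pattern does not mention cause a mismatch.
    """
    target = parse_formula(formula)
    seen = set()
    for symbol, spec in TOKEN.findall(pattern):
        seen.add(symbol)
        have = target.get(symbol, 0)
        if spec == "*":
            if have < 1:
                return False
        elif have != (int(spec) if spec else 1):
            return False
    return seen == set(target)


print(matches("C2H*O", "C2H6O"))
print(matches("C2H*O", "C3H6O"))
```

Under this assumed syntax, `C2H*O` would select every stored formula with exactly two carbons, one oxygen, and any number of hydrogens.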
Acknowledgements
I am very thankful to my mentor, Prof. Geoffrey Hutchison, who guided me through the work. I am also glad to have received guidance from other mentors of the organization, especially Adam Tenderholt and Karol Langner.
I found the Open Chemistry community very supportive and look forward to other people joining this project and growing it further.
About Me
Name : Nitish Garg
Nick : nitish6174
GitHub : github.com/nitish6174
Website : nitish6174.com
LinkedIn : linkedin.com/in/nitish6174