New testing for a new release

Greg Landrum
Mar 26, 2020

The next major RDKit version — v2020.03 — should be released next week. Once the release is out I’ll do a couple posts on the RDKit blog about some of the new features, but I thought it would be worth doing a quick post beforehand here to describe some new testing that I’ve started doing.

Be forewarned that this post is also a bit of light advertising for the RDKit support contracts I offer through T5 Informatics.

The RDKit and testing

The RDKit is pretty well tested. We have automated tests for:

  • the C++ code (really good coverage)
  • the Python wrappers (really good coverage)
  • the pure Python code (really good coverage)
  • the Java wrappers (basic coverage)
  • the C# wrappers (minimal coverage)
  • the PostgreSQL cartridge (good coverage)
  • the documentation (good coverage)

As of the 2020.03 release, all of these tests, except those for the C# wrappers, are run automatically with every commit on GitHub. We don’t merge pull requests that break the tests, so we can always have some confidence that the code is in good shape. As a software developer, that’s a great feeling!

The new tests

With this release cycle I’ve started doing an additional manual testing step with the goal of being able to detect differences in results between RDKit versions. These differences could be consequences of bugs that were fixed during the release cycle or be due to newly introduced bugs. I would certainly hope that it’s all the former, of course. In any case it’s good to see and understand them.

For this release cycle’s tests, I read all 1.8 million molecules from the ChEMBL 25 SDF (ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_25/chembl_25.sdf.gz) and then generated the following values:

  • Canonical SMILES
  • InChI
  • Morgan fingerprint
  • RDKit fingerprint
  • Pattern fingerprint
  • MolLogP

I picked these values because they exercise a bunch of core molecule functionality, including bits like aromaticity perception, chirality handling, substructure matching, etc. This is still a work in progress and I will certainly be adding additional values for the next release cycle. I’m trying to avoid things that are expensive to calculate (like doing conformation generation or 3D descriptors), but I may add additional large-scale tests for those types of things.
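To make the list above concrete, here is a minimal sketch of how those six values could be generated with the RDKit Python API. This is not the actual testing code: the function names `compute_values` and `iter_values`, the Morgan radius of 2, the 2048-bit fingerprint size, the rounding of MolLogP, and the use of the `chembl_id` SD tag as the molecule identifier are all assumptions made for illustration.

```python
import gzip

from rdkit import Chem
from rdkit.Chem import Crippen, rdMolDescriptors


def compute_values(mol):
    """Return the six per-molecule values as a dict (hypothetical helper)."""
    # Radius 2 and 2048 bits are assumed parameters, not taken from the post.
    morgan = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    return {
        "smiles": Chem.MolToSmiles(mol),                  # canonical SMILES
        "inchi": Chem.MolToInchi(mol),                    # needs InChI support
        "morgan": morgan.ToBase64(),                      # fingerprints stored as text
        "rdkit": Chem.RDKFingerprint(mol).ToBase64(),
        "pattern": Chem.PatternFingerprint(mol).ToBase64(),
        "logp": round(Crippen.MolLogP(mol), 4),           # rounding is an assumption
    }


def iter_values(sdf_gz_path):
    """Yield (molecule id, values) for each readable molecule in a gzipped SDF."""
    with gzip.open(sdf_gz_path, "rb") as fh:
        for mol in Chem.ForwardSDMolSupplier(fh):
            if mol is None:
                continue  # skip records the RDKit cannot parse
            # Assumes the SDF carries a 'chembl_id' property on every record.
            yield mol.GetProp("chembl_id"), compute_values(mol)
```

Serializing the fingerprints to text makes it easy to write everything to a flat file and compare it later without needing the RDKit version that produced it.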

I stored all the values generated with the 2019.09 release in a text file, re-generated the values with the beta of the 2020.03 release and compared the two sets of results to each other.
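The comparison step itself needs nothing beyond the standard library. The sketch below assumes a tab-separated layout with the molecule id in the first column and the values in the remaining columns; that layout and the helper names `load_results` and `diff_results` are hypothetical, not a description of the actual files.

```python
import csv


def load_results(path):
    """Read a tab-separated results file into {mol_id: (value, value, ...)}."""
    results = {}
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if row:  # ignore blank lines
                results[row[0]] = tuple(row[1:])
    return results


def diff_results(old, new):
    """Return {mol_id: [column indices that differ]} for ids present in both.

    Assumes both result sets have the same columns in the same order.
    Ids that are missing from `new` are ignored here.
    """
    changed = {}
    for mol_id, old_vals in old.items():
        new_vals = new.get(mol_id)
        if new_vals is None or new_vals == old_vals:
            continue
        changed[mol_id] = [
            i for i, (a, b) in enumerate(zip(old_vals, new_vals)) if a != b
        ]
    return changed
```

Calling `diff_results(load_results(old_path), load_results(new_path))` on the files from two versions then gives the molecules that changed, along with which values changed for each.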

Results for v2020.03

So how’d we do?

For the 1.8 million molecules in the ChEMBL 25 data set, 1443 had differences between v2019.09 and v2020.03. These 1443 molecules all produced different canonical SMILES between the two versions (along with differences for all the other values). After overcoming my initial “oh shit!” reaction, I dug in to understand the differences. I’ll say some more about this below, but it’s worth first pointing out the positive part: the remaining 1.8 million compounds produced exactly the same results.

I went through the 1443 examples with mismatches in order to understand what’s going on and was very happy to be able to trace it back to a single change, the fix for issue #2895. Fixing this bug changed the way aromaticity is perceived in some fused-ring systems. This leads to a different internal representation, different canonical SMILES, different fingerprints, etc.

Some details about the differences

If you don’t want to get into the weeds of what the actual differences are, just skip over this section.

Before I start: the goal of this exercise is not to get into a discussion/argument about aromaticity. The RDKit’s default aromaticity model is described in the documentation. The reason I made the changes associated with issue #2895 is not that I wanted to fix/change the aromaticity model, but that what the code was doing wasn’t actually consistent with what the documentation described. That’s now cleared up.

Here’s a relatively simple molecule, CHEMBL38451, where the difference appears:

In the 2019.09 version (and earlier versions) of the RDKit, the full pi system (all 5 rings) is considered aromatic because it contains 22 pi electrons. This is 4n+2, so all the atoms/bonds are marked as aromatic. The problem here is that this isn’t actually a single ring: atom 20 (with the highlight) isn’t part of the “envelope” going around the outside of the molecule that forms the actual aromatic ring. We can see the differences clearly by drawing the molecule with the aromaticity perceived by v2019.09.3 and v2020.03.1b1:

Dashed lines indicate aromatic bonds here, and we can see that the ring with the NH moiety (atom 21 in the first image) is no longer considered aromatic.
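If you want to check this kind of thing without drawing anything, the per-ring aromaticity can also be listed in code. This is a small sketch, with `ring_aromaticity` being a hypothetical helper name; running it on the same SMILES in environments with two different RDKit versions would show which rings changed.

```python
from rdkit import Chem


def ring_aromaticity(smiles):
    """Return one (ring atom indices, is_aromatic) pair per ring found by the RDKit."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    ring_info = mol.GetRingInfo()
    rings = []
    for atom_ring, bond_ring in zip(ring_info.AtomRings(), ring_info.BondRings()):
        # A ring is reported as aromatic only if every bond in it is aromatic.
        aromatic = all(mol.GetBondWithIdx(b).GetIsAromatic() for b in bond_ring)
        rings.append((tuple(atom_ring), aromatic))
    return rings
```

For tetralin (`c1ccc2c(c1)CCCC2`), for example, this reports one aromatic ring and one non-aromatic ring.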

This form of change, where the new version has one or more non-aromatic rings that were aromatic previously, accounts for 1424 of the 1443 molecules with different results. In the 19 remaining molecules the new version actually has an *extra* aromatic ring, for the same reasons discussed above. Here’s an example of that, CHEMBL2314369:

In this case all of the rings are considered to be aromatic. This molecule actually has four different aromatic ring systems in v2020.03; they are indicated via highlight color here:

The envelope ring system (dark blue in the image), which contains ten electrons and is thus considered aromatic, was not correctly identified by v2019.09. That’s now been fixed and the RDKit’s aromaticity behavior is consistent with the documentation.
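Both electron counts quoted in this section, 22 and 10, satisfy the 4n+2 rule. The check is simple arithmetic, sketched below with a hypothetical helper name. Note that this only tests the count: as the first example shows, the electrons also have to belong to an actual ring (the envelope), which is the part that issue #2895 was about.

```python
def satisfies_4n_plus_2(n_pi_electrons):
    """Return True if the electron count equals 4n+2 for some integer n >= 0."""
    return n_pi_electrons >= 2 and (n_pi_electrons - 2) % 4 == 0


# 22 = 4*5 + 2 and 10 = 4*2 + 2, so both counts from the text pass the check.
counts_from_text = {22: satisfies_4n_plus_2(22), 10: satisfies_4n_plus_2(10)}
```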

As a bit of a teaser: all of the images in this blog post were also created with the new version of the RDKit. You may have noticed some differences relative to old versions. I’ll describe those in an upcoming RDKit blog post.

Wrapping up

It would be nice to have some kind of usable report for this, but I haven’t figured out the details on that yet. It needs to be something that’s human readable but isn’t a giant amount of work for me to produce (doing a release is already a fair amount of work). If you have any ideas please let me know! In the meantime, I’m happy to share the new testing code itself (at the moment it’s a couple of Jupyter notebooks) as well as the detailed results (including the full list of molecules where things changed) and my notes about them with people who have RDKit support contracts through T5 Informatics.

I plan to keep doing tests like this, possibly with expanded coverage of RDKit features, along with analyses of the differences, for future RDKit releases as well. I probably will not be doing a public post like this every time, but I’ll definitely be sharing the results with people with support contracts. If this kind of cross-version stability is important to you and you want to support these efforts, please consider signing up for support!

Appendix

An aside about the automated tests and the wrappers, particularly the Java and C# wrappers. The idea of these tests is primarily to ensure that the wrappers themselves work: that values are properly passed back and forth between the languages and that things like function arguments work as expected. Tests of the underlying functionality, which is in C++, are handled by the C++ tests. So though it would be really nice if the coverage were better, the less-than-spectacular coverage here isn’t as big of a concern as it would be elsewhere.
