The RDKit and Modern C++

Greg Landrum
Sep 24, 2016


Note: This document has been cooking for a while and I’ve circulated it to smaller groups in a couple of different forms; it’s time to get it out there and start moving.

The topic is reasonably technical, but there is an important practical implication: the change discussed here will break compatibility with older C++ compilers. This should not be a problem for people using binary distributions of the RDKit (i.e. conda packages or packages installed by the operating system’s package manager), but it may introduce problems for people who build the RDKit themselves on older systems.

Background

It’s time to start “allowing” the use of modern C++ (by which I mean C++14) in the RDKit. I think this is an important step both for code quality in the toolkit itself and for allowing us (the developers) to continue to learn and use modern tools. Who knows, it may even help with performance. :-)
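To make "modern C++" a little more concrete, here is a minimal sketch of the kind of code C++11/C++14 allows. The `Atom` struct and the function names are hypothetical stand-ins invented for this illustration; they are not actual RDKit classes or APIs.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in type for illustration; not an actual RDKit class.
struct Atom {
  int atomicNum;
  std::string symbol;
};

// C++11: range-based for and auto replace explicit iterator loops.
int countHeavyAtoms(const std::vector<Atom> &atoms) {
  int count = 0;
  for (const auto &atom : atoms) {
    if (atom.atomicNum > 1) ++count;
  }
  return count;
}

// C++14: a generic lambda (auto parameters) used as a sort comparator.
void sortByAtomicNum(std::vector<Atom> &atoms) {
  std::sort(atoms.begin(), atoms.end(), [](const auto &a, const auto &b) {
    return a.atomicNum < b.atomicNum;
  });
}

// C++14: std::make_unique expresses ownership without a raw new.
std::unique_ptr<Atom> makeAtom(int atomicNum, std::string symbol) {
  return std::make_unique<Atom>(Atom{atomicNum, std::move(symbol)});
}
```

This needs a compiler run in C++14 mode (for example `-std=c++14` with g++ or clang++); the generic lambda and `std::make_unique` are the two pieces that C++11 alone does not provide.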

This move would, of course, break compatibility with some older compilers.

Here’s one view of compiler support for C++11 and C++14 (http://en.cppreference.com/w/cpp/compiler_support). It looks like g++ 4.8+ and VC++ 2013 are both fine with most/all C++11 features. VC++ 2015 does even better and g++ 5 is very solid. Clang, which is the default on the Mac and available for both Windows and Linux, has great support.

The thing I am most concerned about is that RHEL6 only includes g++ 4.4. RHEL7 is fine, but I would guess that most large organizations aren’t that up-to-date yet. I wouldn’t be overly surprised if some haven’t even finished the move to RHEL6. It looks like an “enterprise” alternative here is to use the RedHat Developer Toolset, but I don’t have personal experience with that yet. Clang is always an option, but it is something people will have to install on their own.

The move to a new release model for the RDKit, with the possibility of explicit long-term support releases, helps make this a good time to start the switch over to modern C++.

Concrete Proposal

Starting after the 2016.09 release we create a “modern C++” branch. This will start off containing changes that can be applied automatically using the clang tooling. Once that is stable and “happy”, it gets moved over to master and a “legacy C++” branch is created. New feature and bug fix development continues on master and changes are ported, whenever possible, over to the legacy branch for a while (see “questions” below for some discussion of this).
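As a rough illustration of what "changes that can be applied automatically" might look like, the sketch below puts a C++98-style snippet next to its modernized equivalent. The rewrites shown (`NULL` to `nullptr`, iterator loops to range-based for, adding `override`) correspond to checks in clang-tidy's `modernize` family, but which tools and checks would actually be used is an assumption here, and the `Shape` classes are hypothetical examples rather than RDKit code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classes for illustration; not RDKit code.
struct Shape {
  virtual ~Shape() {}
  virtual double area() const = 0;
};

// "Before": C++98 style with a repeated virtual, an explicit iterator, NULL.
struct LegacySquare : public Shape {
  explicit LegacySquare(double s) : side(s) {}
  virtual double area() const { return side * side; }
  double side;
};

double legacyTotalArea(const std::vector<const Shape *> &shapes) {
  double total = 0.0;
  for (std::vector<const Shape *>::const_iterator it = shapes.begin();
       it != shapes.end(); ++it) {
    if (*it != NULL) total += (*it)->area();
  }
  return total;
}

// "After": the kind of result an automated modernization pass aims for.
struct ModernSquare : public Shape {
  explicit ModernSquare(double s) : side(s) {}
  double area() const override { return side * side; }
  double side;
};

double modernTotalArea(const std::vector<const Shape *> &shapes) {
  double total = 0.0;
  for (const auto *shape : shapes) {
    if (shape != nullptr) total += shape->area();
  }
  return total;
}
```

Both versions behave identically, which is what makes this class of change a reasonable first step for the branch: the diff is large but mechanical, and the existing tests should pass unchanged.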

Due to library incompatibilities, I suspect that binaries are going to need to be built against the legacy branch. This adds another row or three to the build matrix. Not the end of the world if everything is properly automated, but definitely something to be aware of.

What’s needed?

  • Updated (well, finished) coding standards. These would probably still “allow” the old style for the next release cycle, after which the modern style becomes “mandatory” (=strongly encouraged).
  • The relevant build scripts and/or docker images.
  • A list of which language features will be a priority and which (if any) shouldn’t be used (for compatibility reasons or whatever).

Questions

  • Does Travis-CI have up-to-date compilers available? (I suspect yes, but I haven’t verified that.)
  • How long does the “legacy” branch stay live? A year? Two? This will probably end up being dictated by support customers.
  • It’s inevitable that there are going to end up being new features that are hard to backport to the legacy branch. The first time this happens it’s going to raise a bunch of interesting questions, but I think it’s a bridge we can cross when we get there. Some careful planning of which modern features are allowed should help a bit here.
