A Bit of History
When I joined the Eclipse CDT project back in 2002 (yeah, it’s been a long time), I was working on modeling tools for “real time”, or more accurately, embedded reactive systems. Communicating state machines. I wrote code generators that generated C and C++ from ROOM models and then eventually UML-RT. ROOM was way better by the way and easier to generate for because it was more semantically complete and well defined. That objective is key later in this story.
We had the vision to integrate our modeling tools more closely with Integrated Development Environments. We started looking at Visual Studio but Eclipse was the young up and comer. That and IBM bought us, Rational by that point, and had already bought OTI who built Eclipse so it was a natural fit. And we were all in Ottawa. And by chance, Ottawa-based QNX had already written a C/C++ IDE based on Eclipse and were open sourcing it and it was perfect for our customers as well. It’s amazing how that all happened and led to my life as CDT Doug.
Our first order of business was to help the CDT become an industry class C/C++ IDE and become a foundation for integrating our modeling tools. Since we wanted to be able to generate model elements from code, it required we have accurate C and C++ parsers and indexers. No one figured we could do it, but we were able to put together a somewhat decent system written in Java in the org.eclipse.cdt.core plug-in.
Scaling is Hard
However, as the community started to try it out on real projects, especially ones of a significant size, we started to run into pretty massive performance problems with the indexer. We were essentially doing full builds of the user’s projects and storing the results in a string table. On large projects, builds take a long time. But users expect that and put up with it because they really need those binaries it produces. They don’t have the same patience for their IDEs building indexes the don’t really see and we paid a pretty high price for that.
As a solution, I wondered if we could store the symbol information that we were gathering in a way that we could load it up from disk as we were parsing other files and plug the symbol info into the AST the same way we do symbols normally. This would allow us to parse header files once and reuse the results, similar to how precompiled headers work. The price you pay is in accuracy since some systems parse header files multiple times with different macro settings. But my guess was that it wouldn’t be that bad.
It was hard to convince my team at IBM Rational to take this road. Accuracy was king for our modeling tools. But when I moved to join QNX, I decide to forgo that requirement and give this “fast indexer” strategy a go. And the rest is history. Performance on large projects was an order of magnitude faster. Incremental indexing of files as they were saved isn’t even noticeable. It was a huge success and my proudest contribution to the CDT. And I was even better when other community members handed us their expertise to make the accuracy better and better so you barely notice that at all either.
C++ Rises from the “Dead”
Move the clock a decade later and we started running into a problem. The C++ standards community has new life and are adding a tonne of new features at a three year cadence. The CDT community has long lost most of the experts that build the original parsers. Lucky for us a new crop of contributors has come along and are doing heroes work to keep up. But it’s getting harder and harder. One thing we benefit from is how slow embedded developers, the majority of users of CDT, are to adopt the new standards. It gives us time, but not forever. We need to find a better way.
Then along came the Language Server Protocol and a small handful of language servers that do C/C++. I’ve investigated four of them. Three of them are based on llvm and clang. One of them is in tree with llvm and clang in clang-tools-extra, i.e., clangd. The other two are projects that use libclang with parts of the tree, i.e., cquery and ccls. Those two projects are what I call “one person projects” and with cquery at least, that person found something else to do last November. Beware of the one person project.
I’ve spent a lot of time with clangd when experimenting with Visual Studio Code. For what it does, clangd is very accurate and really fast. It uses compile_commands.json files to find out what source files are built and what compiler and command lines they use. I’ve had to fork the tree to add in support for discovering compilers it doesn’t know about, but that was pretty easy to put together. It gives great content assist and you get the benefit of clang’s awesome compilation error diagnostics as you type. It shows a lot of promise.
However clangd for the longest time lacked an indexer. When you search for references it only finds them in files you have opened previously. The thought as I understand it is that you use another process to build the index and that is usually done at build time. However, not all users have such an environment set up so having an index created by the IDE is a mandatory feature. Now, clangd did eventually get an indexer but it does what the old CDT indexer did and completely parses the source three. That predictably takes forever on large projects and I don’t think users have the appetite to take a huge step backwards like that.
While waiting for the right solution to arrive for clangd, I thought I’d give the Microsoft C/C++ Tools for VS Code a try. My initial experience was quite surprising. It actually worked well with a gnu tools cross compiler project I used for testing. You have to teach it how to parse your code using a magic JSON file, which fits right in with the rest of VS Code. It’s able to pick out the default include path when you point it at your compiler. It has a MI support for debugging, though no built-in support for remote debugging but that was hackable. It seemed like a reasonable alternative, at least for VS Code.
However when I tried it with one of our production projects it quickly fell apart. It does a great job trying to figure out include paths, similar to the heuristics we use in CDT. That includes things like treating all the folders in your workspace as a potential include path entry. But it tended to make mistakes. It even has support for compile_commands.json files so I could tell it the command lines that were use. It did better but still tried to do too much and gave incorrect results.
That and it doesn’t have an index yet either. One is coming soon, but if it can’t figure out how to parse my files correctly, it’s not going to be a great experience. Still a lot of work to do there.
Where do we go from here?
As it stands today, at least from a CDT perspective, there really isn’t a language server solution that comes near what we have in CDT. Yes, some things are better. Both these language servers are using real parsers to parse the code. (or at least clangd is. Microsoft’s, of course, is closed source so I can only assume). They give really good content assist and error diagnostics and open declaration works. But without a usable indexer, you don’t get accurate symbol references. And I haven’t even mentioned refactoring which CDT has and which is not even suggested in the language server protocol.
So if all your doing is typing in code, the new language servers are great. But if you need to do some code mining to understand the code before you change it, you’re out of luck. The good news is that we are continuing to see investment in them so who knows. But then, maybe the CDT parsers catch up with the language standards before these other language servers grow great indexers making the whole thing moot. I wouldn’t bet against that right now.