C#88: The Original C#

A few notes for posterity

Introduction

Every once in a while the topic of the original C# (vintage 1988) comes up. This is the project for which I was recruited to Microsoft and it was a very interesting beast, with even more interesting colleagues. I thought I would write a few notes about this system while I still remembered the basics of how it worked. Obviously a much longer article would be necessary to get everything down but you should be able to get sense of its operation from this primer. At its zenith, C#88 was able to build and run “Omega” — what you would now call Microsoft Access.

System Basics

The system was designed to support incremental compilation, and that factored heavily into every design aspect. Persistent storage between compiles was provided by what we called the Global Symbol Cache (GSC). The GSC was an object oriented database that provided transactions and simple isolation. Supporting one writer and many readers, the readers all saw the database as it existed when they began their session. The writer observed its own writes as usual. Transactions and Isolation were all built using a page-mapping strategy not unlike the one used in PDB files even now… this is not a coincidence. The GSC was a cache in the sense that you could delete it and lose nothing but time.

The C# language was designed to be substantially compatible with the C programming language. As it happened the underlying compiler supported what you might call a variant of K&R C. That means function prototypes are optional, and lots of things “int” by default. The compiler produced hybrid PCode and native x86 assembly, traditional for MS applications at the time.

The compilation model was what you might get if you built something like “C.Net” today. Files could be, and were, compiled in any order, including .h files which were really not considered to be any different than .c files. The system did not do something like “scan for definitions then parse bodies” as you might expect, rather, it exploited its incremental nature to incrementally reparse out-of-order until convergence as we will see below. The team mantra was “twice incremental is still cheap.”

Source files (SRFs) were divided into source units (SRUs). A typical SRU might be a single function, or a single declaration, but they could be compound where multiple definitions were atomic (e.g int x,y;). Source units created language objects (LOBs) and depended on them. This is essentially how the compilation convergence happened. If, due to errors, an SRU could not be compiled, the necessary-but-missing LOB was recorded. Should that LOB be encountered at a later point then any SRU that required it would be queued for a recompile. This process might take several iterations…

The system processed the files in the order specified by a simple project.ini file, which meant that “make” was largely unnecessary. Among the more interesting features was an option to emit a new .ini with “the order you should have told me to compile this junk in” — thereby making it fairly easy to create and maintain a decent ordering without much human effort.

Of course it was a very good idea to put the .h files first because they tended to have a lot of #define statements. If you didn’t do so, your first build would really suck…

The system was built on top of an existing C compiler (the Apps PCode compiler, as opposed to “CMerge” the native compiler then in use) and basically the system would set up that compiler’s symbol table and so forth to be “just right” for starting at the recompilation context. This was not easy but seemed to work out ok.

Additionally, .obj files and other binary formats could be “imported” so that the system would know about their definitions and be able to do crazy things like figure out which symbols would be resolved case-insensitively at link time. That was a hairy mess.

Incremental recompile was achieved by noting changed source files, and recomputing the CRC of each SRU in the file. Any changed SRUs would be recompiled and of course this would cause dependencies to be recompiled until the system converged again. Importantly, non-observable changes (e.g. adding a new local variable to a function) did not cause dependencies to become dirty.

Because of the above dependency management, some common practices required special treatment. Of special interest is this one:

#ifndef FOO

#define FOO

#endif

This required special treatment because there is a dependency cycle between the SRU that contains the #ifndef and the SRU that contains the #define. Without special treatment it would never converge..

Finally, the system incrementally produced ready-to-link object files with the entire content of a 64k segment in one file. Those were then linked by the usual MS Incremental linker.

Consequences

The presence of high quality dependency information made it possible to implement very cool source browsing features. These ultimately found their way into the product line and maybe could be considered the antecedent of IntelliSense. The fact that virtually all knowledge was in the GSC also meant that many system errors could be diagnosed by browsing the GSC object graph in a more raw form.

The fact that the typical input code was written for a toolset where types were only loosely enforced across compilation units meant that the C# system was good at finding inconsistencies still in the code. Very good. Thousands of errors had to be fixed to get the existing code to build. The fact that all global symbols were in the same namespace tended to work out well in that it forced out certain horrible practices… You could not give the same variable two different types in two different compilation units and get an accidental union for instance.

Similarly, the fact that .lib files and the like had to be imported meant that those names were also unified and crazy overrides with unknown side effects were banished.

Basically the thing was a super linter. Which was super painful…and super unpopular…

The program databases were considered huge… weighing in at a whopping 8–12M for the big programs… toys by today’s standards. They might have been smaller if we hadn’t used a pointer-ish data structure as our on disk representation —I can’t say that I’m in love with object oriented databases at all, even in 2018, much less 1988.

Outcome

It was cancelled… too big for its time, too different, not considered valuable enough. But it did generate a ton of fruit. Many features in the Visual Studio ecosystem have their roots in C#88 — debugging formats, incremental linking, incremental compilation and minimal rebuild, maybe IntelliSense even… Hard to say it was a failure with that size of impact crater, but it never did go anywhere directly.

Good times.