The Foundations at the Core of C++ are Wrong — Part 4

Ian Joyner
9 min read · Jul 3, 2024


This is the fourth of a series of articles examining Bjarne Stroustrup’s writings about the core foundations in C++. The first part is here with links to the other parts:

Part 3

Purposes of a Programming Language

These comments appear in the ‘Philosophical Note’ section in both first and third editions, as well as in ‘The Design and Evolution of C++’ (1994).

“A programming language serves two related purposes: it provides a vehicle for the programmer to specify actions to be executed and a set of concepts for the programmer to use when thinking about what can be done.” (1986 page 6) (2013 page 13)

It is the second aspect that is far more important. However:

“The first aspect ideally requires a language that is “close to the machine”, so that all important aspects of a machine are handled simply and efficiently in a way that is reasonably obvious to the programmer. The C language was primarily designed with this in mind.” (1986 page 6) (2013 page 13, ‘aspect’ changed to ‘purpose’)

I suspect that Stroustrup thought this first aspect was more important. However, the machine made clear to the programmer is an abstract, hypothetical machine that is the basis of a language: its executional engine. It is the job of the language's compiler to translate that hypothetical machine into efficient machine code for any target platform, and then the job of the operating system and runtime to map logical requests onto the physical resources actually available on a machine, taking into account the requirements of other processes. Programmers cannot and should not do that for themselves, or they can lock out other processes, or worse.

Perhaps people thinking of execution efficiency might believe that a language must be close to the actual machine, but that is also wrong. Programmers making assumptions about the underlying execution mechanisms can get in the way of automatic optimisation, even when the assumptions are correct. Often they are not. They can be even more incorrect when a program is moved to another environment where the same assumptions don't apply.
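As a minimal sketch of such an assumption (the function names and values are mine, purely illustrative): extracting the low byte of an integer by reading its in-memory representation silently assumes little-endian byte order, while a bit mask is defined identically on every platform.

```cpp
#include <cstdint>
#include <cstring>

// Non-portable: reads the first stored byte of x's representation.
// On a little-endian machine this is the low byte; on a big-endian
// machine it is the high byte, so the result changes with the platform.
unsigned first_stored_byte(std::uint32_t x) {
    unsigned char b;
    std::memcpy(&b, &x, 1);
    return b;
}

// Portable: the language defines this result on every platform.
unsigned low_byte(std::uint32_t x) {
    return x & 0xFFu;
}
```

Both compile and "work" on the machine the programmer tested on; only the second still works when the program moves.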

Programming languages provide their own view of an abstract machine; C itself has a particular abstract machine. This might match some processor styles, but not all. The problem with C is that it was built around certain low-level machines, and many of its assumptions are now wrong. The original machines C was based on are now themselves abstract machines, albeit uncomfortably low-level and primitive ones. The following ACM article, David Chisnall's 'C Is Not a Low-level Language', makes that clear, and also that C is an impediment to progress in processor design and concurrency.

https://queue.acm.org/detail.cfm?id=3212479

Aspect 1 is contrary to aspect 2.

“The second aspect ideally requires a language that is “close to the problem to be solved” so that the concepts of a solution can be expressed directly and concisely.” (1986 page 6) (2013 page 13)

Stroustrup has the right conflicting aspects in mind: a language is close to the machine or close to the problem. System programming covers aspect 1. Aspect 2 is general application programming. The two are opposites and are better separated into different languages. This helps the programmer focus on the correct level of thinking. Actual physical computers are much too far from the problem domain, and from the way programmers should think in aspect 2; the execution steps are far too small and, in fact, mostly useless for thinking about a problem.

This is the oft-cited ‘semantic gap’. Think about how an expression is evaluated and the result stored in a variable (by assignment). The machine must load all the inputs into registers and combine them according to the operators. If there aren't enough registers, subexpressions must be computed separately and their results spilled to memory. Eventually one register holds the result, which is stored back to a memory location. All that just to say:

target := <source_expression>

Knowing how this assignment works on the machine is little help in writing a program. A language that includes both extremes of the semantic gap is bound to end up with confusion about what belongs where, and what to use or not use. The advice given is to avoid aspect 1 facilities and use aspect 2 features. Trying to bridge the semantic gap in a single entity results in poor separation of concerns. This is the problem with C++. Languages should be oriented to aspect 1 for system programming, or to aspect 2 for general programming. Mixing the two results in confusion arising from a lack of separation of concerns.
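The expansion described above can be sketched for a concrete expression. The instruction names in the comment are illustrative only; the real sequence depends on the target instruction set and the compiler.

```cpp
// One line of source: target = a * b + c * d;
// A typical register machine executes something like:
//   load  r1, a      ; fetch each input into a register
//   load  r2, b
//   mul   r1, r2     ; combine according to the operators
//   load  r2, c
//   load  r3, d
//   mul   r2, r3
//   add   r1, r2     ; one register finally holds the result
//   store r1, target ; write it back to memory
int evaluate(int a, int b, int c, int d) {
    int target = a * b + c * d;  // all of the above in one line
    return target;
}
```

None of the register-level detail helps the programmer decide that `a * b + c * d` is the right expression for the problem; that is the point of the gap.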

Now we certainly want our problem-oriented languages to have efficient translations to the machine level. But that does not mean we should have as much of the machine level in the language as possible. In fact, it is the opposite: good programming languages do a good job of abstracting away and hiding the machine level. Where machine-level details are exposed, we have weak abstractions, and such abstractions lose the advantage of language abstraction altogether.

Furthermore, compiler code generators and optimisers are likely to do a much better job of producing optimal code than the average programmer. Even above-average programmers do not have time to profile all target platforms in the way a code-generator specialist does. When programmers play at that level they can often compromise optimisation. The C and C++ focus on that level can indeed be counter to good optimisation.
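A small illustration (the function names are mine; whether either version runs faster depends entirely on the compiler and target): both loops below compute the same sum, but the plain indexed form states the intent directly and gives the optimiser a simple shape it can analyse, and often auto-vectorise, while the hand-tuned pointer form buys nothing on a modern compiler.

```cpp
#include <cstddef>
#include <vector>

// "Hand-optimised" pointer-marching style, once believed faster.
long sum_pointers(const std::vector<int>& v) {
    long total = 0;
    for (const int *p = v.data(), *end = p + v.size(); p != end; ++p)
        total += *p;
    return total;
}

// Straightforward indexed loop: says what is meant, and leaves the
// choice of machine-level strategy to the code generator.
long sum_indexed(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}
```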

C and C++ are too exposed to translational and operational semantics (how the machine works underneath, that is aspect 1) and do not provide enough support for denotational and axiomatic semantics, which concern the meaning and logic of a program (that is aspect 2). How the problem entities relate to each other is the important aspect, not how they relate to the machine. The mapping to the machine is handled not by the programmer but by the compiler, the runtime, and the operating system. Programmers attempting to do it themselves will obstruct the optimal working of the system, which must balance the resource requests of all processes.

Now programmers, especially new ones, naturally feel they must know how the actual computer works and thus like the operational approach, seeing how a language maps to a machine; C, being closer to the machine, eases this thinking. I used to think that way, and I wanted to know how computers worked. However, that is fundamentally the wrong way to think about programming. We feel that the closer we get to the machine, the more we understand. But that is the wrong notion. We must understand programming independently of any machine.

C and C++ have made far too many concessions towards the machine and aspect 1. But as Perlis says, this is not even right for system programming. If we are to move toward the future of programming, the collective thinking of the programming world must shift from how computers work to computation itself.

Programmers as a whole are beginning to understand this by using scripting languages such as Python and Ruby that are interpreted and thus a long way from the machine. These languages work. They might not be efficient compared to compiled languages, but they suffice for many purposes. Compiled languages can be as abstract as interpreted languages. Programmers must learn to feel comfortable with leaving many details to the system.

We can have compiled languages that produce very efficient code, and yet these also don't need to be close to the machine. Aspect 1 in Stroustrup's thinking might have applied to a greater extent in 1986, but it was wrong, particularly in the long term.

Tony Hoare wrote about the axiomatic approach to programming in 1969, well before C++, or even C. The lesson has not been heard.

https://www.cs.toronto.edu/~chechik/courses05/csc410/readings/hoare_axiomatic.pdf

https://www.cs.cornell.edu/courses/cs7194/2019sp/slides/hoare.pdf

http://cs.iit.edu/~smuller/cs536-f23/lectures/07-hoaretrp.pdf

While we can program in an axiomatic or denotational style in machine-oriented languages, it is much more work and a burden on the programmer. That is the benefit of true high-level languages: they provide support for that level of abstract thinking, whereas low-level languages are compromised. C benefited from structured syntax, but that was there more to avoid writing assembler than to support structured programming. The true thinking of structured programming went unappreciated and misunderstood.
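As a small sketch of what carrying that reasoning by hand looks like in a machine-oriented language (the function is hypothetical, not from Hoare's paper): a Hoare triple {P} S {Q} says that if precondition P holds before statement S, postcondition Q holds after. Without language support, the programmer can only approximate it with runtime assertions.

```cpp
#include <cassert>

// Hoare triple {x > 0} result := x + 1 {result > 1}, carried by hand:
int increment_positive(int x) {
    assert(x > 0);        // precondition P: x > 0
    int result = x + 1;   // statement S
    assert(result > 1);   // postcondition Q: result > 1
    return result;
}
```

A language designed around aspect 2 lets such properties be stated and checked as part of the program's meaning, rather than smuggled in as executable checks.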

Languages that were more faithful to structured programming and other sophisticated techniques became disparaged by C people as ‘training wheels for beginners’ or ‘crutches for weak programmers’. This appealed to programmer ego: such programmers could now look down on the more advanced languages and their users. C included compromises for unstructured programming, like goto, break, continue, and early return, which further undermined real structured programming. At least programmers do tend to avoid goto.

Early C++ used C's pointers for references. Now C++ has ‘smart pointers’ and references (confusingly declared with &), with warnings that programmers should no longer use raw pointers. While pointers are the goto of the data-structure world, it is a harder lesson to learn. Indeed, gotos and pointers really are quite easy, just referencing a location in the code or in memory (pointers can do both). But it is harder to appreciate why they are bad structuring mechanisms.
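A minimal sketch of the contrast (the function names are mine): a raw pointer forces every caller to worry about null and ownership, a reference is always bound to a valid object, and a `unique_ptr` makes ownership explicit so the object is released without a manual `delete`.

```cpp
#include <memory>

// Raw pointer: may be null, may dangle, ownership is unclear, so
// every caller must check before using it.
int read_via_pointer(const int* p) {
    return p ? *p : 0;
}

// Reference (declared with &): always bound to a valid object, so
// there is no null case to handle.
int read_via_reference(const int& r) {
    return r;
}

// unique_ptr makes ownership explicit: the int is freed automatically
// when `owned` goes out of scope; no delete, no leak.
int make_read_and_release() {
    auto owned = std::make_unique<int>(42);
    return read_via_pointer(owned.get()) + read_via_reference(*owned);
}   // `owned` released here
```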

When languages include these mechanisms it seems that they must be there for some purpose. In C and C++ it is because the languages are not sufficiently structured or OO to avoid these completely. Languages that truly support the paradigms do not need or include such ‘just in case’ compromises.

It is a further burden on programmers to learn these concepts (even one as simple as goto) and then learn why and how to avoid them. Laziness often results in taking the shortcut, which frequently has very undesirable long-term effects on programs (they become inflexible and unmaintainable, often requiring massive refactoring). Mostly these things are needed to get out of some corner where the programmer has not been self-disciplined. Gotos beget more gotos; pointers beget more pointers. Eventually software becomes so messy that major surgery (refactoring) is the only option. Cleaning up poorly structured software is difficult and error-prone.
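A minimal illustration of the point (the functions are mine, not from the article): the two routines below compute the same thing, but in the goto version the control flow must be reconstructed by tracing labels, while the structured version's shape states the logic directly.

```cpp
// Unstructured: the reader must follow the label to see the loop.
int count_down_goto(int n) {
    int steps = 0;
again:
    if (n > 0) {
        --n;
        ++steps;
        goto again;
    }
    return steps;
}

// Structured: the while loop's shape is the logic.
int count_down_while(int n) {
    int steps = 0;
    while (n > 0) {
        --n;
        ++steps;
    }
    return steps;
}
```

On two lines the difference looks trivial; the cost appears when gotos beget more gotos and the control flow no longer forms any readable shape at all.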

C++ has done little to move programming forward by removing old constructs that overlap with new ones. The trade-off has been for backward compatibility, not forward improvement. Sure, cleaning up would cause some short-term pain, but the gains are long-term.

This is why C++ is such a deficient language. It has kept the practice of programming hamstrung by the past. However, there seem to be many people who accept Stroustrup’s faulty programming philosophies with little question and we see a plethora of online tutorials on the various aspects of C++. Language quality is not what a language can do (they are all fundamentally the same), or even how fast the programs generated run, it is how the language supports programming and the thought processes of the programmer. Given that, factors like performance follow.

A warning on language extension comes from R.J. Pooley's ‘An Introduction to Programming in SIMULA’ (1987), describing how the requirements of word-processor and mathematical developers will be very different: “There are two possible solutions to this problem. One is to keep extending the language to try to provide all the possible features which will be required. This leads to impossibly complicated languages and cumbersome compilers and runtime systems. I will not name any of the languages which have fallen into this trap. If you ever meet any, you will recognize the description.”

C and C++ try to bridge the even wider gap between system and application programming.

It is a sad fact that the complexities programming languages include do not in any way affect the programs we can write, but they do affect the way we write them.

Tony Hoare: “I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.”

Stroustrup: “The connection between the language in which we think/program and the problems and solutions we can imagine is very close.” (1986 page 7) (2013 page 13)

This is aspect 2. Aspect 1 goes against it. Trying to take a language strong on aspect 1 and add aspect 2 features into it can only result in a very complex, compromised, and confusing language. The two aspects express the ‘semantic gap’. Aspect 2 says the semantic gap between thinking and the expression of the problem should be small, so that we can easily determine correctness and other qualities.

Aspect 1, coding in terms of the machine, results in a wide semantic gap, which makes thinking and assurance of other qualities difficult. Of course there are those who pride themselves on being able to master a wide semantic gap, but that is false pride and really the ultimate stupidity in programming. There is no medal for mastering unnecessary complexity. Thinking in terms of how the machine operates, or operational semantics, works against thinking and understanding. I can't see how Stroustrup's next statement follows.

Part 5
