At Digital Frontiers we all have our favourite languages. The other day we were arguing about the advantages and shortcomings of some specific ones and decided to have a closer look at them, using a sudoku verifier as a common example. Today, let’s talk about: (you guessed it) C!
The dreadful C language you want nowhere near you
C language? “Oh no, thank you!”. Either you learned C ages ago and want to forget it or you have never learned the language but heard it’s awfully complicated and ugly. Chances are, in fact, it’s almost certain, you will never use C in your career.
So why continue to read? It’s already a weird coincidence that you stumbled upon the post, so what can keep you from moving on to something more up to date, fancier, even more useful?
Do some googling maybe… what language was the first version of Java written in? And C#? And Python? And many others? It’s either C or C++, which basically happens to be C plus objects. C is used in a variety of projects, ranging from compilers, interpreters, databases, systems- and embedded programming to libraries where computational efficiency and stability are of utmost priority.
What is special about C then? What differentiates the language from the currently popular ones?
It has a thin layer of abstraction. The abstraction that usually saves you from a lot of tedious, repeating work, that makes programming with some modern languages so pleasant and reduces the number of lines of code substantially. But sometimes it is not what you want, sometimes you want to program as close as it gets to the machine code without resorting to the assembler.
The other thing is “raw” memory access with pointers. Unlike in most modern languages allocation and deallocation of memory are done by the programmer, who can program very efficiently but also wreak havoc when being inattentive.
There are C compilers for almost every processor architecture. That means you’ve got almost universal portability with your C code — however, sometimes small details have to be adjusted because of the architecture specifics.
Memory overhead is very low. C is used where other programming languages would not, because of scarce resources.
Problem: sudoku verifier
This blog post is a part of a series in which we implement a Sudoku verifier or solver in order to show the advantages and disadvantages of a programming language. A Sudoku verifier is indeed a trivial problem, solved in a myriad possible ways. If you provide a hardcoded Sudoku array and merely display the result (“valid”/”not valid”), the C implementation of the Sudoku verifier doesn’t differ much from an implementation in a mainstream modern object-oriented language… Only that you don’t have objects. You don’t have lambdas. You don’t even have lists or sets! What you’re left with are “lousy” arrays, for and while loops everybody avoids now and… pointers.
It took me some time to decide what Sudoku implementation should I choose. I’ve chosen a very basic one, naive even. I wanted the bare solution to be easily understandable and so suboptimal that it would make sense to parallelize the processing.
Ironically, input caused me the most headache when implementing the solution. No, not because reading from a file is any challenge for the C language — using fopen and strangely sounding fscanf is surprisingly simple and efficient (by the way, the whole source code of the sudoku verifier can be found on GitHub):
But because I had this fancy idea of making it possible to load an array of previously unknown size. Seems like an easy task. And in most languages, it is — because they have lists. And you know already that C does not. When you want to return an array you have two possibilities:
… either return a pointer to it:
where the number of stars represents the dimensions of an array. In fact, it is only an interpretation — strictly speaking, it’s a pointer to a pointer. It points with its imaginary magic wand to a place in memory with some data. The kind of the pointer (pointer to an integer, string, some structure) lets you interpret the contents of the memory. The cool thing you can do with pointers is:
array + 1
called pointer arithmetics. But what does it do? It moves the magic… forget it, the result of the above operation offsets the pointer by the amount of memory needed to accommodate the type it points to. So if it is a pointer to an integer and you increment it will point to memory just past the integer it pointed to before. But… is it really an integer that is stored in the memory? Maybe nothing meaningful at all? Maybe some other relevant data? You have to know it very well because the pointer will not tell you whether it points to an array or a vector and if so, where its data ends. Manipulating uninitialized memory or data belonging to a different obj… I mean structure and variable will introduce undesired randomness at best but mostly will send your program instance to your local equivalent of hell.
… or you can initialize an array parameter of a procedure like this:
But in this case, you have to know the size of the dimensions beforehand. Returning a locally created array from a procedure causes a warning at best and almost certainly problems with memory. So this is not an option either.
So what am I going to do? What am I going to do? Panic and cold sweat appear.
Then I introduced a queue:
How does it work?
Do you see the beautiful stars and not less beautiful ampersands and arrows? Pointers, pointers everywhere.
Each new number found in the file is being added to the queue until nothing else can be read from it. As we can see, each node of the structure has to be dynamically allocated memory with malloc:
currNode = malloc(sizeof (struct node);
and the most “pleasant” thing — everything that was allocated has to be freed, too:
While freeing the memory reserved for the peculiar queue is not rocket science, it’s neither obvious nor fast to do for a programmer of a modern language with a “thick” layer of abstraction and an obvious invisible instrument called garbage collector.
Do you see the pattern?
- “direct” access to memory
- reinventing seemingly basic things
- the necessity to “clean up” after yourself unless you want to face more and more memory leaks as your program gets bloated over time.
All of the above, but especially the last point forces you to structure your program and control it as rigidly as possible. It makes you rethink your workflows, too. It feels like negligence punishes you much more in the case of C than with other languages. And C many a time won’t give you a helping hand during compilation. It slaps you in the face during the execution, though.
Finally to the point. We’ve got our data, we’ve verified that the number of elements equals either 9x9 or 4x4 and now we want to verify that the sudoku is valid. And because it is such a huge task and we don’t want one CPU core to die of fatigue while the others would die of boredom, even more, we want to show that C supports something so advanced as parallel processing, we want to employ threads to solve our enigma.
First, something not for the faint-hearted:
Can you see it? variables declared in… well… in no class or procedure, just floating around, what are they? They are global variables and maybe you’ve heard that everything that is global and static is evil incarnate, only that in C, especially in very simple C projects like this one, they are a useful tool compensating for the absence of objects and making life easier in some other aspects one of which I’ll mention in a moment.
In this, as previously noted, naive sudoku verifier, I check the usual conditions (in this case, for 9x9 cells sudoku):
- The numbers are from 1 to 9,
- they won’t repeat in any row or column and
- if you divide the matrix into 9 smaller matrices with each having only 3 x 3 cells, the numbers in those cells cannot repeat either.
That’s how the main loop of the program looks like:
First, your eyes gleam with joy upon seeing all those for loops which you missed so badly after the excruciating pain involved using streams, lambdas and LINQs. Then, again, a fair amount of ampersands and stars. A fair share of debugging time in C or C++ is, in fact, finding cases where you wanted a primitive like integer but you used a pointer or the other way round. you miss a star or use a star too many and go places you’d never dream of going to.
To check the 3x3 submatrices you need to know both the dimensions (for example, to check the submatrix starting at 6th row and 3rd column), to check columns or rows you only need one parameter. Just don’t ask about the way the arg parameter has been initialized to accommodate the actual number of row or column — just believe me it works.
A few things might be interesting in the verification procedures (below the one for rows):
As you see, due to the lack of those fancy tools, we resort to a kind of vector of bits to check whether any number can be found more than once in the row. But how do we indicate that the row is valid? Well, we set the validity “bit” on the
valid global vector for this row. Despite using a global variable that every thread has access to, we change, in fact, only the memory “cell” that was “assigned” to this thread and part of the problem. Otherwise, we would have to use a mutex that would probably fully combat the whole purpose of parallelizing here or use one of the fields of the structure of the parameter.
It remains to join the threads and verify the validity bits of each of them. If all of them are valid — the whole sudoku is valid. This is, however, just the implementation detail. And of course, of course, to free the allocated memory.
It’s a relief! We’ve come to the end of our journey. We can verify sudokus and we can verify them, hopefully, fast. What have we learned on the way?
Programming in C is, after all, not so scary. You can understand the logic of what happens if you already know a (couple of) modern language(s). The code is not bloated and so many keywords sound familiar. And yet it took me (you would have coded it much faster, I know) much more time than I expected. There are two factors that determine it:
- C is a procedural language. Adjusting to a different paradigm is always a difficult task. Gladly, in the case of adjusting from the object to the procedural programming paradigm, it is not so difficult — you are missing something but you do not have to rethink everything.
- You have to take care of many more aspects like memory allocation and programming simple structures you take for granted. You have to know what you are doing and do it consistently.
Would I want to code in C more than I can do? Probably not in my everyday work, most probably not complex systems, For sure I wouldn’t choose it for a large mainstream greenfield project.
But just as you often feel the joy of doing things with your own hands, take pride in constructing something from scratch, it is surely fun to program a mathematical solution, react to sensor events almost as close to the “hardware” as it gets and to “squeeze” the last milliseconds while optimizing some time-critical process. And that’s where C still shines.
And because C is sometimes such a pain in the …, it gives you the challenge to write your program really well, even without mistakes, because afterward, they’re so hard to find. Who knows, maybe it could even make you a better programmer:)
For the broader perspective of the C language and its impact, I find the post titled “After All These Years, the World is Still Powered by C Programming ” by Daniel Munoz very informative.