Navigating the World of Distributed C++ Code
Spend less time worrying about how to download, compile, and link external libraries
In my last article, I discussed some difficulties posed by trying to use third-party libraries within our own C++ programs. Many of these difficulties are due to a lack of understanding of some simple concepts related to C++ development.
I started by looking at each of the stages involved in compiling a C++ program and showing how to compile a C++ program using the MSVC toolset.
In this article, I want to look at navigating the world of distributed C++ code.
A common way of distributing code on the internet is to use a library. A library can come in many forms: header-only, static, or dynamic and can come pre-compiled or not.
A common method for sharing a library is to use a tarball or TAR file, which might not be a file format you recognize. As you can see, there are many ways to distribute C++ code and this variation can make using libraries in your games difficult.
That is why I hope to explain the distribution methods described above, so you can spend more time making games and less time worrying about how to download, compile, and link external libraries.
Before We Get Started
To get the best out of this article, I would recommend you don’t start reading it until you are at least familiar with how C++ programs are compiled.
This article is aimed at people who are struggling with using third-party libraries within their own code or are just interested in the ways that C++ is built and distributed on the internet.
What Are TAR Files?
A TAR file or tarball is a file format that stores a collection of files within itself, treating these files as a single file, which makes it easier to share as only the one file requires downloading.
It can seem a little confusing as it looks like we are compressing a file but there is a difference between creating a collection (archiving) and compressing (reducing the size of) files.
After creating a TAR file, it is possible to then compress this single file using a tool such as gzip, which produces a .tar.gz file. This is an archived collection of files stored in a .tar file that has been compressed using gzip.
Note: I mention gzip as you’ll often see libraries or source code distributed as .tar.gz files.
A TAR file is just another way of sharing files and, when compressed, it is similar to sharing a .zip file, although there are reasons for choosing one over the other which I won’t go into here.
All you need to know is that code can be shared as a .tar.gz file, which is a compressed collection of files.
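To make the difference between archiving and compressing concrete, here is a minimal sketch that inspects the first bytes of a file. It relies on two facts about the formats: a gzip stream starts with the bytes 0x1f 0x8b, and a POSIX (ustar) TAR archive stores the text "ustar" at byte offset 257. The function names are my own, and this only peeks at headers; it does not unpack anything.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A gzip stream always begins with the two bytes 0x1f 0x8b.
bool looks_like_gzip(const std::vector<unsigned char>& bytes) {
    return bytes.size() >= 2 && bytes[0] == 0x1f && bytes[1] == 0x8b;
}

// A POSIX (ustar) TAR archive stores the text "ustar" at byte offset 257
// of its first header block.
bool looks_like_tar(const std::vector<unsigned char>& bytes) {
    const std::string magic = "ustar";
    const std::size_t offset = 257;
    if (bytes.size() < offset + magic.size()) {
        return false;
    }
    for (std::size_t i = 0; i < magic.size(); ++i) {
        if (bytes[offset + i] != static_cast<unsigned char>(magic[i])) {
            return false;
        }
    }
    return true;
}
```

A .tar.gz file would pass the gzip check but not the TAR check, because the TAR header only becomes visible once the gzip layer has been decompressed.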
C++ Libraries
The term library describes a collection of code. The main purpose of a library is to reuse commonly used code across multiple projects.
For example, we might have a maths library that has classes and functions for helping us solve problems with matrices and vectors that would be useful in any game project.
There are three types of library: header-only, static, and dynamic, and you can find more detail about the differences in this article here.
All that concerns us for linking and compiling libraries is being able to tell a static library from a dynamic one, which can be done based on the file extension.
A static library on Windows has the .lib extension and a dynamic library has the .dll extension.
When including a library into our project, we need to provide a path for both the library file itself and the header file associated with that file. The header files expose the functionality of the library to our program.
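As a rough sketch of why both pieces are needed, the block below squeezes into one file what would normally be split across a header, a compiled library, and our own code. The names (maths.h, maths::Vec2, maths::dot, length_squared) are hypothetical and only there for illustration.

```cpp
// ---- What would live in the library's header (a hypothetical maths.h) ----
// Declarations only: they tell the compiler what exists, not how it works.
namespace maths {
struct Vec2 {
    float x;
    float y;
};
float dot(const Vec2& a, const Vec2& b);
}  // namespace maths

// ---- What would already be compiled into the library file ----
// The definition: this is the machine code the linker has to find.
namespace maths {
float dot(const Vec2& a, const Vec2& b) {
    return a.x * b.x + a.y * b.y;
}
}  // namespace maths

// ---- Our own code ----
// Compiles with just the declaration above, but only links if the
// definition is supplied by the library.
float length_squared(const maths::Vec2& v) {
    return maths::dot(v, v);
}
```

Without the header, the compiler doesn’t know maths::dot exists. Without the library file, the linker can’t find its definition.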
Finally, a header-only library, such as the maths library GLM, is a library that doesn’t need compiling separately. Instead, the library’s code is compiled along with our own, which often increases compilation times and can cause code bloat, as described by this Wikipedia article.
The advantage is that header-only libraries are much easier to use, as we only need to point our compiler at the location of the header files rather than having to build and link a library.
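For a feel of what a header-only library looks like, here is a tiny made-up one. The file name tiny_math.hpp and its functions are hypothetical and are not GLM’s actual API. Everything, definitions included, sits in the header.

```cpp
// tiny_math.hpp: a hypothetical header-only library.
#ifndef TINY_MATH_HPP
#define TINY_MATH_HPP

namespace tiny_math {

// Templates have to be fully visible wherever they are used,
// so they naturally live in headers.
template <typename T>
T lerp(T a, T b, T t) {
    return a + (b - a) * t;
}

// Non-template functions defined in a header are marked inline so that
// several .cpp files can include it without duplicate-definition errors
// at link time.
inline float clamp01(float value) {
    if (value < 0.0f) return 0.0f;
    if (value > 1.0f) return 1.0f;
    return value;
}

}  // namespace tiny_math

#endif  // TINY_MATH_HPP
```

Every source file that includes this header compiles its own copy of the code, which is where the extra compilation time comes from.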
Pre-Compiled vs. Not Pre-Compiled
There is a possibility that our library will come as a pre-compiled package for the operating system we are using.
Pre-compiling a library means that it will only work on a certain platform, say Windows, and we just need to include the location of the library file (.lib or .dll) and associated header files.
As discussed in the last article, a compiled library is one in which the source code for that library is built and linked to produce a file containing the machine code for a specified platform.
A problem with pre-compiled libraries is that we can become stuck with code that doesn’t work on a platform we want to develop on.
Therefore, having the source code and the ability to compile the code ourselves can be beneficial but it adds additional work before we can start using the library.
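One small way to see that compiled code is tied to a platform: the choice is baked in when the code is built. The sketch below uses the macros that compilers commonly predefine for Windows, Apple platforms, and Linux. The function name compiled_for is my own.

```cpp
#include <string>

// Reports the platform this code was compiled for. The answer is fixed
// at compile time: a copy built on Windows keeps saying "Windows"
// wherever the resulting binary ends up.
std::string compiled_for() {
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "macOS or another Apple platform";
#elif defined(__linux__)
    return "Linux";
#else
    return "an unrecognised platform";
#endif
}
```

With the source code, we can rebuild and get a different branch on each platform. With only a pre-compiled library, we are stuck with whichever branch was chosen when it was built.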
There are a variety of ways to compile source code but it’s common to see code bundled with a Makefile which is used alongside a tool called Make.
Understanding Make and Makefiles
To understand the reasons behind Make we need to understand the problems that Make and Makefiles solve.
If you look back at the last article, where we learned how to compile a simple C++ program using the MSVC toolset, you may have noticed that it was not difficult nor time-consuming to build our program.
Unfortunately, as programs grow to contain thousands of files and link to many external libraries, both the complexity of the build and the time it takes will increase.
One way to mitigate the time it takes our program to compile is to not compile code that hasn’t changed. Code that hasn’t changed won’t need compiling because we will already have the object files for those source files from a previous build.
However, if one of our code files has changed, that file and any file that depends on it must also be re-compiled. This can definitely get complicated if we have a large codebase and a particular file, like a vector class, ends up getting used in a lot of files.
Therefore, we not only have to compile a program for a large codebase but we must now also keep track of all the dependencies on any files we have changed if we want our program to compile as quickly as possible.
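To show how quickly this bookkeeping grows, here is a minimal sketch that works out which files are affected when one file changes. It assumes we already have a map from each file to the files it includes directly. The IncludeMap type and the affected_by function are my own names, not part of any real tool.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each file to the files it #includes directly.
using IncludeMap = std::map<std::string, std::vector<std::string>>;

// Returns the changed file plus every file that includes it,
// either directly or through other headers.
std::set<std::string> affected_by(const IncludeMap& includes,
                                  const std::string& changed) {
    std::set<std::string> affected;
    affected.insert(changed);
    bool grew = true;
    while (grew) {  // repeat until a full pass adds nothing new
        grew = false;
        for (const auto& entry : includes) {
            if (affected.count(entry.first) != 0) {
                continue;  // already known to be affected
            }
            for (const std::string& included : entry.second) {
                if (affected.count(included) != 0) {
                    affected.insert(entry.first);
                    grew = true;
                    break;
                }
            }
        }
    }
    return affected;
}
```

If a hypothetical Vector.h is included by Player.h, and Player.h is included by Player.cpp, then changing Vector.h marks all three as affected, even though Player.cpp never mentions Vector.h itself.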
As you might have guessed by now, Make and Makefiles solve the problems described.
Make is a build automation tool that uses a Makefile to compile and link source code to produce an executable or library. A Makefile is a file containing instructions that tell Make how we want our code to be built.
Why Make Is Used
As you can see, Makefiles make it easier for developers to build and re-compile their code with minimum effort and improved compilation times. It is for this reason that many developers distribute their code with an included Makefile.
This way, we need not understand how the code compiles and links. All we have to do instead is call the Make command on the specified Makefile and let Make do the rest.
Once we have compiled the library, we still need to bring it into our own project by linking against the library file and pointing the compiler at its header files.
If you are sat there wondering why you can’t just distribute your Visual Studio project on the internet and have anyone who wants to use it load up the project and hit build…
Well, it’s because not everyone uses Visual Studio. People like to use different IDEs and, outside of Windows, Visual Studio isn’t an option for C++: a macOS version was released, but it doesn’t support C++.
Visual Studio projects and projects based on other IDEs can become outdated as new versions of the software are released. For example, your Visual Studio 2013 project might not even open in Visual Studio 2017.
C++ code itself can become “outdated”, in that the language evolves as new standards are released but that doesn’t stop a C++ compiler from understanding your code even if it has to compile code written in an older version of the language.
Therefore, distributing C++ programs as source code with Makefiles not only makes it easier for the user to build the code but it also stops users from needing to run the code in a specific IDE such as Visual Studio.
An Example Using Make
In the example from the previous article we used the MSVC toolset from the command line to build our program.
We now know that this is not ideal as it requires us to track all changed files if we want to avoid recompiling every file in our codebase, regardless of whether it has changed.
Instead, let’s look at how we can use NMAKE (Windows’ equivalent to Make) and the Developer Command Prompt to build our program based on a Makefile.
main.obj: main.cpp HelloWorld.h
    cl -c main.cpp

helloworld.obj: HelloWorld.cpp HelloWorld.h
    cl -c HelloWorld.cpp

test.exe: main.obj HelloWorld.obj
    link /out:test.exe main.obj HelloWorld.obj
The above code shows a simple Makefile. The Makefile is made up of three description blocks, each of the form:
target : dependents
    commands
A target is just a file or a name for a set of commands; in our case, the targets are files.
Dependents are the files the target depends on to be created. For example, an object file depends on a .cpp file and the header files that file includes.
Finally, commands are what we want to execute to create the target. In our case, we are building a C++ program, so we want to compile the source files into object files and then link those object files into an executable using Microsoft’s C++ toolset.
In the Makefile example above, the top description block specifies the creation of the target file main.obj, which relies on main.cpp and HelloWorld.h. It runs the cl -c command to compile the source file into an object file without linking.
The second description block specifies the creation of the target file helloworld.obj, which relies on HelloWorld.cpp and HelloWorld.h. Again, it runs the cl -c command to compile the source file into an object file without linking.
Finally, the third description block specifies the target file, an executable test.exe, which relies on main.obj and helloworld.obj. It uses the link command to link the object files into the executable, with /out specifying the name of the resulting executable.
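As you can see, no commands are executed in the Developer Command Prompt, other than to call NMAKE. Instead, the commands we would normally use are embedded within the Makefile.
Note: when no target is named, NMAKE builds only the first target in the Makefile. With the description blocks in the order shown, that is main.obj, so we run nmake test.exe to build the executable.
We specified the files we want to create and the dependencies of those files so, if we try to run NMAKE again without changing our code, NMAKE won’t try to recompile our code because there is no need.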
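To get a feel for how a tool like NMAKE decides what to run, here is a heavily simplified sketch of the idea: a target is rebuilt if it doesn’t exist or if any of its dependents is newer than it. The types and function names are my own, timestamps are plain integers rather than real file times, and real Make tools do a good deal more than this.

```cpp
#include <map>
#include <string>
#include <vector>

// One description block: the dependents of a target and the command
// that builds it.
struct Rule {
    std::vector<std::string> dependents;
    std::string command;
};

using Rules = std::map<std::string, Rule>;
// Last-modified times as plain integers. A file missing from the map
// does not exist yet.
using Timestamps = std::map<std::string, long>;

// Brings `target` up to date, appends the commands that would run to
// `commands`, and returns the target's timestamp afterwards.
// Assumption: `now` is later than every timestamp already in `times`.
long bring_up_to_date(const std::string& target, const Rules& rules,
                      Timestamps& times, long now,
                      std::vector<std::string>& commands) {
    const auto existing = times.find(target);
    const bool exists = existing != times.end();
    long target_time = exists ? existing->second : 0;

    const auto rule = rules.find(target);
    if (rule == rules.end()) {
        return target_time;  // a plain source file: nothing to build
    }

    bool out_of_date = !exists;
    for (const std::string& dependent : rule->second.dependents) {
        const long dependent_time =
            bring_up_to_date(dependent, rules, times, now, commands);
        if (dependent_time > target_time) {
            out_of_date = true;  // a dependent is newer than the target
        }
    }
    if (out_of_date) {
        commands.push_back(rule->second.command);
        times[target] = now;
        target_time = now;
    }
    return target_time;
}

// The three description blocks from the Makefile above. Names here are
// case-sensitive, so the sketch uses HelloWorld.obj throughout.
Rules example_rules() {
    Rules rules;
    rules["main.obj"] = Rule{{"main.cpp", "HelloWorld.h"}, "cl -c main.cpp"};
    rules["HelloWorld.obj"] =
        Rule{{"HelloWorld.cpp", "HelloWorld.h"}, "cl -c HelloWorld.cpp"};
    rules["test.exe"] = Rule{{"main.obj", "HelloWorld.obj"},
                             "link /out:test.exe main.obj HelloWorld.obj"};
    return rules;
}
```

Asking for test.exe when nothing has been built collects all three commands. Asking again straight away collects none, and changing only main.cpp collects just the compile for main.cpp followed by the link.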
A Quick Word on CMake
One last thing I want to discuss in this article is the software tool CMake, because you are likely to come across it at some point, if you haven’t already.
CMake stands for cross-platform Make and, given its name, you might think it’s some kind of extension to Make but it’s a little more complicated than that.
CMake, like Make, can build executables or libraries from source code with the help of an instruction file. For CMake, this is a CMakeLists.txt file.
Unlike Make though, CMake isn’t just a build-automation tool. CMake can generate files for building natively on different platforms. It can generate Makefiles to be built with Make, or generate files for other build systems like Ninja.
It can also create native Visual Studio or Xcode projects on Windows and macOS. It can build programs in a compiler-independent manner.
Essentially, CMake gives you the freedom to write code and have it built and run across a variety of different platforms.
CMake and similar tools are often a better choice to learn than a tool like Make on its own, because of the additional benefits they bring, such as being able to generate build files for multiple platforms from a single description.
Summary
Summarizing, we have looked at navigating the world of distributed C++ code over the internet. Developers often share code as a library that can be header-only, static, or dynamic, often shared as a compressed archive of files known as a TAR file or tarball.
Libraries can come pre-compiled for the platform we are working on, so we only need to tell our program the location of the library and associated header file to get them working in our code.
Finally, if libraries aren’t pre-compiled, we can compile them ourselves from the source code using an instruction file like a Makefile.
Sometimes, source code is shared with a CMakeLists.txt file, which works with a tool called CMake, allowing us to generate build files like Makefiles and native projects like Visual Studio solutions.