What is a .dll?

Arun Ramachandran
Published in Delta Force, Aug 27, 2017

Ever had an error due to missing .dll files? Ever wondered what those pesky little buggers are? Well, read on to find out how they work!

Overview

A .dll, or dynamic-link library, is a software library that’s loaded into an executable when it is run. What the hell does that mean? Well, that’ll require delving into a scary piece of code called…

The C compiler

C/C++ compilers on Windows are capable of producing these dynamically-linked libraries. To truly understand how they work, how a compiled language can import code at runtime, we’ll have to have a basic understanding of a compiler.

A C compiler flow can be broken down thusly:

[Figure: Compiler steps (preprocessor, compiler, linker)]

To understand how a dynamically-linked library works, we need to understand what a library is, which means we’ll need to understand what a linker is, which means we’ll need to understand what a compiler is…

So let’s get started.

Preprocessor

The preprocessor is pretty simple: it copies and pastes code, and can conditionally include certain parts of code depending on the compile-time environment.

Developers can give orders to the preprocessor using preprocessor directives. For example, #include <stdio.h> simply asks the preprocessor to paste the contents of the stdio.h standard header at that spot.

Compiler

Ah, the compiler… What a simultaneously glorious and terrifying piece of software.

The compiler is what converts our high-level code into executable binaries. Wait, aren’t we discussing the parts of a compiler? How does a compiler contain a compiler?

Compilers like gcc or the one packaged with Visual Studio (called MSVC) contain many components — including a bit that does the actual translation from source code to machine code. Machine code is binary, all 1s and 0s. This is what ultimately runs on your computer and what it understands.

To avoid any further confusion, we’ll refer to the bit that does the translation as a translator, and a compiler will refer to all 3 parts — preprocessor, translator and linker.

Let’s now compile a simple example:

  1. The preprocessor pastes stdio.h and hands it to the translator.
  2. The translator converts the code to machine code and produces an executable. On Windows, an executable is just a .exe file.
  3. We run the executable and we see ‘9’ on the screen.

Simple stuff, right? ‘But hold on,’ I hear you say. ‘What about the linker?’ Well, the linker actually doesn’t do a whole lot here. The linker is used when we have multiple files to compile (it’s actually used to resolve references, but we’ll get to that later). So why bother with multiple files? Why can’t we just stick with this? What’s the purpose of splitting our code into multiple files?

  1. To modularize code and reason about it sanely.
  2. Some codebases, like Linux and gcc, are several million lines long and can take a very long time to compile from scratch. Splitting code into multiple files allows the compiler to compile only those files that were changed, or whose dependencies were changed.

Let’s return to our example. We can shift the function complex_function to its own file. Let's do that now.
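As a sketch of the split (the original listing is not shown here), the new file would hold nothing but the function, while main.c keeps main and the call to complex_function. The body is the same assumed addition as before.

```c
/* awesome_sauce.c: complex_function now lives in its own source file. */
int complex_function(int a, int b) {
    return a + b;
}
```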

We now pass both main.c and awesome_sauce.c to our compiler. The preprocessor does its work, but the translator will throw an error (or at the very least a warning). Why? Well, we've not declared complex_function anywhere in main.c! Clearly, just separating functions like this isn't going to work. Instead, we do this:

What’s this .h file? It’s called a header file. A header file typically holds declarations. The .c files, or source files, hold the actual definitions. So what happens now?

  1. The preprocessor chugs along faithfully and pastes code in place of the #includes. We have to pass the location of awesome_sauce.h to the compiler so that the preprocessor knows where to find this non-standard header, i.e. a header that’s not part of the standard library (stdio.h, by contrast, is part of the standard library).
  2. The translator now processes each .c file individually and produces an object file for each .c file. That’s right, I lied earlier when I said the translator produces an executable. Each source file produces an object file. These object files are all given as input to the linker.
  3. The linker does magical things to the object files and makes an executable. More on this later. Our earlier example had only one file, but the linker still did some other stuff — linking bits of the standard library. We’ll get to what libraries are later.

Let’s go into a bit more detail. After step 1, main.c looks like this to the translator:

In step 2, while translating to object files, the translator converts the names of variables and functions to symbols, which it records in a symbol table. Thus, complex_function becomes a symbol, and every use of complex_function refers to that symbol. However, as mentioned earlier, the translator goes through each source file individually, so while working on main.c it does not know the actual location of complex_function. The symbol-table entry for complex_function, which is supposed to record where the function lives, therefore has no address yet. We say complex_function is an unresolved symbol.

Step 3 is where symbols actually get resolved by the linker. How does that happen?

Linker

The linker is passed the object files with unresolved symbols (or unresolved references). Let’s call these files main.o and awesome_sauce.o. What happens now is actually pretty simple: the linker takes all the object files (here, main.o and awesome_sauce.o), searches for any unresolved references and resolves them. So, this is roughly what happens:

  1. The linker notices the unresolved reference to complex_function inside main.o.
  2. It notices the definition for complex_function inside awesome_sauce.o.
  3. It makes the reference in main.o point to the address of the definition of complex_function inside awesome_sauce.o. Pretty simple, right?

If we hadn’t passed awesome_sauce.o to the linker (by not compiling awesome_sauce.c at all), we would have gotten a linker error (not a compiler error) complaining about an unresolved reference.

This is an example of static linking. Static linking means performing linkage at compile time, i.e., right after the translation (static = at compile time, dynamic = at run time). Understanding static linkage is crucial to understanding dynamic linkage and how .dll files work.

Libraries

In the example we just ran through, we’d have to re-compile awesome_sauce.c every time we wanted to make changes to main.c, as the linking needs awesome_sauce.o. That seems stupid. Surely we can just translate awesome_sauce.c to binary once and only redo the linking?

Apparently, compiler developers have already thought deeply about optimizations, and any superficial attempts on our part to find flaws in compilers aren’t likely to yield results. Surprising, isn’t it?

A library is exactly what we described: some binary code with a bunch of definitions for symbols. It’s already been translated and is now just waiting to be linked to an executable or another library. So, on Windows, we can compile awesome_sauce.c to an awesome_sauce.lib. This can then be passed to the compiler when compiling main.c, and the compiler hands the library directly to the linker, skipping the preprocessor and translator altogether (main.c still goes through all the steps, though). Keep in mind that we can’t run libraries like executables: libraries don’t contain a main function, they’re just a bunch of definitions for symbols.

If it hasn’t occurred to you yet, the standard library is itself just a bunch of libraries that have already been compiled to binary code. You don’t see stdio.c compiling every time you include stdio.h, do you? No, of course you don't. That's because it's already been compiled and is usually distributed along with whatever C compiler you've installed. Standard library functions like printf are resolved by the linker by finding the definitions for them in their respective libraries.

Magical as static linking might seem, we can go a step further. What if we don’t even want to link during compile time? We definitely need some way to resolve unresolved references, but it turns out we can defer that until after compilation…

Da-Da-Dynamic Linking!

Static linking has several inherent issues which we shall expose through an example.

I am the maintainer of a security library called h4x0r_pwner. Everyone in the world uses this library by statically linking with it. Today, a leet hacker has pwned my library and discovered an exploit. I immediately patch the exploit and push new binaries. Now, everyone depending on h4x0r_pwner must re-compile their code and re-distribute their executables. This sounds like a horribly painful process: forcing services that depend on your library to re-compile and re-distribute their code. There is a simpler way: using dynamic linking.

Dynamic linking is exactly what it sounds like — running the linker at run-time instead of compile time. Now, references to symbols in h4x0r_pwner are resolved at run time, so services depending on my library just need to download the latest version of my library. They don’t need to recompile their code, as their executable will simply link to my new library at run time. This is much better than requiring clients to recompile their services.

Apart from easy updating, dynamic libraries also save space. Static linking just dumps the necessary symbol definitions into the final executable. With dynamic linking, the definitions stay in the library, which reduces the executable size. If multiple running executables require the same dynamic library, its code only needs to be loaded into memory once, thus saving memory too.

So how do dynamic libraries actually work? On Windows, you can just ask your compiler to produce a dynamic library instead of a static one. Well, not quite, but that’s the essence of it. So let’s say we want to compile awesome_sauce.c to our very own .dll. We make the compiler generate a dynamic library, and it spits out two binaries: awesome_sauce.lib and awesome_sauce.dll. Great, we have our own .dll! But what's awesome_sauce.lib, which looks just like a static library, doing here?

Well, the .lib is a stub, often called an import library. It just contains hooks into the .dll file.

The .lib file just calls stuff inside the .dll. Why do we even need it? Well, let’s say we want to compile main.c now, and wish to link with awesome_sauce.dll. We first link main.c with awesome_sauce.lib while compiling... and that's it! The function in the .dll file is searched for and found at run-time. This means we can change the definition of complex_function without recompiling main.c! Let's take an example: say we run main.exe now. We get an output of 9, as 4 + 5 is 9. Now, I change awesome_sauce.c to this:
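The changed listing is missing from this text. One change consistent with the -1 output is swapping the addition for a subtraction, as sketched below. The export macro is an assumption about how the file is built: with MSVC, functions meant to be visible outside a DLL are typically marked with __declspec(dllexport), and a command along the lines of `cl /LD awesome_sauce.c` asks for a DLL. AWESOME_API is a made-up name.

```c
/* awesome_sauce.c, second version: subtract instead of add. */

/* On Windows, mark the function for export from the DLL. On other
   platforms the macro expands to nothing so the sketch still builds. */
#ifdef _WIN32
#define AWESOME_API __declspec(dllexport)
#else
#define AWESOME_API
#endif

AWESOME_API int complex_function(int a, int b) {
    return a - b;  /* 4 - 5 is -1 */
}
```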

Now, I compile this and I get awesome_sauce.dll and awesome_sauce.lib. I replace the old awesome_sauce.dll packaged with main.exe with the new one. Now, I just run main.exe and the output is now -1! We changed the definition of complex_function, but didn't trigger a recompilation of main.exe.

It’s not all rosy, though. Dynamic libraries come with their own set of disadvantages, one of which is something called binary compatibility.

Let’s say we want complex_function to take in 3 arguments now. Nope, that's not possible without recompiling main.c. Why though? main.exe was compiled against the old declaration, so its machine code is already hard-coded to call complex_function with 2 arguments, and it has already been linked against the old awesome_sauce.lib. If we add an argument, the new .dll expects something the old executable never passes. Thus, any change to the declaration of a function will result in something called a binary incompatibility. This is something library developers will have to take care of, otherwise they might inadvertently force a recompilation on services that depend on them.
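One common way library developers sidestep this, offered here as an assumption rather than something the article prescribes, is to leave the old function alone and add the new behavior under a new name. complex_function3 is a hypothetical name, and the addition bodies follow the earlier assumed example.

```c
/* Existing export: the declaration stays untouched, so executables built
   against the old version keep working. */
int complex_function(int a, int b) {
    return a + b;
}

/* New export with the extra argument, under a new (hypothetical) name. */
int complex_function3(int a, int b, int c) {
    return a + b + c;
}
```

Old executables keep calling the 2-argument function, while newly compiled code can opt in to the 3-argument one.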

Phew

Well, that’s about it. This was a pretty basic overview of how a .dll works on Windows. Dynamic libraries are present on other platforms too, though they work somewhat differently, e.g. Linux has ‘shared objects’ or .so files.

True understanding comes with doing. Compiling a dynamic library yourself with Visual Studio (or gcc on Linux) will make you feel like a true hacker. Here are some awesome_sauce resources for doing that:
