Concepts 1 — Definitions, Declarations and Uses

Dr. Timm Felden

All popular programming languages have means to define variables, functions or data structures to reuse them over and over. These are the basic building blocks of programs.

Unfortunately, the distinction between definitions and declarations is often drawn insufficiently or incorrectly, which leads to a lack of clarity. Scripting languages also commonly lack a clear distinction between definitions and uses, which causes many issues in programs larger than what fits on a single screen. This lack of clarity in terminology usually originates in language specifications and core documentation, and trickles all the way down to tool implementations and to how most language users talk about code.

Making errors in using these concepts can result in hard-to-understand bugs. It also limits the level of quality that development tools can reach. So let’s have a look at the definitions of these concepts.

Definition

A definition is the entire description of properties, capabilities and the behavior of the underlying entity. Most programming languages allow an entity to be defined exactly once. Also, most programming languages require explicit definitions of entities — sometimes with very brief syntax. Definitions are mostly used by writing their name into the code. Thus, the definition usually contains that name along with other properties like types, or relations to other entities like functions or fields associated with a type definition. Because definitions define an entity, IDEs usually offer us means to show them or navigate to them.

Examples of definitions are implementations of functions. Implementations of classes in Java are also definitions.

Fields in Java are always definitions. Fields in C or C++ are a grey area that we will explain later. Nonetheless, they are definitions in most use cases.

Local variables are always definitions. Function and type parameters are almost always definitions. For Java, they are always definitions.

Literals, such as the integer literal 7 or the string literal “hello”, are not definitions and do not even interact with definitions. Literals “just are”. It does not make sense to provide a definition for something that is just a piece of data meant to be used as is. The same is not true for constants, which give a literal value a name, a type and, hopefully, documented semantics.
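As a rough illustration, the kinds of definitions listed in this section might look as follows in Java. The class Account and all of its member names are hypothetical and were invented only for this sketch.

```java
// Hypothetical example: every name here is invented for illustration.
class Account {
    /** Upper bound for a single deposit, in cents. */
    static final int MAX_DEPOSIT = 10_000; // constant: gives the literal 10_000 a name and a type

    private int balance; // field definition (starts at the default value 0)

    // Method definition; 'amount' is a parameter definition.
    int deposit(int amount) {
        // Uses of 'amount' and 'MAX_DEPOSIT'; the 0 is a bare literal without any definition.
        if (amount < 0 || amount > MAX_DEPOSIT) {
            throw new IllegalArgumentException("amount out of range: " + amount);
        }
        int updated = balance + amount; // local variable definition
        balance = updated;
        return balance;
    }

    public static void main(String[] args) {
        Account account = new Account();
        System.out.println(account.deposit(7)); // prints 7
    }
}
```

Each of the named entities above is defined exactly once, and each is something an IDE could navigate to. The literals 0 and 10_000 are not.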

Declaration

A declaration declares that an entity under a given name exists. Sometimes, declarations contain some structural information like a type. This makes the distinction from definitions confusing even for programming language designers. The purpose of having declarations at all is to allow uses of entities that are not, cannot or will not be defined before the use is evaluated. A definition is always also a declaration, in the same way that any String in Java is an Object. Thus, if using an entity requires a declaration and a language has syntax for both declarations and definitions, no explicit declaration is required if a definition has been provided already.

One example is function declarations in C or C++. These declarations contain the function name and a function type given as a list of parameter types and a return type. The purpose of function declarations is to allow functions to mutually use each other. This is required in C because the compiler processes source code from top to bottom. When it reaches the first of two mutually dependent implementations, it knows that function, but it cannot yet know the other one, which is defined further down. Hence, it requires a declaration in that situation.

Interface definitions, like interfaces in Java or Go, contain only definitions. It is a common mistake to think of them as declarations just because there is no actual implementation involved. Nonetheless, they are definitions, as they provide everything that defines an interface or the functions that the interface offers.
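A minimal Java sketch of this point, using the hypothetical names Shape and Square: the interface method has no body, yet nothing about it remains to be defined elsewhere. The implementing class contributes its own, separate definition.

```java
// Hypothetical names, for illustration only.
interface Shape {
    // A definition without an implementation: name, parameter list and
    // return type are everything there is to say about Shape.area.
    double area();
}

// A separate definition: Square has its own method 'area' that overrides Shape.area.
class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        Shape shape = new Square(2.0);
        System.out.println(shape.area()); // prints 4.0
    }
}
```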

In modern languages, declarations are usually derived automatically by the compiler and hidden from the user. In Java, for instance, the compiler uses JARs during compilation to add required declarations to the class files.

An implicit declaration of java.lang.IllegalArgumentException in Java Bytecode
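To illustrate declarations that stay hidden from the user, here is a small Java sketch with two methods that call each other. The class name Parity is an assumption made for this example. In C++ or modern C, the function defined second would need a declaration ahead of the first one. In Java, no declaration is ever written down.

```java
// Hypothetical example of mutual use without explicit declarations.
class Parity {
    // isEven uses isOdd, which is only defined further down in the file.
    // Intended for non-negative n.
    static boolean isEven(int n) {
        return n == 0 || isOdd(n - 1);
    }

    static boolean isOdd(int n) {
        return n != 0 && isEven(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(isEven(10)); // prints true
        System.out.println(isOdd(10));  // prints false
    }
}
```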

Declarations are promises to the compiler that there will eventually be a definition that matches the declaration. If that promise is broken, bad things can happen. Obvious issues are undefined symbols during linking or class loader exceptions in JVM-based languages. Less obvious issues are type mismatches. On the bright side, declarations allow decomposition of code and, to some extent, updating and maintaining pieces of code individually. There is a reason why we no longer link drivers into end user products like we did in the era of DOS.

Use

Putting the name of a definition or declaration into the code usually results in a use. A use of an entity associated with storage means loading data from that storage. A use of an entity associated with code usually means executing that code. In some languages, entities can have multiple meanings. For instance, a function in C can mean the code given in its definition if used in a function call, or the position of that code in memory if used without a call.
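The different kinds of uses can be sketched in Java as well. Java has no function pointers, so the method reference below is only a loose analogue of using a C function without calling it, and that analogy is an assumption of this example. All names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical example: one entity backed by storage, one backed by code.
class UseKinds {
    static int counter = 5; // entity associated with storage

    static int twice(int x) { // entity associated with code
        return 2 * x;
    }

    static int demo() {
        int loaded = counter;                   // use of storage: loads 5
        int called = twice(loaded);             // use of code: executes twice, yields 10
        IntUnaryOperator ref = UseKinds::twice; // names the method without executing it
        return ref.applyAsInt(called);          // executes it now, yields 20
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 20
    }
}
```

A tool that resolves uses has to treat all three occurrences of twice as referring to the same definition, even though only two of them execute it.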

Handling uses is mostly not an issue for programmers. It can be tricky for tool providers, however, especially if there are multiple declarations but no definition or, if the language allows it for some reason, multiple definitions.

A VS Code plugin whose authors failed to implement the use and definition concepts

Errors, Cycles and ODR (One Definition Rule)

If we write down an inadequate use of an entity, we want the compiler to provide us with a good error telling us where to look. For instance, if there are multiple visible declarations under the name we just wanted to use, the compiler should tell us why it couldn’t pick one and where the options come from. However, in some languages it is allowed to have multiple declarations and even multiple definitions for the same entity.

The term one definition rule is mostly known from C++. It even has its own page on Wikipedia. It is called a rule because, in C++ and in most other languages, it is something that the programmer is expected to ensure. Violating it has severe consequences: it leads to hard-to-diagnose errors that most of the time feel like compiler bugs, because the code that is apparently executed does not match what you see in the IDE in front of you. What happens here is that, for whatever technical reason, an entity can or must be defined at least twice. Later during compilation, some part of the toolchain keeps an arbitrary one of these definitions and discards the rest, because, fundamentally, only one definition can ever exist. Please do not confuse this with multiple different and separate definitions existing under the same name. Here, it is really about the same definition getting folded into one without a check that all copies are equal. Obviously, such behavior isn’t an issue if all definitions really are equal. But it is once they are not.

Now, if you think that this is some odd C++ issue — it is not. This concept also extends to classes in Java. Before Java 9, JARs were really just zip files containing class files. Thus, especially in larger projects, multiple JARs can offer the same class. Now, if you update one dependency, but the class loader at runtime uses that dependency from another JAR that you did not update, you’ll have fun. Even more so, if some classes get loaded from the new JAR and some from the old JAR. This can really happen — I’ve been there. Since Java 9, a JAR can offer the same class in multiple versions. Let’s just hope none of us will run into the effects of badly designed build pipelines. Sadly, even that’s something I experienced already.

To some extent, ODR violations affect values, too. Some languages offer initialization order checks for constant definitions. Java, however, uses top-down initialization of constants, i.e. static final fields, that is interrupted by on-demand processing of other classes. While this simplifies the JVM implementation tremendously, it also means that the constant is effectively defined twice: once with a default value and once with the value computed by its initializer. Both values can be observed. This is, however, usually not associated with the term ODR.
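The following Java sketch shows one way to observe both values of such a constant. The classes Config and Observer are hypothetical, and they depend on each other cyclically on purpose. The result depends on which class is initialized first, so the sketch touches Config first.

```java
// Hypothetical example: two classes whose static initializers depend on each other.
class InitOrderDemo {
    static class Config {
        // Not a compile-time constant, because the initializer calls a method.
        static final int LIMIT = Observer.seen() + 42;
    }

    static class Observer {
        // Runs while Config is still being initialized,
        // so it reads the default value of LIMIT.
        static final int SEEN = readLimit();

        static int readLimit() {
            return Config.LIMIT;
        }

        static int seen() {
            return SEEN;
        }
    }

    // Touches Config first, then reports what each side ended up with.
    static int[] observe() {
        int limit = Config.LIMIT;
        int seen = Observer.SEEN;
        return new int[] { limit, seen };
    }

    public static void main(String[] args) {
        int[] values = observe();
        System.out.println(values[0]); // prints 42, the value computed by the initializer
        System.out.println(values[1]); // prints 0, the default value observed earlier
    }
}
```

The same final field has been seen as 0 and as 42 within one program run, without any compiler warning.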

Namespace Definitions

OK, so the last one is pedantic and mainly targets tool builders. In most modern languages, namespaces (aka packages) are created by the folder layout and the names of source folders. As such, the folders themselves are the definitions of the package. package x.y.z clauses at the beginning of a file are redeclarations with the sole purpose of ensuring that the file is still in the correct place, because moving a file usually requires updating some of its contents, like imports. Namespaces are usually the only kind of entity that, depending on the perspective of what a namespace really is, has multiple definitions or distributed partial definitions. In practice, this is usually not relevant, as namespaces are not compiled to entities that exist at runtime.

Dr. Timm Felden

Programming language enthusiast for decades. Author of Tyr. Writes about types and programming languages.