Musings on sizeof

sizeof is a major cause of confusion for people coming to C/C++ from other languages. I hope that this post can help some future traveller who is looking to understand what sizeof is and what it does.

Let’s start with a definition:

sizeof is a prefix unary operator that returns the size of the type of the given variable relative to the size of a char

This definition is only one sentence, but there is a lot of information to grok, so let’s walk through what it means in detail.

sizeof is a prefix unary operator

This may come as a surprise, but sizeof is not a function; it is a prefix unary operator.

An operator is a symbol (or, in sizeof’s case, a keyword) that tells the compiler to perform a mathematical or logical operation. The word prefix tells you that the operator is written to the left of its operand, and unary means that it operates on a single operand.

For reference, let’s look at some other operators in C:

  • ++a is a prefix unary operator,
  • a + b is an infix binary operator,
  • a-- is a postfix unary operator.

sizeof being an operator and not a function is important because operators can do things that functions cannot. For example, because it is an operator, sizeof can take a type name as its operand, as in sizeof(float); you could never pass float to a function. It also allows sizeof to be a compile-time constant, meaning that its evaluation has no effect on the program at runtime (see more info here).

You can see how sizeof evaluates below. The increment of a has no effect because the operand of sizeof is never evaluated; the compiler only needs its type, which it knows at compile time.

#include <stdio.h>
int main(void)
{
    int a = 0;
    sizeof ++a;        // the operand ++a is never evaluated
    printf("%d\n", a); // prints 0
    return 0;
}

Weird, right?

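Another thing to notice is that the sizeof operator does not require parentheses when its operand is an expression, as in sizeof a. That is because it is an operator, not a function. The parentheses are only required when the operand is a type name, as in sizeof(int).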

Yet another thing to note is that sizeof has high precedence, so sizeof a + b is parsed as (sizeof a) + b, not as sizeof (a + b).

relative to the size of a char

For a char, sizeof yields 1. For any other type, it yields a positive integer equal to the number of bits the type occupies divided by the number of bits in a char.

What the hell does that mean? It means that a variable of type char has a size of 1, and a variable that is twice the size of a char has a size of 2. Still don’t get it? Well, here is some code.

#include <stdio.h>
typedef struct b {
    char d, e;
} b;
int main(void)
{
    char a;
    printf("%zu %zu\n", sizeof(a), sizeof(b)); // prints 1 2 on typical platforms
    return 0;
}

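This makes sense, right? A struct that contains 2 chars must be at least 2 times the size of a char. (On common platforms it is exactly 2, but the compiler is allowed to add padding, so the standard only guarantees the lower bound.)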

But how big is a char? By definition, sizeof(char) is 1, so a char is always exactly one byte. What varies is how big a byte is: according to the C Specification ISO/IEC 9899:TC3, section 5.2.4.2.1 (the first one I could find on Google), CHAR_BIT, the number of bits in a char, must be at least 8. The sizes of the other standard types (float, double, int, etc.) are architecture dependent. Some architectures define ints as 2 bytes and others define them as 4 bytes.

But this is obvious, right? I mean, if sizeof always returned the same value for ints and floats and what-not, then there wouldn’t be a reason to use it. You could just define constants for these sizes and remove the sizeof operator entirely. The whole reason we need sizeof is to determine the size of a variable in a portable way!

Alright, now that you are an expert on the sizeof operator’s relation to a char, let’s get to the part that people always find confusing.

returns the size of the type of the given variable

Read that again: the size of the type of the given variable. That is all that sizeof does! But many people don’t realise what the type of the variable is when they are cramming it into sizeof, and that is what causes confusion.

Consider the following (compiled and run on Tutorialspoint):

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int arr1[10];
    printf("%zu\n", sizeof arr1);                  // prints 40
    printf("%zu\n", sizeof arr1 / sizeof arr1[0]); // prints 10
    int *arr2 = malloc(sizeof(int) * 10);
    printf("%zu\n", sizeof arr2);                  // prints 8
    // yeah this is a memory leak, what are YOU going to do about it
    return 0;
}

What? Why the hell is the size of arr2 8? That doesn’t make any sense; isn’t arr2 an array of 10 ints?

Nope. The compiler doesn’t see things that happen at runtime. What the compiler sees is that you are asking for the size of a variable of type int*, and it correctly returns the size of an int*, which on this 64-bit system is 64 bits (8 bytes). Unlike in many other languages, in C/C++ an allocation simply gives you a pointer to a raw chunk of memory; no size information is attached to that pointer for your program to query. So if you do sizeof(arr) where arr is a pointer to some memory, the program has no idea how big that array actually is.

Well, what’s going on with the other sizeof? Why is the size of that array 40? Ints on this system are 4 bytes and there are 10 of them (4 * 10 = 40). This is also why the next line, sizeof arr1/sizeof arr1[0], prints the number of elements in the array: dividing the total size of the array by the size of one of its elements gives you the length of a fixed-length array!

So here is the takeaway: since the space needed for a fixed-size array (like arr1) is calculated at compile time, the compiler knows about it, and by proxy sizeof knows about it. All the compiler knows about an array allocated at runtime (like arr2) is that it is an int*, which in this case is 8 bytes.

Compile time vs Runtime

Let’s really drill home that last point, the part about compile time vs runtime.

When something is evaluated at compile time, it has no runtime overhead. That’s why sizeof ++a above didn’t actually increment a when we ran it: the compiler only needed the type of ++a (int), so the expression itself was never evaluated.

And since sizeof is evaluated at compile time, it can only use information that the compiler is able to access. Because the argument to malloc can vary at runtime, sizeof cannot use it at compile time to determine the size of the array.

When I look at a sizeof operation, I ask myself what the compiler knows about this variable, and that usually tells me what sizeof will return.

I Have Misled You (On Purpose)

There is one more weird quirk about sizeof in C99 and onward.

#include <stdio.h>
void foo(int n)
{
    char str[n]; // variable length array
    printf("%zu\n", sizeof(str));
}
int main(void)
{
    foo(4);  // prints 4
    foo(15); // prints 15
    return 0;
}

What? How does that work? Isn’t the size of that array determined at runtime? Didn’t you literally just say that this is not possible?

The answer is yes, and that means that you have caught me in a web of my own lies, I understand if you don’t want to come to my birthday anymore.

So here is the situation: C99 introduced Variable Length Arrays (VLAs), which are variable-sized arrays allocated on the stack (I won’t go into the difference between the stack and the heap here).

When given a VLA, sizeof waits until runtime to compute the size of the variable. This is the only case in which sizeof operates at runtime instead of compile time.

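You should also know that in C99, VLAs are a mandatory, well-defined part of the standard. C11 made them optional (an implementation without them defines the macro __STDC_NO_VLA__), and standard C++ does not have them at all, although compilers such as GCC and Clang accept them as an extension. So if you compile the above code as C11 or C++, you will have to check that your compiler supports VLAs; if it does not, the code will simply fail to compile.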

The End

That’s it! Thanks for reading. I hope this was able to clear up some of the weirdness about sizeof for you.

Extra fun (optional reading)

A little weirdness I came across while researching this (run it here).

#include <stdio.h>
int main(void)
{
    printf("%zu\n", sizeof 'a'); // prints sizeof(int) in C, e.g. 4
    return 0;
}

This is because in C, ‘a’ is actually an integer constant of type int, not a char. (C++ changed this: there, ‘a’ is a char and sizeof 'a' is 1.) Read more here.