C++3: Using Numbers (safely) in C++

Welcome back, Medium. Last week, we got our first look at debugging C++ using Visual Studio Code’s debugger. This time around, we’re going to delve into C++’s main data types like ints, chars and doubles, alongside some explanation of the size of these data types and how computers suck at math.

Let’s get to it!

8-bit byte

I’d like to start off by saying that a lot of what follows relies on assumptions about system architecture and other details beyond the scope of this post. One of the first assumptions we’ll talk about relates to memory.

For starters, every data type has a limit to what it can hold, whether that limit is on numbers or characters. We call this the range of the data type, and it depends entirely on how much memory the type occupies. For example, an int typically occupies 4 bytes of memory, where every byte contains 8 bits. Bits are binary digits, either 0 or 1, so each byte holds a “string” of eight bits that we can later interpret as an actual number.

We can get this information by using C++’s sizeof() operator, which returns the number of bytes a type or variable occupies.

Consider the following program:

#include <iostream>

int main()
{
    int x { 5 };
    int y { 1000 };

    std::cout << "Size of x is " << sizeof(x) << " bytes\n";
    std::cout << "Size of y is " << sizeof(y) << " bytes\n";

    std::cout << "Size of int data type is " << sizeof(int) << " bytes\n";
    return 0;
}
OUTPUT:
Size of x is 4 bytes
Size of y is 4 bytes
Size of int data type is 4 bytes

When we encounter unfamiliar types and need to understand their memory footprint, sizeof() comes in handy and makes the job easy for us.

Next up are signed and unsigned types. Integers in C++ are signed by default, and all that means is that they can be negative, positive or 0. Unsigned integers can only hold a positive value or 0.

What this alters in the end is the range of the data type and how large a number it can store. A signed 4-byte (32-bit) int can hold values between -2,147,483,648 → 2,147,483,647, or -2^(n-1) → (2^(n-1)) - 1, where n = # of bits. An unsigned 4-byte int has a range of 0 → 4,294,967,295, or 0 → (2^n) - 1.
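We don’t have to memorize these ranges either. Here’s a minimal sketch (my own addition, not from the lesson) that asks the standard library for them using std::numeric_limits from the <limits> header; the output assumes a typical system where int is 4 bytes:

#include <iostream>
#include <limits> // for std::numeric_limits

int main()
{
    std::cout << "signed int range:   " << std::numeric_limits<int>::min()
              << " to " << std::numeric_limits<int>::max() << "\n";
    std::cout << "unsigned int range: " << std::numeric_limits<unsigned int>::min()
              << " to " << std::numeric_limits<unsigned int>::max() << "\n";
    return 0;
}
OUTPUT
signed int range:   -2147483648 to 2147483647
unsigned int range: 0 to 4294967295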

The reason why the range is important to understand is to prevent arithmetic overflow: overflowing a signed integer is undefined behaviour, while unsigned integers wrap around (we’ll see both below).

Consider the following example:

#include <iostream>

int main()
{
    int y { 2147483647 }; // max value of a 4-byte signed int, totally legal

    std::cout << "y has a value of: " << y << "\n";

    y = y + 1; // we just exceeded the integer range, what's gonna happen?
    std::cout << "y has a value of: " << y << "\n";

    return 0;
}
OUTPUT
y has a value of: 2147483647
y has a value of: -2147483648

Huh? What just happened? Why did y turn into a negative number? The answer is wrapping.

All that happened here is that the value wrapped around: once we went past the largest value the type can hold, counting continued from the other end of the range, which for a signed int is its most negative value. (Strictly speaking, overflowing a signed integer is undefined behaviour; wrapping is simply what most machines do in practice.) The same wrapping happens with unsigned variables, where it is actually well-defined.

Consider this minor alteration to our previous program:

#include <iostream>

int main()
{
    unsigned int y { 4294967295 }; // the max value of a 4-byte unsigned int (range 0 to 4294967295), still legal

    std::cout << "y has a value of: " << y << "\n";

    y = y + 1; // exceeded range limit!
    std::cout << "y has a value of: " << y << "\n";

    return 0;
}
OUTPUT
y has a value of: 4294967295
y has a value of: 0

There is a formula that describes the wrapped value:

Wrapped value = overflowed value % (max value + 1)

In the unsigned case above, our formula would look something like this:

4294967296 % (4294967295 + 1)
= 4294967296 % 4294967296
= 0
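As a quick sanity check, here’s a tiny sketch (my own addition) that plugs the unsigned example from above into that formula:

#include <iostream>

int main()
{
    unsigned long long maxValue { 4294967295ULL };  // largest value a 4-byte unsigned int can hold
    unsigned long long overflowed { maxValue + 1 }; // the value we tried to store (4294967296)

    // wrapped value = overflowed value % (max value + 1)
    std::cout << overflowed % (maxValue + 1) << "\n";
    return 0;
}
OUTPUT
0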

That’s all the explanation I’ll give on integer overflow. Our final order of business is understanding floating point numbers and chars.

Floats and doubles have a size of 4 bytes and 8 bytes respectively, and the size of each type determines how much precision it offers. “Floating point” means the decimal point can float: it can sit anywhere among the digits the type stores. Things get weird when we need a lot of decimal digits, which is where computers start failing at basic arithmetic.

To keep it short, these types can only store a limited number of significant digits. Whenever a value needs more precision than that, our machine stores the closest approximation it can, which produces janky floating-point results like the following:

// source: https://www.learncpp.com/cpp-tutorial/floating-point-numbers/

#include <iomanip>  // for std::setprecision()
#include <iostream>

int main()
{
    std::cout << std::setprecision(17);

    double d1{ 1.0 };
    std::cout << d1 << '\n';

    double d2{ 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 }; // should equal 1.0
    std::cout << d2 << '\n';

    return 0;
}
OUTPUT
1
0.99999999999999989

I don’t think words are even needed to describe how goofy this is. Personally, my verdict for preventing this issue is to avoid relying on super high precision decimals, whether that’s through std::setprecision() or when creating our floating point values in the first place.
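To illustrate that, here’s a minimal sketch (my own addition, reusing the sum from the learncpp example) showing that printing at the default precision of 6 significant digits hides the rounding error entirely:

#include <iostream>

int main()
{
    double d { 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 }; // same sum as before

    // the default output precision is 6 significant digits,
    // so the tiny rounding error gets rounded away when printing
    std::cout << d << '\n';

    return 0;
}
OUTPUT
1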

Last but not least are chars. A char occupies 1 byte of memory and stores a single character or symbol at a time. Multicharacter literals are character literals like 'ab' that squeeze more than one character/symbol into a single literal. Under the hood, a char is just an integer value: the ASCII table maps each character or symbol to an integer code, and that code is what gets stored in the char variable.

// illustration of previous point

char hASCII { 104 }; // 104 is the ASCII code for the letter 'h', legal but not preferred (way too cryptic)
char hChar { 'h' };  // preferred and normal approach
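If you want to see the integer code hiding behind a char, you can cast it to an int. A quick sketch of my own:

#include <iostream>

int main()
{
    char h { 'h' };

    std::cout << h << '\n';                   // prints the character itself
    std::cout << static_cast<int>(h) << '\n'; // prints its ASCII code
    return 0;
}
OUTPUT
h
104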

There are 2 downsides to chars and multicharacter literals, those being the following:

  1. Chars only have 1 byte of memory available, so there is a limited set of characters they can represent.
  2. Multicharacter literals are compiler dependent, meaning their value is implementation-defined and can differ between compilers. (see this page for some examples)

The first issue can be addressed with the char16_t and char32_t data types introduced in C++11 (plus char8_t in C++20), which add compatibility with the Unicode standard and let programmers use far more unique characters while still working with character types. The second issue is (to my knowledge) unsolvable. Multicharacter literals should be avoided since strings can do the same thing without making you scratch your head at why your compiler thinks 1 + 2 equals 312142.
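For the curious, here’s a small sketch (my own addition) of the C++11 Unicode character types. char16_t and char32_t use the u and U literal prefixes, and since std::cout won’t print them as characters, we cast them to integers to see their code points:

#include <cstdint>  // for std::uint32_t
#include <iostream>

int main()
{
    char16_t omega { u'\u03A9' };      // 16-bit character: Ω (C++11)
    char32_t rocket { U'\U0001F680' }; // 32-bit character: 🚀 (C++11)

    std::cout << static_cast<std::uint32_t>(omega) << "\n";
    std::cout << static_cast<std::uint32_t>(rocket) << "\n";
    return 0;
}
OUTPUT
937
128640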

Constexpr

Sorry about how messy this blog post is; I wanted to include as much of my learning as possible here while keeping it decently digestible for newbie programmers like myself.

Next week, we’ll look at constants (constant variables & expressions) and literals, which we’ve previously touched on but haven’t properly explained (see my first C++ blog post). I’ll try my best to explain everything a bit more concisely, since I feel as though I leaped through a bunch of topics too quickly this time around.

Anyway, I hope you guys have a great rest of your day.

See you later, Medium :)

If you enjoyed this blog post, consider following me or subscribing to support me!

Check out my GitHub and Portfolio!
