Data types on blockchain

Joowon
Published in hdac_rizon
5 min read · Mar 20, 2020

Blockchain is not a single technology but a collection of various advanced technologies, and data types are one area of blockchain research. Researchers are studying data types that can compute accurately with less memory, going beyond the data types defined as standards. In this post, I would like to explore what to consider when defining data types for units of cryptocurrency, the most actively used service on blockchain platforms.

Representation of decimals

Blockchain platforms use real numbers, including decimals, as units of cryptocurrency. But they do not use data types such as float and double that are built into programming languages, because those data types use floating points. Floating-point numbers are bound to have errors because they represent real numbers as approximations, without a fixed number of decimal places.

> 0.1 + 0.1 + 0.1
0.30000000000000004

As the example above shows, three 0.1s should make 0.3, but the computer produced a different output. That is why the equality operator should not be used to compare floating-point numbers. Machine epsilon, an upper bound on the relative error introduced by rounding in floating-point arithmetic, is used to work around this issue: two values are treated as equal when the difference between the expected value and the output is within machine epsilon. But this approach can cause a problem on blockchain platforms, which must guarantee data integrity. The last decimal places of a result can differ depending on the OS and compiler, and as calculations accumulate, the differences grow bigger than machine epsilon and the results end up representing different numbers.
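A minimal sketch of the comparison described above, assuming the tolerance is Number.EPSILON scaled by the magnitude of the operands. The helper name nearlyEqual and the scaling rule are illustrative choices, not taken from any particular platform.

```javascript
// Hypothetical helper: treats two doubles as equal when their difference
// is within a tolerance derived from machine epsilon (Number.EPSILON).
function nearlyEqual(a, b) {
  // Scale the tolerance so it also works for values larger than 1.
  const scale = Math.max(1, Math.abs(a), Math.abs(b));
  return Math.abs(a - b) <= Number.EPSILON * scale;
}

const sum = 0.1 + 0.1 + 0.1;
console.log(sum === 0.3);           // false
console.log(nearlyEqual(sum, 0.3)); // true
```

A check like this hides a single rounding error, but it does not make results identical across machines, which is the requirement a blockchain has.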

/* Windows node v10.19.0 */
> 1 - Math.pow(0.5, 22711.847256807654/(15000*184555992.82315296))
5.686676352034681e-9

/* MacOS node v10.19.0 */
> 1 - Math.pow(0.5, 22711.847256807654/(15000*184555992.82315296))
5.686676241012378e-9

Let’s review a project that resolved this by storing fixed-point numbers in an integer data type. It used a 64bit integer data type as a cryptocurrency unit. The first 32 bits hold the whole number and the second 32 bits hold the decimals. 32 bits can hold nine decimal places because the largest number they can represent is 4,294,967,295, which is larger than 999,999,999. When 1.000000023 is represented with this cryptocurrency, the whole number is 1, which becomes [0, 0, 0, 1] when converted into bytes, and the decimals become [0, 0, 0, 23]. This is saved as [0, 0, 0, 1, 0, 0, 0, 23], so 1.000000023 is stored as the integer 4294967319. But this data type has a maximum value, and an incorrect output is produced because an overflow occurs when a number bigger than the maximum is used.
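The layout described above can be sketched as follows. This is an illustration only: the function names encodeAmount, toUint64 and toDecimalString are hypothetical, and big-endian byte order is assumed because it matches the byte arrays shown in the text.

```javascript
// Sketch of the 64bit fixed-point layout: the high 32 bits hold the whole
// number and the low 32 bits hold a nine-digit decimal part.
const FRACTION_DIGITS = 9;
const MAX_FRACTION = 999999999;

function encodeAmount(whole, fraction) {
  if (!Number.isInteger(whole) || whole < 0 || whole > 0xffffffff) {
    throw new RangeError("whole part does not fit in 32 bits");
  }
  if (!Number.isInteger(fraction) || fraction < 0 || fraction > MAX_FRACTION) {
    throw new RangeError("decimal part must have at most nine digits");
  }
  const bytes = new Uint8Array(8);
  const view = new DataView(bytes.buffer);
  view.setUint32(0, whole);    // DataView writes big-endian by default
  view.setUint32(4, fraction);
  return bytes;
}

// Reads the eight bytes back as a single unsigned 64bit integer.
function toUint64(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return (BigInt(view.getUint32(0)) << 32n) | BigInt(view.getUint32(4));
}

// Formats the stored amount as a decimal string with nine decimal places.
function toDecimalString(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const fraction = String(view.getUint32(4)).padStart(FRACTION_DIGITS, "0");
  return `${view.getUint32(0)}.${fraction}`;
}

const amount = encodeAmount(1, 23);
console.log(Array.from(amount));      // [ 0, 0, 0, 1, 0, 0, 0, 23 ]
console.log(toUint64(amount));        // 4294967319n
console.log(toDecimalString(amount)); // 1.000000023
```

Because both halves are plain integers, every node computes the same bytes regardless of OS or compiler.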

Operator overflow

An overflow is an unexpected output produced when the result of a calculation goes beyond the range that the data type can represent. For instance, if a data type can represent 0 to 100, adding 1 to 100 wraps around and generates a result of 0.

The scenario below shows cryptocurrency manipulation using this overflow. Let’s suppose the cryptocurrency uses a data type that can represent 0 to 100. A, who has 10 coins, tries to send 90 coins to B and pay a fee of 11 coins.

In this case, the blockchain’s consensus algorithm checks whether the amount that A is sending combined with the fee is smaller than the amount that A owns. The remittance (90 coins) and the fee (11 coins) make 101 in total, but the data type can represent only up to 100, so 101 wraps around to 0. The transaction is judged valid because the wrapped sum of the remittance and the fee (0 coins) is smaller than the amount that A owns (10 coins). When the transaction is processed, A still owns 10 coins, which is A’s balance (10 coins) minus the wrapped sum (0 coins); B receives 90 coins from A, and the validator of the transaction takes 11 coins as a fee. This means 101 coins are created out of nothing, an attack scenario in which A manipulates the cryptocurrency by paying B with coins that A never owned.
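The scenario can be replayed in a few lines. This is a toy simulation, not any platform's actual validation code: the wrappingAdd helper is hypothetical, and the starting balances of 0 for B and the validator are assumptions made for illustration.

```javascript
// Toy data type from the scenario: holds 0..100 and wraps around on overflow.
const MODULUS = 101;
function wrappingAdd(a, b) {
  return (a + b) % MODULUS;
}

// B and the validator are assumed to start with nothing.
const balances = { A: 10, B: 0, validator: 0 };
const remittance = 90;
const fee = 11;

// Naive validity check: the wrapped total looks affordable.
const total = wrappingAdd(remittance, fee); // 101 wraps to 0
const accepted = total <= balances.A;

if (accepted) {
  balances.A -= total;       // 10 - 0
  balances.B += remittance;  // 0 + 90
  balances.validator += fee; // 0 + 11
}

console.log(total, accepted); // 0 true
console.log(balances);        // { A: 10, B: 90, validator: 11 }
```

The total supply grows from 10 to 111 coins, which is the 101 extra coins described in the scenario.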

Some projects believe that an overflow cannot occur if they use a 64bit data type, because it allows values up to 18,446,744,073,709,551,615 and their cryptocurrency supply never exceeds that. But an overflow attack is still possible in this case, because the amounts written into a transaction are chosen by the sender and are not limited by the supply. To prevent it, logic that detects an overflow by comparing the result of the calculation with the initial values is added to the calculation process. In the example above, 90 plus 11 becomes 0, and because 0 (the result of the calculation) is smaller than 90 and 11 (the initial values), the logic concludes that an overflow occurred and renders the transaction invalid.
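A minimal sketch of such a check, first on the toy 0 to 100 type and then on a 64bit unsigned value. The names checkedAdd and checkedAddU64 are hypothetical, both assume their operands are already within range, and BigInt.asUintN is used here only to mimic how a native 64bit addition wraps.

```javascript
// Toy data type from the scenario: holds 0..100 and wraps around on overflow.
const MODULUS = 101;

// Adds two in-range values and rejects a result that has wrapped around.
function checkedAdd(a, b) {
  const result = (a + b) % MODULUS;
  // For in-range operands, a wrapped sum is smaller than either operand.
  if (result < a || result < b) {
    throw new RangeError(`overflow adding ${a} and ${b}`);
  }
  return result;
}

// The same comparison for 64bit unsigned values passed as BigInt.
function checkedAddU64(a, b) {
  const result = BigInt.asUintN(64, a + b); // wraps like a 64bit register
  if (result < a || result < b) {
    throw new RangeError("uint64 overflow");
  }
  return result;
}

console.log(checkedAdd(5, 5)); // 10
try {
  checkedAdd(90, 11);
} catch (err) {
  console.log(err.message); // overflow adding 90 and 11
}
```

With this check in place, the transaction from the scenario is rejected before any balance is updated.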

Capacity

Some projects use a string or hex string value as the data type for cryptocurrency in order to keep values accurate and avoid operator overflow attacks. This lets them avoid overflow and floating-point errors but takes up a lot of memory capacity. For example, the capacity needed to represent 100 million differs depending on the data type.
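As a rough illustration of that comparison, the sketch below measures how many bytes 100 million occupies as a 32bit integer, a decimal string, and a hex string. It assumes one byte per ASCII character and no prefix or padding on the strings; a project's actual encoding may differ.

```javascript
// Compares the storage needed for 100 million in three representations.
// Assumption: strings are stored as plain ASCII with no "0x" prefix or padding.
const value = 100000000;

const sizes = {
  uint32: Uint32Array.BYTES_PER_ELEMENT,                   // fixed 4 bytes
  decimalString: Buffer.byteLength(String(value), "utf8"), // "100000000"
  hexString: Buffer.byteLength(value.toString(16), "utf8"), // "5f5e100"
};

console.log(sizes); // { uint32: 4, decimalString: 9, hexString: 7 }
```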

String and hex string data types require more capacity than integer types as a number becomes bigger. Moreover, their length changes with the quantity of cryptocurrency, which is why extra bytes must be added to record the length.

The same idea can be applied to integer types: a length is stored together with only as many bytes as the value needs.

Even though a value that needs all 4 bytes then takes 1 byte more than the 32bit integer type, values that fit in fewer than 4 bytes take less capacity. For instance, 12 needs only 2 bytes: 1 byte for the length and 1 byte for the value.
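The length-prefixed integer described above can be sketched as follows. The function names encodeLengthPrefixed and decodeLengthPrefixed, the 1-byte length field placed in front of the value, and the big-endian byte order are assumptions made for illustration.

```javascript
// Encodes a non-negative integer as [length, ...valueBytes], keeping only
// the bytes the value actually needs (big-endian, 1-byte length field).
function encodeLengthPrefixed(value) {
  let n = BigInt(value);
  if (n < 0n) throw new RangeError("negative amounts are not supported");
  const valueBytes = [];
  while (n > 0n) {
    valueBytes.unshift(Number(n & 0xffn)); // take the lowest byte
    n >>= 8n;
  }
  if (valueBytes.length > 255) {
    throw new RangeError("value too large for a 1-byte length field");
  }
  return Uint8Array.from([valueBytes.length, ...valueBytes]);
}

// Reads the length byte, then rebuilds the integer from the following bytes.
function decodeLengthPrefixed(bytes) {
  const length = bytes[0];
  let n = 0n;
  for (let i = 1; i <= length; i++) {
    n = (n << 8n) | BigInt(bytes[i]);
  }
  return n;
}

console.log(Array.from(encodeLengthPrefixed(12)));        // [ 1, 12 ]
console.log(Array.from(encodeLengthPrefixed(100000000))); // [ 4, 5, 245, 225, 0 ]
console.log(decodeLengthPrefixed(encodeLengthPrefixed(100000000))); // 100000000n
```

Because the input is handled as a BigInt, the same encoding also covers amounts that do not fit in 32 or 64 bits, at the cost of one extra byte per value.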

There are various methods to represent units of cryptocurrency, and each has its own strengths and weaknesses.

Conclusion

Some may think data types are only a minor part of blockchain, as it is a collection of various advanced technologies. But small mistakes in handling data types can lead to unexpected results. Because blockchain is decentralized, administrators cannot simply install a patch and modify the system, so such a mistake can lead to the collapse of a platform. That is why developers should choose a data type suited to the characteristics of each platform: either a data type with a large capacity that keeps data accurate at the cost of transmission speed, or an integer type that allows faster calculation.

If you would like to discuss data types, blockchain platforms or other topics covered by our team, you can do so on our forum. It is always open for everyone to participate. We are waiting for your valuable comments and feedback.
