# What are float, double, and long double?

To store numbers in a computer, an encoding algorithm must be used. The **C standard does not specify** the algorithm or the encoding used to store any kind of number, whether rational such as `1/2`, integer such as `5`, or irrational such as `pi`.

**It only specifies** the names of the numerical data types, such as `int` or `float`; their meaning, for example `int` is used to store signed integers like `-1` or `1`, and `float` is used to store approximations of real numbers such as `1.2` or `-12.5`; and their minimum range, for example the `int` type must be able to hold at least every value between `-32767` and `+32767`. The actual encoding of numbers is chosen by the implementation, that is, by the hardware manufacturer and the compiler.

The **real floating types in C** are `float`, `double`, and `long double`. The C standard defines the model of real numbers that these types must be able to encode. This model is called the floating point model, and it has the following format:

$$x = s\, b^{e} \sum_{k=1}^{p} f_k\, b^{-k}, \qquad e_{\min} \le e \le e_{\max}$$

where $s$ is the sign ($\pm 1$), $b > 1$ is the base or radix, $e$ is the exponent, $p$ is the precision (the number of base-$b$ digits in the significand), and each $f_k$ is a digit with $0 \le f_k < b$.

Multiple algorithms exist for encoding floating point numbers; the most commonly used one is the IEEE floating point format (IEEE 754).

On computers **that use the IEEE** floating point format, the `float` type maps to the IEEE single precision (32 bit) format, and the `double` type maps to the IEEE double precision (64 bit) format. The `long double` type maps either to the IEEE quadruple precision (128 bit) format or to the 80 bit extended precision format; some compilers, such as Microsoft Visual C++, simply make `long double` the same as `double`.

The `float.h` header **contains information related to the floating point implementation**, such as the largest finite value of each floating type (`FLT_MAX`, `DBL_MAX`, and `LDBL_MAX`, so that the range of each type is [-MAX, MAX]), and the smallest positive normalized value of each type (`FLT_MIN`, `DBL_MIN`, and `LDBL_MIN`). When the IEEE formats are used, the following program prints the ranges of the C real floating types:

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    /* print the largest finite value of each floating type;
       the range of each type is [-MAX, MAX] */
    printf("float absolute value of range : %e\n", FLT_MAX);
    printf("double absolute value of range : %e\n", DBL_MAX);
    printf("long double absolute value of range : %Le\n", LDBL_MAX);

    /* print the smallest positive normalized value of each
       floating type */
    printf("smallest normalized value, float : %e\n", FLT_MIN);
    printf("smallest normalized value, double : %e\n", DBL_MIN);
    printf("smallest normalized value, long double : %Le\n", LDBL_MIN);
    return 0;
}

/* Output :
float absolute value of range : 3.402823e+38
double absolute value of range : 1.797693e+308
long double absolute value of range : 1.189731e+4932
smallest normalized value, float : 1.175494e-38
smallest normalized value, double : 2.225074e-308
smallest normalized value, long double : 3.362103e-4932 */
```

The **range and precision in which floating point arithmetic operations** are performed is indicated by the macro `FLT_EVAL_METHOD`, defined in the header `float.h`. If `FLT_EVAL_METHOD` is `2`, all operations and constants are evaluated to the range and precision of the `long double` type. If `FLT_EVAL_METHOD` is `1`, operations and constants of type `float` and `double` are evaluated to the range and precision of the `double` type, even if both operands are of the `float` type, while operations involving a `long double` operand use `long double`. If `FLT_EVAL_METHOD` is `0`, each operation is evaluated to the range and precision of its type, which after the usual arithmetic conversions is the type of the widest operand. If `FLT_EVAL_METHOD` is `-1`, the evaluation method is indeterminable; any other negative value indicates implementation-defined behavior.

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    printf("FLT_EVAL_METHOD : %d\n", FLT_EVAL_METHOD);
    return 0;
}

/* Output :
FLT_EVAL_METHOD : 0 */
```

# Floating point literal

A floating point literal in C **can be written in decimal**, in one of the following formats: `d+.d*[e[s]d+]`, `d*.d+[e[s]d+]`, or `d+e[s]d+`, where `d` is any digit between `0` and `9`, `+` means one or more, `*` means zero or more, what is between `[]` is optional, `s` is a sign character (plus or minus), and `e` is case insensitive and introduces a power of 10 exponent, so `1e1` means 1 × 10¹. As an example:

```c
double x;
x = 1.;     // 1.0
x = .1;     // 0.1
x = 1.0;    // 1.0
x = 1e1;    // 10.0
x = 1.E1;   // 10.0
x = 2.5e-1; // 0.25
```

By default, the **type of a floating point literal** in C is `double`, unless it is suffixed with `f` (case insensitive), in which case it has the type `float`, or with `l` (case insensitive), in which case it has the type `long double`. As an example:

```c
float aFloat = 1.0f;
double aDouble = 1.0;
long double aLongDouble = 1.0L;
```

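A floating point literal can also be written *in hexadecimal notation*, in one of the following formats: `0xh+[.]h*P[s]d+` or `0xh*.h+P[s]d+`, where `0x` is case insensitive and stands for hexadecimal, `h` is a hexadecimal digit between `0` and `F` (case insensitive), `+` means one or more, `*` means zero or more, what is between `[]` is optional, `P` is case insensitive and means that the number is multiplied by `2` raised to the power written after it, `s` is a sign character (plus or minus), and `d` is a decimal digit between `0` and `9`. Unlike in decimal notation, the binary exponent part is mandatory. As an example: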

```c
double x;
x = 0xfP0;   // 15.0
x = 0Xf.P0;  // 15.0
x = 0xf.0P0; // 15.0
x = 0X.1P0;  // 1/16 = 0.062500
x = 0x.1p1;  // (1/16) * 2 = 0.125000
```

As with decimal floating point constants, a hexadecimal floating point constant *has the default type* `double`. To give a hexadecimal floating point constant the type `float`, use the suffix `f`, case insensitive, and to give it the type `long double`, use the suffix `l`, case insensitive. As an example:

```c
float aFloat = 0x1P2f;             // 4.0f
double aDouble = 0x.1p3;           // 0.5
long double aLongDouble = 0X.3p2L; // 0.75L
```

*Originally published at **https://twiserandom.com** on December 14, 2020.*