What is a float, double, long double, and a floating point literal in C?
What is float, double, long double?
To store numbers in a computer, an algorithm must be used. The C standard does not specify the algorithm, or the encoding, to be used for storing any kind of number, be it rational such as 1/2, integer such as 5, or irrational such as pi.
It only specifies the names of the numerical data types, such as int or float; their meaning, for example int is used to store signed integers such as -1 or 1, and float is used to store approximations of real numbers such as 1.2 or -12.5; and their minimum range, for example the int type must be able to represent at least the values between -32767 and +32767. The algorithms used to encode numbers are specified by the computer manufacturer.
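The real types in C are float, double, and long double. The C standard defines the model of real numbers that must be encoded. This model is called the floating point model, and it has the following format:
$$x = s \, b^{e} \sum_{k=1}^{p} f_k \, b^{-k}, \qquad e_{\min} \le e \le e_{\max}$$
where s is the sign (+1 or -1), b is the base, e is the exponent, p is the precision, that is the number of base-b digits, and each f_k is a digit with 0 ≤ f_k < b.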
Multiple algorithms exist for encoding floating point numbers; the most commonly used one is the IEEE floating point format (IEEE 754).
On computers that use the IEEE floating point format, the float type maps to the IEEE single precision floating point format, and the double type maps to the IEEE double precision floating point format. The long double type maps either to the IEEE quadruple precision floating point format, or to the IEEE 80 bit floating point format, although some implementations give it the same format as double.
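The ranges of the C real types, when using the IEEE floating point format, are approximately as follows, given as the smallest positive normalized value and the largest finite value:
float : 1.175494e-38 to 3.402823e+38
double : 2.225074e-308 to 1.797693e+308
long double (80 bit or quadruple precision) : 3.362103e-4932 to 1.189731e+4932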
The float.h header contains information related to the floating point implementation, such as the absolute value of the limits of the range [min, max] for each of the floating types, and the smallest positive normalized value of each type, which is the closest normalized value to 0.
#include <stdio.h>
#include <float.h>

int main(void) {
    /* Print the absolute value of the min, max range
       of each floating type. */
    printf("float absolute value of range : %e\n", FLT_MAX);
    printf("double absolute value of range : %e\n", DBL_MAX);
    printf("long double absolute value of range : %Le\n", LDBL_MAX);

    /* Print the smallest positive normalized value, the closest
       normalized value to 0, of each floating type. */
    printf("closest to 0 absolute value , float : %e\n", FLT_MIN);
    printf("closest to 0 absolute value , double : %e\n", DBL_MIN);
    printf("closest to 0 absolute value , long double : %Le\n", LDBL_MIN);
    return 0;
}

/* Output :
float absolute value of range : 3.402823e+38
double absolute value of range : 1.797693e+308
long double absolute value of range : 1.189731e+4932
closest to 0 absolute value , float : 1.175494e-38
closest to 0 absolute value , double : 2.225074e-308
closest to 0 absolute value , long double : 3.362103e-4932 */
The format in which floating point arithmetic operations are evaluated is given by the macro FLT_EVAL_METHOD, defined in the header float.h.
If FLT_EVAL_METHOD is set to 2, then arithmetic operations are performed by promoting the operands to the long double type. If FLT_EVAL_METHOD is set to 1, then arithmetic operations are performed in long double if any operand is of the long double type; otherwise the operands are promoted to the double type, even if both operands are of the float type. If FLT_EVAL_METHOD is set to 0, then arithmetic operations are done in the type of the widest operand. If FLT_EVAL_METHOD is set to -1, then the evaluation method is indeterminable.
#include <stdio.h>
#include <float.h>

int main(void) {
    printf("FLT_EVAL_METHOD : %d\n", FLT_EVAL_METHOD);
}
/* Output :
FLT_EVAL_METHOD : 0 */
Floating point literal
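A floating point literal in C can be written in decimal, in one of the following formats:
d+.d*
d*.d+
d+[.]ed+
where d is any digit between 0 and 9, + means one or more, * means zero or more, what is between [] is optional, and e is case insensitive and means that the number is multiplied by 10 raised to the exponent that follows. The exponent may carry an optional sign, as in 1e-3, and it may also follow either of the first two forms, as in 1.5e3 or .5e3. As an example: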
double x;
x = 1. ;
x = .1 ;
x = 1.0;
x = 1e1; // 10.0
x = 1.E1; // 10.0
By default, the type of a floating point literal in C is double. If the literal is suffixed with f, case insensitive, it is of the float type, and if it is suffixed with l, case insensitive, it is of the long double type. As an example:
float aFloat = 1.0f ;
double aDouble = 1.0 ;
long double alongDouble = 1.0L ;
A floating point literal can also be written in hexadecimal notation, in one of the following formats:
0xh+[.]h*Pd+
0xh*.h+Pd+
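where 0x is case insensitive and stands for hexadecimal, h is a hexadecimal digit between 0 and F, case insensitive, + means one or more, what is between [] is optional, * means zero or more, P is case insensitive and means that the number is multiplied by 2 raised to the exponent that follows, and d is a decimal digit between 0 and 9. Unlike in decimal notation, the exponent is mandatory, and it may carry an optional sign, as in 0x1p-1. As an example: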
double x ;
x = 0xfP0; // 15.0
x = 0Xf.P0; // 15.0
x = 0xf.0P0; // 15.0
x = 0X.1P0; // 1/16 = 0.062500
x = 0x.1p1; // (1/16) * 2 = 0.125000
As with decimal floating point constants, a hexadecimal floating point constant has a default type of double. To give a hexadecimal floating point constant the type float, use the suffix f, case insensitive, and to give it the type long double, use the suffix l, case insensitive. As an example:
float aFloat = 0x1P2f;// 4.0f
double aDouble = 0x.1p3 ;// 0.5
long double alongDouble = 0X.3p2L ; // 0.75L
Originally published at https://twiserandom.com on December 14, 2020.