What is the difference between float and double?
Problem
I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are floats and doubles interchangeable? What are the differences between them?
Solution
Huge difference.
As the name implies, a double has 2x the precision of float[1]. In general, a double has about 15 decimal digits of precision, while a float has about 7.
Here's how the number of digits is calculated:
double has 52 mantissa bits + 1 hidden bit: log(2^53)÷log(10) = 15.95 digits
float has 23 mantissa bits + 1 hidden bit: log(2^24)÷log(10) = 7.22 digits
This precision loss could lead to greater truncation errors being accumulated when repeated calculations are done, e.g.
float a = 1.f / 81;
float b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.7g\n", b); // prints 9.000023
while
double a = 1.0 / 81;
double b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.15g\n", b); // prints 8.99999999999996
Also, the maximum value of float is about 3.4e38, but double is about 1.8e308, so using float can hit "infinity" (i.e. a special floating-point number) much more easily than double for something simple, e.g. computing the factorial of 60.
During testing, maybe a few test cases contain these huge numbers, which may cause your programs to fail if you use floats.
Of course, sometimes even double isn't accurate enough, hence we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac), but all floating point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use int or a fraction class.
Furthermore, don't use += to sum lots of floating point numbers, as the errors accumulate quickly. If you're using Python, use math.fsum. Otherwise, try to implement the Kahan summation algorithm.
[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, for most architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating point number (binary32), and double is an IEEE double-precision floating point number (binary64).
Context
Stack Overflow Q#2386772, score: 650