Is floating-point math broken?

Answer 1

Binary floating point math works like this. In most programming languages, it is based on the IEEE 754 standard. The crux of the problem is that numbers are represented in this format as a whole number times a power of two; rational numbers (such as 0.1, which is 1/10) whose denominator is not a power of two cannot be exactly represented.
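The "whole number times a power of two" form can be seen directly. This sketch uses Python only for illustration and assumes its float is IEEE 754 binary64 (CPython's float is a C double, which is binary64 on mainstream platforms). float.as_integer_ratio() returns the exact fraction a double stores; stored_ratio and is_power_of_two are hypothetical helper names.

```python
# Hypothetical helpers for illustration: a finite double is an integer times a
# power of two, so the exact fraction it stores has a power-of-two denominator.
def stored_ratio(x: float) -> tuple[int, int]:
    """Return the exact (numerator, denominator) of the double x, in lowest terms."""
    return x.as_integer_ratio()


def is_power_of_two(n: int) -> bool:
    """True if n is 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and n & (n - 1) == 0


for value in (0.5, 0.375, 0.1):
    num, den = stored_ratio(value)
    print(f"{value!r} is stored as {num}/{den}; "
          f"power-of-two denominator: {is_power_of_two(den)}")
```

For 0.1 this prints 3602879701896397/36028797018963968, i.e. 3602879701896397 × 2^-55: the closest such fraction to 1/10, but not 1/10 itself.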

For 0.1 in the standard binary64 format, the representation can be written exactly as

0.1000000000000000055511151231257827021181583404541015625 in decimal, or
0x1.999999999999ap-4 in C99 hexfloat notation.
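Both forms above can be reproduced with a short Python sketch (Python chosen only for illustration, assuming its float is binary64). Decimal(float) converts the stored double exactly, without rounding, and float.hex() prints it in C99-style hexadecimal notation.

```python
from decimal import Decimal

x = 0.1

# Decimal(float) is an exact conversion, so every digit of the stored double appears.
exact_decimal = str(Decimal(x))

# float.hex() shows the same value in C99-style hexfloat notation.
hexfloat = x.hex()

print(exact_decimal)  # 0.1000000000000000055511151231257827021181583404541015625
print(hexfloat)       # 0x1.999999999999ap-4
```

float.fromhex(hexfloat) converts the string back to exactly the same double, so no information is lost in the hexfloat form.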

In contrast, the rational number 0.1, which is 1/10, can be written exactly as

0.1 in decimal, or
0x1.99999999999999...p-4 in an analog of C99 hexfloat notation, where the ... represents an unending sequence of 9's.
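The unending 9s can be checked with exact rational arithmetic. This is a minimal sketch in Python using fractions.Fraction; hex_significand is a hypothetical helper that writes value = m × 2^exponent (with 1 ≤ m < 2) as hex digits of m by long division and also returns the leftover remainder.

```python
from fractions import Fraction

HEX_DIGITS = "0123456789abcdef"


def hex_significand(value: Fraction, exponent: int, ndigits: int) -> tuple[str, Fraction]:
    """Hypothetical helper: hex digits of m, where value = m * 2**exponent, 1 <= m < 2.

    Returns "1.<ndigits hex digits>" and the remainder left after the last digit
    (zero only if the expansion has terminated).
    """
    m = value / Fraction(2) ** exponent
    if not 1 <= m < 2:
        raise ValueError("exponent does not normalize value into [1, 2)")
    frac = m - 1
    digits = []
    for _ in range(ndigits):
        frac *= 16
        d = int(frac)  # next hex digit, 0..15
        digits.append(HEX_DIGITS[d])
        frac -= d
    return "1." + "".join(digits), frac


# The true rational 1/10: every step leaves the same remainder 3/5, so the 9s never stop.
true_digits, true_rem = hex_significand(Fraction(1, 10), -4, 16)
print(f"0x{true_digits}...p-4, remainder {true_rem}")

# The stored double: its 13 hex digits (52 bits) terminate exactly.
stored_digits, stored_rem = hex_significand(Fraction(0.1), -4, 13)
print(f"0x{stored_digits}p-4, remainder {stored_rem}")
```

Because the remainder after 13 digits (3/5) is more than one half, round-to-nearest turns the last 9 into an a, which is why the stored double is slightly larger than 1/10.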

The constants 0.2 and 0.3 in your program will also be approximations to their true values. It happens that the closest double to 0.2 is larger than the rational number 0.2 but that the closest double to 0.3 is smaller than the rational number 0.3. The sum of 0.1 and 0.2 winds up being larger than the rational number 0.3 and hence disagreeing with the constant in your code.
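The three claims in the paragraph above can be checked exactly. This Python sketch (illustrative only, assuming binary64 floats) relies on Fraction(float) being an exact conversion, so each comparison is against the true rational with no further rounding.

```python
from fractions import Fraction

double_02_too_large = Fraction(0.2) > Fraction(2, 10)   # True
double_03_too_small = Fraction(0.3) < Fraction(3, 10)   # True

total = 0.1 + 0.2
sum_above_03 = Fraction(total) > Fraction(3, 10)        # True

print(double_02_too_large, double_03_too_small, sum_above_03)
print(total, total == 0.3)  # 0.30000000000000004 False
```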

A fairly comprehensive treatment of floating-point arithmetic issues is What Every Computer Scientist Should Know About Floating-Point Arithmetic. For an easier-to-digest explanation, see floating-point-gui.de.

Side Note: All positional (base-N) number systems share this problem with precision

Plain old decimal (base 10) numbers have the same issues, which is why numbers like 1/3 end up as 0.333333333...
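A decimal floating-point type shows the same effect in base 10. This Python sketch uses the standard decimal module with an arbitrarily chosen precision of 10 significant digits: 1/10 is exact there, but 1/3 has to be cut off.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 10                  # 10 significant decimal digits (illustrative choice)
    third = Decimal(1) / Decimal(3)
    tripled = third * 3

print(third)         # 0.3333333333
print(tripled)       # 0.9999999999
print(tripled == 1)  # False: base 10 cannot hold 1/3 exactly either

# In base 10, the sum from the question does come out exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```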

You've just stumbled on a number (3/10) that happens to be easy to represent in the decimal system but doesn't fit the binary system. It goes both ways (to some small degree) as well: 1/16 is an ugly number in decimal (0.0625), but in binary it looks as neat as a 10,000th does in decimal (0.0001). If we were in the habit of using a base-2 number system in our daily lives, you'd even look at that number and instinctively understand you could arrive there by halving something, halving it again, and again and again.
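Whether a fraction terminates depends only on the base, which schoolbook long division makes visible. This is a minimal Python sketch; expand is a hypothetical helper, and the 20-digit cutoff is an arbitrary choice.

```python
DIGITS = "0123456789abcdef"


def expand(num: int, den: int, base: int, max_digits: int = 20) -> str:
    """Hypothetical helper: write num/den (0 <= num < den) in `base` (2..16) by long division.

    Appends '...' if the expansion has not terminated after max_digits digits.
    """
    if not 2 <= base <= 16 or not 0 <= num < den:
        raise ValueError("need 2 <= base <= 16 and 0 <= num < den")
    digits = []
    rem = num
    while rem and len(digits) < max_digits:
        rem *= base
        digits.append(DIGITS[rem // den])  # next digit
        rem %= den                         # carry the remainder forward
    return "0." + "".join(digits) + ("..." if rem else "")


print(expand(1, 16, 10))  # 0.0625
print(expand(1, 16, 2))   # 0.0001
print(expand(3, 10, 10))  # 0.3
print(expand(3, 10, 2))   # 0.01001100110011001100...
print(expand(1, 3, 10))   # 1/3 in decimal: 3s forever, cut off with '...'
```

The expansion terminates exactly when the denominator (in lowest terms) has no prime factors other than those of the base: 16 = 2^4 works in both bases, while 10 = 2 × 5 fails in base 2 because of the 5.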

Of course, that's not exactly how floating-point numbers are stored in memory (they use a form of scientific notation). However, it does illustrate the point that binary floating-point precision errors tend to crop up because the "real world" numbers we usually work with are so often decimal fractions built on powers of ten (like 0.1 or 0.3), but only because we use a decimal number system day-to-day. This is also why we'll say things like 71% instead of "5 out of every 7" (71% is an approximation, since 5/7 can't be represented exactly by any finite decimal number).

So, no: binary floating-point numbers are not broken; they just happen to be as imperfect as every other base-N number system :)

Side Note: Working with Floats in Programming

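In practice, this precision problem means you need to use rounding functions to round your floating-point numbers to however many decimal places you're interested in before you display them.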
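In Python, for example (used here only for illustration), either a format specification or round() can do this. Note that round() still returns a binary double, the one nearest 0.3, which merely prints as 0.3; the formatted string is what actually has exactly two decimal places.

```python
total = 0.1 + 0.2

print(total)           # 0.30000000000000004: shortest repr that round-trips
shown = f"{total:.2f}"
print(shown)           # 0.30: rounded to 2 decimal places for display
print(round(total, 2)) # 0.3
```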

You also need to replace equality tests with comparisons that allow some amount of tolerance, which means:

Do NOT do: if (x == y) { ... }

Instead do: if (abs(x - y) < myToleranceValue) { ... }
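A runnable version of that comparison, sketched in Python for illustration. nearly_equal is a hypothetical helper name, and the tolerance 1e-9 is an arbitrary placeholder rather than a recommended value.

```python
def nearly_equal(x: float, y: float, tolerance: float) -> bool:
    """Hypothetical helper: absolute-tolerance comparison, abs(x - y) < tolerance."""
    return abs(x - y) < tolerance


x = 0.1 + 0.2
y = 0.3
my_tolerance_value = 1e-9  # placeholder only; pick a value for your application

print(x == y)                                  # False
print(nearly_equal(x, y, my_tolerance_value))  # True
```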

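Here abs is the absolute value. myToleranceValue needs to be chosen for your particular application, and it will have a lot to do with how much "wiggle room" you are prepared to allow and how large the largest numbers you will be comparing may be (because of loss-of-precision issues). Beware of "epsilon" style constants in your language of choice. These can be used as tolerance values, but their effectiveness depends on the magnitude (size) of the numbers you're working with, since calculations with large numbers may exceed the epsilon threshold.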
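The magnitude problem is easy to demonstrate. This Python sketch (illustrative; the 1e20 scale factor and the 1e-9 tolerances are arbitrary choices) compares a fixed absolute threshold, sys.float_info.epsilon, with a relative one via math.isclose, which scales the tolerance with the operands.

```python
import math
import sys

EPS = sys.float_info.epsilon  # gap between 1.0 and the next larger double (~2.22e-16)

# Near 0.3, the rounding error of 0.1 + 0.2 (about 5.55e-17) is below epsilon...
small_a, small_b = 0.1 + 0.2, 0.3
small_ok = abs(small_a - small_b) < EPS                 # True

# ...but the same computation scaled by 1e20 has an absolute error in the thousands.
big_a, big_b = (0.1 + 0.2) * 1e20, 0.3 * 1e20
big_abs_ok = abs(big_a - big_b) < EPS                   # False
big_rel_ok = math.isclose(big_a, big_b, rel_tol=1e-9)   # True: relative error ~1e-16

# Conversely, a fixed absolute tolerance is far too loose for tiny values.
tiny_abs_ok = abs(1e-12 - 2e-12) < 1e-9                 # True, though one is twice the other
tiny_rel_ok = math.isclose(1e-12, 2e-12, rel_tol=1e-9)  # False

print(small_ok, big_abs_ok, big_rel_ok, tiny_abs_ok, tiny_rel_ok)
```

A purely relative tolerance has its own blind spot: a comparison against exactly 0.0 only succeeds if the other value is also 0.0, which is why math.isclose also accepts an abs_tol argument for comparisons near zero.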
