|
3 | 3 | <feed xmlns="http://www.w3.org/2005/Atom"> |
4 | 4 | <id>https://blog.simplecode.gr</id> |
5 | 5 | <title>Simplecode Blog</title> |
6 | | - <updated>2025-10-16T06:09:42.857Z</updated> |
| 6 | + <updated>2025-10-16T06:17:40.504Z</updated> |
7 | 7 | <generator>Astro-Theme-Retypeset with Feed for Node.js</generator> |
8 | 8 | <author> |
9 | 9 | <name>Simplecode</name> |
|
18 | 18 | <link href="https://blog.simplecode.gr/posts/the-magic-world-of-numbers-in-computers/"/> |
19 | 19 | <updated>2025-10-15T00:00:00.000Z</updated> |
20 | 20 | <summary type="html"><![CDATA[The Magic World of Float Numbers]]></summary> |
21 | | - <content type="html"><![CDATA[<h1>Intro</h1> |
| 21 | + <content type="html"><![CDATA[<h2>Intro</h2> |
22 | 22 | <p>Floats are exciting. We often assume that numerical results from computers are extremely accurate, but that’s frequently not the case. This happens because there are accurate and non-accurate ways to represent numbers. Accurate representations consume more storage, memory, and processing power compared to non-accurate ones. As a result, non-accurate forms - like floats - are often used in places you might not expect.</p> |
23 | 23 | <p>In fact, floats are everywhere! It’s actually more common to encounter a float than an accurate number. Many programming languages even default to treating numbers as floats, and special tricks are needed to handle numbers in an accurate form. The processing power of supercomputers is usually measured in FLOPS, which stands for (you guessed it) "Floating Point Operations Per Second".</p> |
24 | 24 | <p>Now that we know this, let’s dive deep into the non-accurate world of floating-point numbers.</p> |
25 | 25 | <p>The core idea behind floats is borrowed from scientific notation. For example, the approximate size of an atom in kilometers, ~0.0000000000001, can be represented as 1*10^-13. The first form uses 14 digits, while the second uses only 5.</p> |
26 | 26 | <p>The latter is much simpler, but floats add a layer of complexity. First, there isn't just one standard for storing floats - there are many. Thankfully, the most widely accepted standard is IEEE 754, which is the one you'll encounter almost everywhere and the one we'll focus on in this article. Second, IEEE 754 is a bit less straightforward than scientific notation.</p> |
27 | 27 | <hr /> |
28 | | -<h1>IEEE 754</h1> |
| 28 | +<h2>IEEE 754</h2> |
29 | 29 | <p>As we said before, there isn't only one standard for storing floats, but many. IEEE 754 is by far the most common. It breaks every floating-point number into three distinct parts, packed into either 32 bits (single-precision) or 64 bits (double-precision):</p> |
30 | 30 | <ul> |
31 | 31 | <li><strong>Sign Bit (1 bit)</strong>: A single bit that decides whether the number is positive (0) or negative (1).</li> |
32 | 32 | <li><strong>Biased Exponent</strong> (8 bits for 32-bit, 11 bits for 64-bit): This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2. For 32-bit floats, the bias is 127. So, an exponent of 0 is stored as 127, -1 as 126, and 127 as 254. This trick lets the hardware handle negative exponents without needing a sign bit for the exponent itself. The base is 2 because computers work in binary, so the scientific notation example we saw before would have been roughly 1*2^-43 (2^-43 ≈ 1.14*10^-13).</li> |
33 | 33 | <li><strong>Mantissa</strong> (23 bits for 32-bit, 52 bits for 64-bit): This stores the significant digits of the number, but the leading 1 is implied and not stored. For example, the binary number 1.0110 is stored as just 0110, saving space but adding complexity.</li> |
34 | 34 | </ul> |
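<p>The three parts listed above can be pulled out of a real 32-bit float. This is only a minimal sketch: <code>float32_fields</code> is a name made up for this illustration, and it relies on Python's standard <code>struct</code> module to get at the raw bits.</p>

```python
import struct


def float32_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, biased_exponent, mantissa) of value stored as a 32-bit float."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits
    mantissa = bits & 0x7FFFFF             # 23 bits, leading 1 not stored
    return sign, biased_exponent, mantissa


# -0.15625 is -1.01 (binary) * 2^-3
sign, biased_exponent, mantissa = float32_fields(-0.15625)
print(sign, biased_exponent - 127, f"{mantissa:023b}")
# 1 -3 01000000000000000000000
```

<p>Subtracting the bias (127) from the stored exponent gives back the real exponent, -3 in this case.</p>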
35 | | -<h2>IEEE 754 Example</h2> |
| 35 | +<h3>IEEE 754 Example</h3> |
36 | 36 | <p>A nice example is encoding 3.14 in IEEE 754.</p> |
37 | | -<h3>I. Binary</h3> |
| 37 | +<h4>I. Binary</h4> |
38 | 38 | <ul> |
39 | 39 | <li> |
40 | 40 | <p>3 in binary is <code>11</code></p> |
|
46 | 46 | <p>3.14 in binary is <code>11.0010001111010111000...</code></p> |
47 | 47 | </li> |
48 | 48 | </ul> |
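<p>The digits after the binary point can be derived by repeated doubling: double the fraction, write down the integer part (0 or 1), and keep going with what is left. A small sketch of that idea, where <code>fraction_to_binary</code> is a hypothetical helper and <code>Fraction</code> is used so that 0.14 is handled exactly:</p>

```python
from fractions import Fraction


def fraction_to_binary(frac: Fraction, digits: int) -> str:
    """Binary digits of a fraction in [0, 1), found by repeated doubling."""
    out = []
    for _ in range(digits):
        frac *= 2
        bit = int(frac >= 1)  # the digit is 1 when doubling crosses 1
        out.append(str(bit))
        frac -= bit           # keep only the fractional remainder
    return "".join(out)


integer_part = bin(3)[2:]                                # "11"
fractional_part = fraction_to_binary(Fraction(14, 100), 19)
print(f"{integer_part}.{fractional_part}...")
# 11.0010001111010111000...
```

<p>The expansion never terminates, which is exactly why 3.14 cannot be stored exactly in a binary float.</p>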
49 | | -<h3>II. Biased Exponent</h3> |
50 | | -<h4>II.I. Exponent</h4> |
| 49 | +<h4>II. Biased Exponent</h4> |
| 50 | +<h5>II.I. Exponent</h5> |
51 | 51 | <p>First, we need to normalize <code>11.0010001111010111000...</code> to the form <code>1.xxx * 2^exponent</code>, much like converting to scientific notation.</p> |
52 | 52 | <p>As with scientific notation, we have to use a bit of imagination. The only constraint is that the base of the exponent this time needs to be 2, so we need something like 2^-43 and not 10^-13.</p> |
53 | 53 | <p>All we need to do is convert to the <code>1.xxx * 2^exponent</code> form.</p> |
54 | 54 | <p>In this case, a simple binary shift by one position is enough, though that is specific to this example.</p> |
55 | 55 | <p><code>11.0010001111010111000...</code> -> <code>1.10010001111010111000... * 2^1</code></p> |
56 | 56 | <p>By converting to <code>1.xxx * 2^exponent</code> form we see that our exponent is 1.</p> |
57 | | -<h4>II.II. Biased Exponent</h4> |
| 57 | +<h5>II.II. Biased Exponent</h5> |
58 | 58 | <p>Our exponent is 1. But don't forget, as we said before, that in IEEE 754 we don't store the exponent itself, but a biased version of it ([...]This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2.[...]).</p> |
59 | 59 | <p>We will assume that we are talking about a 32-bit float, so our bias, based on what is written above, is 127.</p> |
60 | 60 | <p>So the number we will store is the exponent + 127, that is 1 + 127 = 128.</p> |
61 | 61 | <p>But we want this in binary. 128 in binary is <code>10000000</code>. That's our "Biased Exponent".</p> |
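<p>The same calculation can be sketched in code. This assumes a normal (not subnormal, zero, infinite, or NaN) value and uses <code>math.frexp</code> from the standard library; <code>biased_exponent</code> is just an illustrative name.</p>

```python
import math

BIAS = 127  # bias for 32-bit floats


def biased_exponent(value: float) -> int:
    """Biased exponent a normal value would get in a 32-bit float."""
    # math.frexp returns (m, e) with value == m * 2**e and 0.5 <= |m| < 1,
    # so the exponent of the 1.xxx form is e - 1.
    _, e = math.frexp(value)
    return (e - 1) + BIAS


print(biased_exponent(3.14), f"{biased_exponent(3.14):08b}")
# 128 10000000
```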
62 | | -<h3>III. Split into IEEE 754 Parts</h3> |
| 62 | +<h4>III. Split into IEEE 754 Parts</h4> |
63 | 63 | <ul> |
64 | 64 | <li>Sign: <code>0</code> (positive)</li> |
65 | 65 | <li>Biased Exponent: <code>10000000</code></li> |
66 | 66 | <li>Mantissa: Take the 23 bits after the leading 1., rounded to fit: <code>10010001111010111000011</code></li> |
67 | 67 | </ul> |
68 | | -<h3>IV. Result</h3> |
| 68 | +<h4>IV. Result</h4> |
69 | 69 | <p>Indeed, <code>0 10000000 10010001111010111000011</code> is the IEEE 754 representation of the number 3.14. You can verify this by using a handy IEEE 754 calculator [1].</p> |
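<p>Besides the online calculator, the result can be cross-checked locally. A minimal sketch, assuming the 32-bit pattern <code>0 10000000 10010001111010111000011</code> for 3.14: it glues the three parts together, decodes them as a float, and compares the bytes with what Python itself produces for 3.14.</p>

```python
import struct

sign, exponent, mantissa = "0", "10000000", "10010001111010111000011"

# Glue the three parts into one 32-bit integer and reinterpret it as a float32.
bits = int(sign + exponent + mantissa, 2)
(value,) = struct.unpack(">f", bits.to_bytes(4, "big"))

print(hex(bits))                                            # 0x4048f5c3
print(struct.pack(">f", 3.14) == bits.to_bytes(4, "big"))   # True
print(abs(value - 3.14) < 1e-6)                             # True
```

<p>The decoded value is very close to 3.14 but not exactly equal to it, because only 23 mantissa bits survive.</p>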
70 | 70 | <hr /> |
71 | | -<h1>The (10^100) + 1 − (10^100) Problem</h1> |
| 71 | +<h2>The (10^100) + 1 − (10^100) Problem</h2> |
72 | 72 | <p>Open the calculator on an iPad or an iPhone, switch to scientific mode, and try to solve <code>(10^100) + 1 − (10^100)</code>. The result you get is <code>0</code>, which is obviously wrong, as the correct answer is <code>1</code>.</p> |
73 | 73 | <p>This is caused by the way floating-point arithmetic works. The tiny <code>+1</code> gets lost in the inaccuracy of float arithmetic when it is added to a number as big as <code>10^100</code>.</p> |
74 | 74 | <p>Other calculators get it right, but only because they rely on more mathematically advanced data structures for representing numbers. [2]</p> |
75 | 75 | <p><img src="_images/par-10-power-100-par-plus-1-par-minus-10-power-100-par.jpg" alt="_images/par-10-power-100-par-plus-1-par-minus-10-power-100-par.jpg" /></p> |
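<p>The same effect can be reproduced outside a calculator app. The sketch below compares 64-bit floats with Python's arbitrary-precision integers, which stand in here for the "more advanced" representations; it is not meant to describe how any particular calculator is implemented.</p>

```python
big_float = 1e100                       # 64-bit float
float_result = (big_float + 1) - big_float

big_int = 10 ** 100                     # arbitrary-precision integer
exact_result = (big_int + 1) - big_int

print(float_result)  # 0.0
print(exact_result)  # 1
```

<p>A 64-bit float keeps roughly 16 significant decimal digits, so adding 1 to a number with 101 digits changes nothing.</p>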
76 | 76 | <hr /> |
77 | | -<h1>The 0.1 + 0.2 Problem</h1> |
| 77 | +<h2>The 0.1 + 0.2 Problem</h2> |
78 | 78 | <p>In your favorite programming language, try to solve the simple <code>0.1+0.2</code>. What you will get is <code>0.30000000000000004</code>. This means that your language treats numbers as floats by default, as almost all languages do.</p> |
79 | 79 | <p>It also means that <code>0.1+0.2 == 0.3</code> evaluates to... <code>false</code>.</p> |
80 | 80 | <p>That's worth remembering the next time you do mathematical operations. If you are working with floats, take the inaccuracy that comes with them into account and compare with a tolerance instead of exact equality, or use a different way of representing your numbers if needed.</p> |
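<p>Two common ways of dealing with this, sketched with Python's standard library: comparing with a tolerance via <code>math.isclose</code>, and switching to a decimal representation via <code>decimal.Decimal</code>. Which one fits depends on the problem, so treat this as an illustration rather than a recommendation.</p>

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False

# Option 1: compare with a tolerance instead of exact equality.
print(math.isclose(total, 0.3))  # True

# Option 2: use a decimal representation built from strings, not floats.
decimal_total = Decimal("0.1") + Decimal("0.2")
print(decimal_total == Decimal("0.3"))  # True
```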
81 | 81 | <hr /> |
82 | | - <h1>Infinity and "Not a Number" (NaN) (IEEE 754)</h1> |
| 82 | + <h2>Infinity and "Not a Number" (NaN) (IEEE 754)</h2> |
83 | 83 | <p>Two very interesting mathematical concepts that accurate number representations, like integers, usually cannot express are infinity and "not a number". A value can be infinite, and a value can be "not a number" temporarily, until it is set, or because it is the result of an undefined calculation like 0 divided by 0. Sometimes getting "NaN" is preferable to getting an error.</p> |
84 | 84 | <p>So IEEE 754 floats come to save the day if you need to represent those two very important mathematical concepts.</p> |
85 | | - <h2>Infinity</h2> |
| 85 | + <h3>Infinity</h3> |
86 | 86 | <p>If you set all the bits of the exponent and none of the mantissa, you get infinity! (<code>0 11111111 00000000000000000000000</code>)</p> |
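<p>This bit pattern can be decoded to confirm it. A minimal sketch, where <code>decode_float32</code> is a hypothetical helper that turns a "sign exponent mantissa" string into a Python float; flipping the sign bit gives negative infinity.</p>

```python
import math
import struct


def decode_float32(pattern: str) -> float:
    """Decode a 'sign exponent mantissa' bit string as a 32-bit float."""
    bits = int(pattern.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", bits.to_bytes(4, "big"))
    return value


positive = decode_float32("0 11111111 00000000000000000000000")
negative = decode_float32("1 11111111 00000000000000000000000")
print(positive, negative)                        # inf -inf
print(math.isinf(positive), positive > 1e308)    # True True
```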
87 | | -<h2>"Not a Number" (NaN)</h2> |
| 87 | +<h3>"Not a Number" (NaN)</h3> |
88 | 88 | <p>"Not a Number" is a special value meant as a placeholder for a numerical value that is not set yet or that is the result of an invalid operation, for example 0 divided by 0, which is undefined.</p> |
89 | 89 | <p>(Speaking of NaN, it's interesting that JSON, a data format modeled on the way JS represents values, doesn't support NaN. JS, like many other languages, treats numbers as floats by default, so it's notable that JSON doesn't allow a number to be NaN. Instead, NaNs are usually converted to null when working with JSON.)</p> |
90 | | -<h3>"Quiet NaN"</h3> |
| 90 | +<h4>"Quiet NaN"</h4> |
91 | 91 | <p>If you set all the bits of the exponent and the most significant bit of the mantissa (the leftmost one in the notation used here), you get a "Quiet NaN". This NaN propagates silently through calculations and doesn't trigger an error. (<code>0 11111111 10000000000000000000000</code>).</p> |
92 | | -<h3>"Signaling NaN"</h3> |
93 | | -<ul> |
94 | | -<li>If you set all the bits of the exponent and at least one bit of the mantissa, while leaving the mantissa's most significant bit at 0, you get a "Signaling NaN". This NaN is meant to trigger an error when it is used in a calculation. (<code>0 11111111 00000000000000000000010</code>).</li> |
95 | | -</ul> |
| 92 | +<h4>"Signaling NaN"</h4> |
| 93 | +<p>If you set all the bits of the exponent and at least one bit of the mantissa, while leaving the mantissa's most significant bit at 0, you get a "Signaling NaN". This NaN is meant to trigger an error when it is used in a calculation. (<code>0 11111111 00000000000000000000010</code>).</p> |
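<p>Both NaN patterns can be inspected in code. This sketch assumes the common convention that the most significant mantissa bit marks a quiet NaN; <code>decode_float32</code> and <code>is_quiet</code> are illustrative helpers, not part of any library. Note that Python itself does not raise an error for signaling NaNs, so the check is done on the raw bits.</p>

```python
import math
import struct


def decode_float32(pattern: str) -> float:
    """Decode a 'sign exponent mantissa' bit string as a 32-bit float."""
    bits = int(pattern.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", bits.to_bytes(4, "big"))
    return value


def is_quiet(pattern: str) -> bool:
    """True when the top mantissa bit (bit 22) is set. Assumed quiet-NaN convention."""
    bits = int(pattern.replace(" ", ""), 2)
    return bool((bits >> 22) & 1)


quiet = "0 11111111 10000000000000000000000"
signaling = "0 11111111 00000000000000000000010"

for pattern in (quiet, signaling):
    value = decode_float32(pattern)
    # A NaN is never equal to anything, not even to itself.
    print(math.isnan(value), value == value, is_quiet(pattern))
# True False True
# True False False
```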
96 | 94 | <hr /> |
97 | | -<h1>Outro</h1> |
| 95 | +<h2>Outro</h2> |
98 | 96 | <p>Floats may seem inaccurate, but this inaccuracy lets us tackle bigger problems, such as working with huge numbers or working with less hardware.</p> |
99 | 97 | <p>Extreme accuracy is not always needed, and it may not even exist in reality.</p> |
100 | 98 | <p>Even if you are only interested in extreme accuracy, floats are worth studying. By studying floats, you can understand the problem they try to solve, which is working with very big ranges and loosely bounded sets of numbers. The opposite insight follows easily: one way to achieve extreme accuracy is to acknowledge that it is relative, so you eventually have to limit the sets you are working with to something fixed.</p> |
101 | 99 | <p>This post was meant to unlock the secret world of float numbers in order for us to use them better.</p> |
102 | 100 | <hr /> |
103 | | -<h1>Links</h1> |
| 101 | +<h2>Links</h2> |
104 | 102 | <ul> |
105 | 103 | <li>[1] "IEEE-754 Floating Point Converter" <a href="https://www.h-schmidt.net/FloatConverter/IEEE754.html">https://www.h-schmidt.net/FloatConverter/IEEE754.html</a></li> |
106 | 104 | <li>[2] "A calculator app? Anyone could make that." <a href="https://chadnauseam.com/coding/random/calculator-app">https://chadnauseam.com/coding/random/calculator-app</a></li> |
|