Objective: - Tom Kleen



#4: More Floating Point

Objective: IEEE 754 Floating point standard

Floating Point Calculations: Base 10

Multiplication and Division

Multiplication and division are easiest, so we'll do them first. Consider how multiplication of numbers in scientific notation is done.

Problem: Multiply 250 times 50. The result should be 12,500.
Convert 250 to scientific notation: 250 = 2.5 x 10^2
Convert 50 to scientific notation: 50 = 5.0 x 10^1
Our problem is now: 2.5 x 10^2 x 5.0 x 10^1.
Group the mantissas together and multiply: 2.5 x 5.0 = 12.5.
Group the powers of 10 together and multiply: 10^2 x 10^1 = 10^3.
So the result is 12.5 x 10^3, which is 12,500. This is NOT normalized (a normalized mantissa is at least 1 and less than 10), so we must normalize it to 1.25 x 10^4.

All you have to do is (1) multiply the mantissas, (2) add the exponents, and (3) normalize. When doing division, divide the mantissas and subtract the exponents. If the resulting mantissa isn't normalized, normalize it.

Addition and Subtraction

Addition and subtraction are slightly more difficult because only numbers with the same place values can be added or subtracted. We will look at an example in scientific notation.

Example

Let's add 250 + 5. The result should be 255.
Convert 250 to scientific notation: 2.5 x 10^2
Convert 5 to scientific notation: 5.0 x 10^0
We must take the number with the smaller exponent and change it so that its exponent is the same as the larger exponent. This means that we must take 5.0 x 10^0 and make its exponent 2. Raising the exponent from 0 to 2 multiplies the power-of-10 part by 100, so we must divide the mantissa by 100 to keep from changing the value of the number. So the number 5 is rewritten as 0.050 x 10^2. Now we can add the mantissas: 2.5 + 0.05 = 2.55. The exponent remains 2, so the result is 2.55 x 10^2, which is 255. The result is already in normalized form, so we are done.

Example

Let's add the numbers 975 and 50. The result should be 1025.
Convert 975 to scientific notation: 9.75 x 10^2
Convert 50 to scientific notation: 5.0 x 10^1
Take the number with the smaller exponent, and change it so that its exponent is the same as the larger exponent. So 50 can be rewritten as 0.5 x 10^2. Now that the exponents are the same, we can add the mantissas: 9.75 + 0.5 = 10.25. The exponent is 2, so the result is 10.25 x 10^2. However, this result is not normalized. To normalize it, we must divide the mantissa by 10, giving 1.025. If we divide one part of the number by 10, we must multiply another part by 10 or we will change the value of the number. The "other part" that we multiply by 10 is the power of 10: 10^2 x 10 = 10^3. So our normalized answer is 1.025 x 10^3.

Addition and Subtraction, Base 2

Example 1

Take the same numbers that we did in base 10: 250 + 5.
Convert 250 to binary: 1111 1010
Convert to floating point: 1.1111 0100 x 2^7
Convert 5 to binary: 0000 0101
Convert to floating point: 1.01 x 2^2
Convert to the same exponent as 250 (shift the binary point 5 places to the left): 0.0000 1010 x 2^7
Now we can add the mantissas:

  1.1111 0100
+ 0.0000 1010
  -----------
  1.1111 1110 x 2^7

which is 1111 1111, or 255 in base 10.

Example 2

Take the same numbers that we did in base 10: 975 + 50.
Convert 975 to binary: 11 1100 1111
Convert to floating point: 1.1110 0111 1 x 2^9
Convert 50 to binary: 11 0010
Convert to floating point: 1.1001 0 x 2^5
Convert to the same exponent as 975 (shift the binary point 4 places to the left): 0.0001 1001 0 x 2^9
Now we can add the mantissas:

   1.1110 0111 1
 + 0.0001 1001 0
  --------------
  10.0000 0000 1 x 2^9

which is 100 0000 0001, or 1025 in base 10.
Normalize and it becomes: 1.0000 0000 01 x 2^10

Note that if you add a very large number and a very small number, the small number basically disappears! When its exponent is aligned with the large number's exponent, its significant bits are shifted past the end of the fixed-size mantissa and are lost.

Problems with floating point numbers: Python

A problem with floating point is that many numbers cannot be represented exactly.
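Both problems just mentioned (a small addend disappearing, and numbers that cannot be stored exactly) can be seen directly in Python. This is a small illustration added to these notes, not one of the original exercises; it uses only the standard library (`fractions.Fraction` and `decimal.Decimal`) and assumes Python's `float` is an IEEE 754 double with a 53-bit significand, which is the case on essentially all current platforms.

```python
from decimal import Decimal
from fractions import Fraction

# 1) A small number added to a large one can disappear.
#    A double has a 53-bit significand, so 2**53 + 1 cannot be stored:
#    aligning 1.0 to the exponent of 2**53 shifts its only 1 bit off the end,
#    and the sum rounds back to 2**53.
big = 2.0 ** 53
print(big + 1.0 == big)                      # True

# 2) 0.1 is not stored exactly. Fraction and Decimal convert a float
#    without further rounding, so they reveal the exact stored binary value.
print(Fraction(0.1))                         # 3602879701896397/36028797018963968
print(Fraction(0.1).denominator == 2 ** 55)  # True: a power of 2, not 10
print(Fraction(0.1) == Fraction(1, 10))      # False
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
```

The denominator being a power of 2 is the whole story: a binary float can only hold fractions of the form n / 2^k, and 1/10 is not one of them.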
Convert the decimal value 0.1 (1/10) to binary using the algorithm from last class (repeatedly multiply by 2; the digit to the left of the decimal point each time is the next binary digit, and a 1 is thrown away before multiplying again):

0.2   digit 0
0.4   digit 0
0.8   digit 0
1.6   digit 1 (throw away the 1, leaving 0.6)
1.2   digit 1 (throw away the 1, leaving 0.2)
0.4   digit 0
0.8   digit 0
1.6   digit 1
1.2   digit 1
0.4   digit 0
0.8   digit 0
1.6   digit 1
1.2   digit 1
…

Our number is 0.0 0011 0011 0011…
The 0011 repeats forever. So 1/10 cannot be represented exactly in base 2!

NOTE: To verify that this number actually is 0.1, multiply it by 16 (shift the binary point 4 places to the right):
16x = 1.1 0011 0011 0011…
  x = 0.0 0011 0011 0011…
Subtract the two numbers. The repeating parts cancel, and the difference is:
15x = 1.1000000000 (binary), which is 1.5 in decimal.
15x is 1.5. Therefore, x must be 0.1.

How does this affect programs?

Floating Point precision example #1

Run the following Python program. Adding 1/i to itself i times (1/1 once, 1/2 twice, 1/3 three times, …, 1/100 one hundred times) should give exactly 1 for every i, so the grand total 1 + (1/2+1/2) + (1/3+1/3+1/3) + … + (1/100+…+1/100) should be 100. You should get 1 every time, but you don't!

for i in range(1, 101):  # i = 1 to 100
    total = 0.0
    for _ in range(i):  # add 1/i to the total i times
        total += 1.0 / i
    # Each total should equal 1, but some don't!
    print(f"{i:3}: {total:0.20f}")

Floating Point precision example #2

Add the following code, which adds 0.1 1000 times (the result should be 100):

x = 0.0
for _ in range(1000):
    x += 0.1
print(f"x should be 100: {x:0.20f}")

Floating Point precision example #3

Add this code, which is supposed to stop when x == 1.
x = 0.0
while x != 1:  # x is never exactly 1.0, so this loop never ends
    x += 0.1
    print(x)
print(f"x should be 1: {x:0.20f}")

Warning: this loop never terminates. After ten additions x is 0.9999999999999999, not 1.0, so x skips past 1 and keeps growing; press Ctrl+C to stop the program. The final print statement is never reached.

More floating-point fun

Enter the following at the Python interactive prompt:

>>> .1 + .1 + .1 == .3
False

Also, since 0.1 cannot get any closer to the exact value of 1/10 and 0.3 cannot get any closer to the exact value of 3/10, pre-rounding with the round() function cannot help:

>>> round(.1, 1) + round(.1, 1) + round(.1, 1) == round(.3, 1)
False

Though the numbers cannot be made closer to their intended exact values, the round() function can be useful for post-rounding so that results with inexact values become comparable to one another:

>>> round(.1 + .1 + .1, 10) == round(.3, 10)
True

Floating point constants

The following C# example demonstrates some floating point constants:

float x = 0;
Console.WriteLine("0: " + x + ", " + SingleToHex(x));
x = Single.NaN;
Console.WriteLine("NaN: " + x + ", " + SingleToHex(x));
x = Single.NegativeInfinity;
Console.WriteLine("NegativeInfinity: " + x + ", " + SingleToHex(x));
x = Single.PositiveInfinity;
Console.WriteLine("PositiveInfinity: " + x + ", " + SingleToHex(x));
x = Single.PositiveInfinity + Single.NegativeInfinity;  // infinity + (-infinity) is NaN
Console.WriteLine("Pos Inf + Neg Inf: " + x + ", " + SingleToHex(x));
float y = 0;
x = 0 / y;  // 0 / 0 is NaN (a nonzero number divided by 0 gives an infinity)
Console.WriteLine("Div by 0: " + x + ", " + SingleToHex(x));

Note that the following function is required (the program also needs using System; at the top of the file):

// Returns the four bytes of a float as hex, e.g. "00-00-80-3F" for 1.0f.
// BitConverter uses the machine's byte order, which is little-endian on
// x86/x64, so the bytes appear in reverse order (1.0f is 0x3F800000).
static String SingleToHex(float d)
{
    byte[] bytes = BitConverter.GetBytes(d);
    return BitConverter.ToString(bytes);
}

The Decimal data type

C# provides another way of representing fractions: the Decimal data type. It is similar to the IEEE 754 decimal floating-point formats, but less complicated.

The Decimal data type occupies 128 bits (16 bytes). 96 bits hold an unsigned integer. The remaining four bytes hold the sign and a negative exponent. The exponent tells where to position the decimal point in the integer: the value is the integer divided by 10 raised to that exponent.

So the integer (unsigned) can be from 0 to 2^96 - 1.
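Since every 2^10 is approximately 10^3, 2^96 = 2^90 x 2^6 is approximately 10^27 x 64, or around 10^29 (more precisely, 2^96 - 1 is about 7.9 x 10^28, which is Decimal.MaxValue). The negative exponent can range from 0 to 28, so five bits are enough to hold it (in .NET's layout it sits in bits 16 to 23 of the last four bytes, and bit 31 holds the sign). As far as I know, Decimal operations are implemented in software, not hardware, so there will be a performance penalty when using Decimal data.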
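The layout described above can be modeled in a few lines. This is a Python sketch, not C#: it assumes a value is exactly (-1)^sign x integer / 10^scale with a 96-bit unsigned integer and 0 <= scale <= 28, as described above, and uses Python's own `decimal` module only to parse and rebuild values. The names `decimal_parts`, `from_parts`, `MAX_INT`, and `MAX_SCALE` are hypothetical, and the model does not reproduce C#'s exact bit packing or its rounding rules.

```python
from decimal import Decimal

MAX_INT = 2 ** 96 - 1   # largest value of the 96-bit unsigned integer
MAX_SCALE = 28          # largest negative exponent (power of 10 to divide by)

def decimal_parts(text):
    """Hypothetical helper: split a finite decimal string into
    (sign, integer, scale) so that value == (-1)**sign * integer / 10**scale."""
    sign, digits, exponent = Decimal(text).as_tuple()
    integer = int("".join(str(d) for d in digits))
    scale = 0
    if exponent < 0:
        scale = -exponent              # number of digits after the decimal point
    else:
        integer *= 10 ** exponent      # e.g. "1E+3" becomes integer 1000, scale 0
    # This model simply rejects values that do not fit the layout.
    if integer > MAX_INT or scale > MAX_SCALE:
        raise OverflowError(f"{text} does not fit a 96-bit integer with scale 0..28")
    return sign, integer, scale

def from_parts(sign, integer, scale):
    """Hypothetical inverse: rebuild the value as a Python Decimal."""
    return Decimal((sign, tuple(int(c) for c in str(integer)), -scale))

# Usage
print(decimal_parts("-123.45"))   # (1, 12345, 2)
print(from_parts(1, 12345, 2))    # -123.45
print(MAX_INT)                    # 79228162514264337593543950335
print(f"{MAX_INT:.2e}")           # 7.92e+28, i.e. roughly 10**29
print(MAX_SCALE.bit_length())     # 5: five bits are enough for 0..28
```

Because the integer part is stored in base 10 terms (an integer plus a power-of-10 scale), a value like 0.1 is represented exactly as integer 1 with scale 1, which is the main reason to accept the performance cost.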

