Float128 typePlease complete the following assignment in Jupyter, and email the completed .ipynb file to Sheehan.Olver@sydney.edu.au by midnight on 14 April 2016.
You are strongly encouraged to work out the problems on your own. If you use existing solutions, existing code, help online or collaborate with other students, you will still receive full credit provided that you make this clear and cite your sources. If you collaborate with other students, please submit individual solutions and be specific in giving credit for what each student contributed.
There are two ways to complete the assignment: using bits and string manipulation, or using <<, >> on UInt128s.  Either approach will be accepted, though the latter is faster and hence recommended.
In this assignment you will construct a Float128 bits-type, represented by 128 bits:
In [4]:
    
import Base: show, bits, exponent, significand, rand, +, -, *, /, <, >, sign, BigFloat, inv, Float64
bitstype 128 Float128 <: AbstractFloat
    
For this types, we will interpret the 128-bits in data as: 1 sign bit, 63 exponent bits and 64 signficiand bits.  That is, numbers will be represented by
where $S = 2^62-1$, $q$ is an unsigned 63-bit integer in the 2nd through 64th bits and $b_1b_2\cdots b_{64}$ are in the last 64 bits.
We will not implement Inf, NaN or subnormal numbers.  We will however implement both $±0$.
We will use a globally defined constant S to represent $S$:
In [5]:
    
const S=Int64(2)^Int64(62)-Int64(1)
    
    Out[5]:
We will use the bits to get access to the bits.  The following function defines bits for a Float8:
In [7]:
    
bits(x::Float128)=bits(reinterpret(UInt128,x))
    
    Out[7]:
Exercise 1
(a) Complete the following exponent function, that returns $q-S$ (as an Int64).
(b) Check that
y=parse(UInt128,"01000000000000000000000000000000000000000000000000000000000000010010011001100110011001100110011001100110011001100110100000000000",2)
x=reinterpret(Float128,y)
exponent(x)
returns 2.
In [60]:
    
function exponent(x::Float128)
    y=reinterpret(UInt128,x)
    
    ## TODO: interpret the bits and return an integer
end
    
    Out[60]:
Exercise 2
Explain the following significand_uint64 function, that returns the significand bits of a Float128 as a UInt64.
In [11]:
    
function significand_uint64(x::Float128)
    y=reinterpret(UInt128,x)
    y % UInt64 
end
    
    Out[11]:
Exercise 3
Complete the following function sign to return the sign of a Float128:
In [14]:
    
function sign(x::Float128)
    #TODO: return the sign of x as +1 or -1
end
    
    Out[14]:
Exercise 4
An important skill to learn is how to read code that uses functionality that you are not yet familiar with.
(a) What is a BigFloat?  (Hint: use ?BigFloat)
(b) Add comments to the function BigFloat(x::Float128) and show(io::IO,x::Float128)  explaining how it works.  Why do I display precisely 20 digits in the definition of show?
(c) Check that
y=parse(UInt128,"00111111111111111111111111111111111111111111111111111111111111110100110011001100110011001100110011001100110011001101000000000000",2)
x=reinterpret(Float128,y)
returns 1.300000000000000044
In [18]:
    
function BigFloat(x::Float128)
    if reinterpret(UInt128,x)==0
        0.0
    elseif reinterpret(UInt128,x)==(UInt128(1) << 127)
        -0.0
    else
        ret=BigFloat(1)
        hf=BigFloat(0.5)
        sig = bits(significand_uint64(x))
        for k=1:length(sig)
            if sig[k]=='1'
                ret+=hf
            end
            hf/=2
        end
       sign(x)*2^BigFloat(exponent(x))*ret
    end
end
function show(io::IO,x::Float128)
    str=string(BigFloat(x))
    print(io,str[1:20])
    k=search(str,'e')           
    if k > 0
        print(io,str[k:end])
    end
end
    
    Out[18]:
Exercise 5
(a) Complete the function Float128(x::Float64) that converts a Float64 to a Float128
(b) Check that
Float128(2.5)
returns 2.500000000000000000 and
Float128(1.3)
returns 1.300000000000000044
(c) Explain why 2.5 was converted exactly but 1.3 was not.
In [26]:
    
function Float128(x::Float64)
    if x === 0.0
        reinterpret(Float128,UInt128(0))
    elseif x === -0.0
         reinterpret(Float128,UInt128(1) << 127)
    else
        #TODO: return a Float128 that equals the input
    end
end
    
    Out[26]:
In [25]:
    
Float128(2.5),Float128(1.3)
    
    Out[25]:
Exercise 6
(a) Complete the following definition Float64(x::Float128) that converts a Float128 to a Float64 by chopping
(b) Check that
Float64(Float128(1.3))
returns 1.3
In [28]:
    
function Float64(x::Float128)
    y=reinterpret(UInt128,x)
    
    if y == 0
        0.0
    elseif y == (UInt128(1) << 127)
        -0.0
    else
        #TODO: return a Float64 that is the input x with the extra bits chopped off
    end
end
    
    Out[28]:
Exercise 7
In the remaining exercises, we can assume all Float128s are positive normal numbers.
(a) Complete the following addition function for two Float128s, by reducing the problem to addition with UInt128s.  Round the computation towards zero.
(b) Check that
Float128(1.25)+Float128(2.5)
returns 3.750000000000000000
In [30]:
    
function +(x::Float128,y::Float128)
    qx=exponent(x)
    qy=exponent(y)
    
    if qx ≥ qy
        qx=exponent(x)
        qy=exponent(y)
        newexp = UInt64(qx+S)
        two64 = UInt128(1) << 64
        sigx = UInt128(significand_uint64(x)) + two64
        sigy = UInt128(significand_uint64(y)) + two64 
        # TODO: add sigx and sigy and return the resulting Float128.  
        # Remember: sigy needs to be shifted to account for different exponents.  
    else
        y+x
    end
end
    
    Out[30]:
Exercise 8
(a) Complete the following multiplication of two Float128s.  Again, assume that x and y are positive normal numbers.  Hint:  (1+sigx)*(1+sigy) = 1 + sigx + sigy + sigx*sigy
(b) Check that Float128(1.25)*Float128(2.5)  returns 3.125000000000000000.
In [61]:
    
function *(x::Float128,y::Float128)
    # TODO: Implement multiplication by reducing to multiplication with UInt128
end
    
    Out[61]:
Exercise 9
The following function computes division for Float128.
(a) Add comments explaining how the following functions work.
(b) Confirm that
Float128(1.3)/Float128(0.4)
returns 3.249999999999999930.
In [38]:
    
function inv(x::Float128)
    qx=exponent(x)
    sig = UInt128(significand_uint64(x))
    if sig == 0
        reinterpret(Float128,UInt128(S-qx)<<64)
    else
        newexp=UInt128(S-qx-1)<<64
        two64=UInt128(1) << 64
        sig = sig + two64
        two127=UInt128(1) << 127
        reinterpret(Float128,newexp+UInt128((div(two127,sig) << 2) % UInt64))
    end
end
function /(x::Float128,y::Float128)
    x*inv(y)
end
    
    Out[38]:
Exercise 10
(a) Write a routine that calculates
$$\sum_{k=0}^{28} {1 \over k!}$$using Float128 arithmetic, by using the / and + functions defined above.
(b) Compare the number of accurate digits to the higher accuracy approximation of e:
println(BigFloat(e))
In [ ]: