Float8
typePlease complete the following assignment in Jupyter, and email the completed .ipynb
file to Sheehan.Olver@sydney.edu.au by midnight on 14 April 2016.
You are strongly encouraged to work out the problems on your own. If you use existing solutions, existing code, help online or collaborate with other students, you will still receive full credit provided that you make this clear and cite your sources. If you collaborate with other students, please submit individual solutions and be specific in giving credit for what each student contributed.
In this assignment you will construct a Float8
type, containing one field data
, an 8-bit UInt8
:
In [6]:
import Base: show, bits, exponent, significand, +, -, *, /, Float64
type Float8
data::UInt8
end
For this types, we will interpret the 8-bits in data
as: 1 sign bit, 3 exponent bits and 4 signficiand bits. That is, numbers will be represented by
where $S = 3$, $q$ is an unsigned 3-bit integer in the 2nd through 4th bits and $b_1b_2b_3b_4$ are in the last 4 bits.
We will not implement Inf
, NaN
or subnormal numbers. We will however implement both $±0$.
We will use a globally defined constant S
to represent $S$:
In [8]:
const S=UInt8(3) # The shift
Out[8]:
We will use the bits
to get access to the bits. The following function defines bits
for a Float8
:
In [11]:
function bits(x::Float8)
bits(x.data)
end
Out[11]:
Exercise 1
(a) Complete the following exponent
function, that returns $q-S$ (as an integer).
(b) Check that
exponent(Float8(UInt8(123)))
returns 4.
In [18]:
function exponent(x::Float8)
## TODO: interpret the bits and return an integer
end
Out[18]:
Exercise 2
(a) Complete the following significand
function, that returns the significand of a Float8
. Don't forget to incorporate the sign
bit.
(b) Add comments explaining the purpose of the first 4 lines.
(c) Check that
significand(Float8(UInt8(123)))
returns 1.6875
.
In [28]:
function significand(x::Float8)
if x.data==0
0.0
elseif x.data==128
-0.0
else
bts=bits(x)
#TODO: convert to 8-bits in bts to a Floating point
end
end
Out[28]:
Excercise 3
(a) Use exponent
and significand
to complete the definition of Float64(x::Float8)
, which converts a Float8
to a Float64
(b) Check that
Float8(UInt8(123))
now displays as 27.0f8
.
In [35]:
function Float64(x::Float8)
#TODO: return a Float8 that equals the input
end
function show(io::IO,x::Float8)
print(io,Float64(x))
print(io,"f8")
end
Out[35]:
Exercise 4
(a) Complete the following chop_to_8_bits
function that returns a string for normal numbers containing the 8-bits for the Float8
representation. For this question, you can simply chop the significand bits of a Float64
. (Recall that a Float64
has 1 sign bit, 11 exponent bits and 52 significand bits.)
(b) Add comments explaining the definition of Float8(::Float64)
.
(c) Check that
Float8(1.25)
returns 1.25f8
.
(d) Explain why
Float8(1.3)
returns the same number.
In [47]:
function chop_to_8_bits(x::Float64)
#TODO: returns a string containing 8-bits for the Float8 approximation to x
end
function Float8(x::Float64)
if x===0.0
Float8(UInt8(0))
elseif x===-0.0
Float8(UInt8(128))
else
Float8(parse(UInt8,chop_to_8_bits(x),2))
end
end
Out[47]:
Exercise 5
Complete the following function that negates a Float8
:
In [ ]:
function -(x::Float8)
#TODO: return the bits corresponding to -x
end
Exercise 6
(a) Complete the following algebra operations, ensuring that each one returns a Float8
. You can use Float64(x::Float8)
and Float8(x::Float64)
to use the inbuilt Float64
arithmetic.
(b) Check that
Float8(1.25)+Float8(2.25)
returns 3.5f8
In [51]:
function +(x::Float8,y::Float8)
#TODO: return x+y as Float8
end
function *(x::Float8,y::Float8)
#TODO: return x*y as Float8
end
function /(x::Float8,y::Float8)
#TODO: return x/y as Float8
end
function -(x::Float8,y::Float8)
#TODO: return x-y as Float8
end
Out[51]:
Exercise 7
(a) Implement the following routine round_to_8bits
that rounds to the nearest Float8
,
rather than chops.
(b) Check that
Float8(parse(UInt8,round_to_8_bits(1.3),2))
returns 1.3125f8
.
In [61]:
function round_to_8_bits(x::Float64)
# TODO: Round a Float64 to a Float8
end
Out[61]: