Homework 1 (due March 6)

Finish this homework in the appropriate cells, and copy this file to your home directory on ursa, so I can grab them and inspect them. Please keep the original filename.

Q1.

Create a new file called "hello.py" with just one python statement that prints "hello" to the screen. Test that it works with the command

    python hello.py

Now we want to promote this script to become a true unix command, so we want to just type in a unix terminal

      hello.py

and get the same "hello" printed to the screen.

Try this out, but what errors can you think of (and maybe you got) when you worked this out, and explain your results.

Hint: I was able to get 4 types of errors at different stages of getting this to work.

A1.

Starting with a one liner text file:

   echo "print('hello')" > hello.py

I was able to get the following types or errors, in the following order of events while trying to fix them:

1) "hello.py: Command not found."

. (the current directory) is not in the path. Fix this with "export PATH=.:$PATH" in bash, or "setenv PATH .:$PATH" in csh. After fixing this, I got:

2) "hello.py: Permission denied."

forgot "chmod +x hello.py" to make the script executable. After this I got:

3) "Unescaped left brace in regex is deprecated, ...."

There is no proper #!, and it then assumes bash (or /bin/sh), so we need to teach it that this is a python script. After trying that, I got this:

4) "hello.py: Command not found."

you might not get this, but this is not a repeat of 1), it's when you give the path of the #! wrong, e.g. #!/bin/python, since it's #!/usr/bin/python. You can also make it #!/usr/bin/env python"

Q2.

Assign an integer value to n. Write a snippet of code that computes the sum of all even numbers up to and including n. Test this for a few small values of n, even and odd. Print out the sum for n=999 and n=1000. What happened if you set n to be a negative number.

You can use any method you like: Functions, For loops., Numpy arrays, Python lists, just to name a few.


In [1]:
# A2.
import numpy as np

def sum1(n):
    """straight sum, for-loop with if"""
    sum = 0
    for i in range(n+1):
        if i%2 == 0:             # only add even numbers
            sum += i
    return sum

def sum2(n):
    """only half the array, more efficient"""
    sum = 0
    n2 = n//2                # the choice of n2
    for i in range(n2+1):    # and n2+1 requires thought
        sum += i
    sum *= 2
    return sum

def sum3(n):
    """full range, but stepping in two"""
    sum = 0
    for i in range(0,n+1,2):
        sum += i
    return sum

def sum4(n):
    """using sum2 method and numpy.sum is fast"""
    n2 = n//2
    a = np.arange(n2+1)
    return 2*a.sum()

def sum5(n):
    """looping in numpy is also slow"""
    sum = 0
    n2 = n//2
    for i in np.arange(n2+1):
        sum += i
    sum *= 2
    return sum

# for cases n=3 and n=4 we know the answer (2 and 6 resp.)
for s in [sum1,sum2,sum3,sum4,sum5]:
    print("test for 3,4: %d %d %s" % (s(2),s(4),s.__doc__))

# the required numbers for the homework
print("%d %d" % (999,sum1(999)))
print("%d %d" % (1000,sum1(1000)))

# and what the heck, give it a big number and see how they all perform
n = 1000000
%time s1=sum1(n)
%time s2=sum2(n)
%time s3=sum3(n)
%time s4=sum4(n)
%time s5=sum5(n)

# they better all be the same still
print(s1,s2,s3,s4,s5)


test for 3,4: 2 6 straight sum, for-loop with if
test for 3,4: 2 6 only half the array, more efficient
test for 3,4: 2 6 full range, but stepping in two
test for 3,4: 2 6 using sum2 method and numpy.sum is fast
test for 3,4: 2 6 looping in numpy is also slow
999 249500
1000 250500
CPU times: user 180 ms, sys: 0 ns, total: 180 ms
Wall time: 177 ms
CPU times: user 44 ms, sys: 0 ns, total: 44 ms
Wall time: 42.2 ms
CPU times: user 40 ms, sys: 0 ns, total: 40 ms
Wall time: 38.7 ms
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 2.39 ms
CPU times: user 76 ms, sys: 0 ns, total: 76 ms
Wall time: 78.9 ms
250000500000 250000500000 250000500000 250000500000 250000500000

Discussion

For n=1000000 I obtained timings of around 160ms, 40ms, 40ms, 1ms and 80ms resp. (in CPU, not Wall clock time). If you repeat these for smaller values of n you might see a 0 "CPU times" (too fast to measure), but because interfaces to ipython are slow, you will see varying degrees of time it takes to finish (Wall time)

Clearly shortening the loop and thus removing the if-statement in sum1 helped a lot, but we are looping in python, which is slow.

Converting to numpy however means that memory has to be sacrificed in an array (a) over which then can be summed really fast.

Q3.

Why does python not have the switch/case control flow statement like one has in C and Java?

If you use an online resource, give a reference.

A3.

First an example in python how it would look if it followed C:

   switch n:
      case 1:
         print("n=1")
         # break
      case 2:
         print("n=2")
         # break
      case 3:
         print("n=3")
         # break
      default:
         print("something else")

however in C you would need an additional break before the next case if you did not want to follow through the next case. This gives for interesting flow patterns!

Apart from this odd ambiguity, if a break would be needed, you can also solve this with a series of if/elif/elif/.../else:

    if n==1:
        print("n=1")
    elif n==2:
        print("n=2")
    elif n==3:
        print("n=3")
    else:
        print("something else")

If you google, you will find some discussions and functional replacements using a python dictionary. But these are limited to the cases where multiple statements in each case can be replaced with a single function call.

The python language is evolved using PEP (Python Enhanced Proposal). Check out this one: https://www.python.org/dev/peps/pep-3103/ where it was rejected to be part of the language, way back in 2006! Guido van Rossum has a nickname of the Benevolent Dictator For Life (BDFL).

Q4.

For the following experiment we want to plot the performance difference between adding up the elements in a list vs. numpy array, as function of the size of the list/array. Much like in the 03-arrays time your list and numpy cell for a number of values of n (you may need to go up to n=1,000,000 or even higher).

Put your numbers in a hardcoded list in A3c. You will need to rerun the A3a and A3b cells a number of times.

Plot your results in a log-log plot. What is a nicer plot, N vs. CPU or N vs. CPU/N?

A4.

Below we will first set up a cell with some definitions, and try out a few loops to get an idea how to accumulate the numbers for the graph most efficiently without bringing the machine to a grinding halt.


In [2]:
#  this cell sets up variables and functions, so we don't need to
#  time these. initializing arrays will take time too, but we're not 
#  interested in these either.
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt

n = 5000000
mode = 1

if mode == 0:
    # simple sequence
    a1 = np.arange(n)
    a2 = np.arange(n * 1.0) 
    a3 = list(range(n))
    a4 = a2.tolist()
else:
    # random numbers between 0 and 100 (more expensive)
    np.random.seed(123)
    a2 = np.random.uniform(0.0,100.0,n)
    a1 = a2.astype(np.int64)
    a3 = a1.tolist()
    a4 = a2.tolist()
    
def geta(n, seed=None):
    """return random numpy array [0,100], optional seed"""
    if seed != None:
        np.random.seed(seed)
    a = np.random.uniform(0.0,100.0,n)
    return a

def sum0(a):
    """the native sum that python lists support"""
    return sum(a)

def sum1(a):
    """loop over array elements and sum them"""
    sum = 0
    for x in a:
        sum += x
    return sum

def sum2(a):
    """loop over array using indices"""
    sum = 0
    for i in range(len(a)):
        sum += a[i]
    return sum

def sum3(a):
    """native and fast numpy sum"""
    return a.sum()

if n == 10:
    # debugregression test
    print("n=%d mode=%d" % (n,mode))
    print(type(a1),type(a2),type(a3),type(a4))
    print(type(a1[0]),type(a2[0]),type(a3[0]),type(a4[0]))
    if mode==0: print("Should all print 45 or 45.0")
    for a in [a1,a2,a3,a4]:
        print(sum1(a))
        print(sum2(a))
    for a in [a1,a2]:
        print(sum3(a))
else:
    print("Done for n=",n)


Done for n= 5000000

In [3]:
# comparing various methods (in the end we'll just pick sum1 and sum3)
print(n)
print("  numpy int")
%time s01 = sum0(a1)
%time s11 = sum1(a1)
%time s21 = sum2(a1)
%time s31 = sum3(a1)
print("  numpy float")
%time s02 = sum0(a2)
%time s12 = sum1(a2)
%time s22 = sum2(a2)
%time s32 = sum3(a2)
print("  list int")
%time s03 = sum0(a3)
%time s13 = sum1(a3)
%time s23 = sum2(a3)
print("  list float")
%time s04 = sum0(a4)
%time s14 = sum1(a4)
%time s24 = sum2(a4)


5000000
  numpy int
CPU times: user 556 ms, sys: 4 ms, total: 560 ms
Wall time: 555 ms
CPU times: user 764 ms, sys: 8 ms, total: 772 ms
Wall time: 769 ms
CPU times: user 964 ms, sys: 0 ns, total: 964 ms
Wall time: 965 ms
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.25 ms
  numpy float
CPU times: user 564 ms, sys: 0 ns, total: 564 ms
Wall time: 565 ms
CPU times: user 764 ms, sys: 0 ns, total: 764 ms
Wall time: 767 ms
CPU times: user 1.02 s, sys: 0 ns, total: 1.02 s
Wall time: 1.03 s
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 6.3 ms
  list int
CPU times: user 60 ms, sys: 0 ns, total: 60 ms
Wall time: 60.1 ms
CPU times: user 284 ms, sys: 0 ns, total: 284 ms
Wall time: 284 ms
CPU times: user 572 ms, sys: 0 ns, total: 572 ms
Wall time: 570 ms
  list float
CPU times: user 40 ms, sys: 0 ns, total: 40 ms
Wall time: 38 ms
CPU times: user 204 ms, sys: 0 ns, total: 204 ms
Wall time: 208 ms
CPU times: user 444 ms, sys: 0 ns, total: 444 ms
Wall time: 441 ms

This first analysis shows us that for numpy arrays there isn't much difference between sum0 and sum1 (~20-30%), but the native sum3 is much much faster. But for python lists sum0 is a lot faster than sum1 (factor 5-6). Since most numerical work is numpy floating point, we will do the comparison below between sum1 and sum3 for the a2 array.


In [4]:
# A3a.  time the python list
n1 = [10000, 100000, 1000000, 10000000, 20000000, 40000000]
for n in n1:
    print(n)
    %time a = geta(n)      # ignore these timings 
    %time s = sum1(a)
    del(a)
c1  = [4e-3, 28e-3, 148e-3, 1.51,   2.86, 5.9]
c1i = [1e-3,  4e-3,  12e-3, 744e-3, 2.45, 1.12]


10000
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 274 µs
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 1.52 ms
100000
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 2.44 ms
CPU times: user 20 ms, sys: 4 ms, total: 24 ms
Wall time: 20.3 ms
1000000
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 16.6 ms
CPU times: user 160 ms, sys: 0 ns, total: 160 ms
Wall time: 158 ms
10000000
CPU times: user 160 ms, sys: 4 ms, total: 164 ms
Wall time: 162 ms
CPU times: user 1.5 s, sys: 0 ns, total: 1.5 s
Wall time: 1.5 s
20000000
CPU times: user 272 ms, sys: 16 ms, total: 288 ms
Wall time: 286 ms
CPU times: user 2.98 s, sys: 0 ns, total: 2.98 s
Wall time: 2.98 s
40000000
CPU times: user 604 ms, sys: 440 ms, total: 1.04 s
Wall time: 1.04 s
CPU times: user 6.12 s, sys: 0 ns, total: 6.12 s
Wall time: 6.12 s

In [5]:
# A3b.  time the numpy array
n2 = [  10000000, 40000000, 100000000, 200000000, 300000000, 500000000]
for n in n2:
    print(n)
    %time a = geta(n)     # ignore these timings
    %time s=sum3(a)
    del(a)
c2  = [4e-3, 20e-3, 52e-3, 116e-3, 172e-3, 268e-3]
c2i = [0.2,    0.9,   3.0,   4.46,   7.11,  14.5]
print(len(n2),len(c2))


10000000
CPU times: user 156 ms, sys: 4 ms, total: 160 ms
Wall time: 157 ms
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 11.7 ms
40000000
CPU times: user 640 ms, sys: 32 ms, total: 672 ms
Wall time: 671 ms
CPU times: user 44 ms, sys: 0 ns, total: 44 ms
Wall time: 43.3 ms
100000000
CPU times: user 1.49 s, sys: 6.98 s, total: 8.47 s
Wall time: 8.47 s
CPU times: user 128 ms, sys: 0 ns, total: 128 ms
Wall time: 127 ms
200000000
CPU times: user 2.82 s, sys: 9.42 s, total: 12.2 s
Wall time: 12.2 s
CPU times: user 264 ms, sys: 0 ns, total: 264 ms
Wall time: 262 ms
300000000
CPU times: user 4.24 s, sys: 2.13 s, total: 6.37 s
Wall time: 6.37 s
CPU times: user 380 ms, sys: 0 ns, total: 380 ms
Wall time: 376 ms
500000000
CPU times: user 7.39 s, sys: 4.41 s, total: 11.8 s
Wall time: 12.5 s
CPU times: user 600 ms, sys: 0 ns, total: 600 ms
Wall time: 595 ms
6 6

In [6]:
# A3c.  plot the results]

# repeating the arrays we obtained in A3a and A3b:

n1 = [10000, 100000, 1000000, 10000000, 20000000, 40000000]
c1 = [4e-3, 28e-3, 148e-3, 1.51, 2.86, 5.9]
c1i = [1e-3, 4e-3, 12e-3, 744e-3, 2.45, 1.12]

n2 = [  10000000, 40000000, 100000000, 200000000, 300000000, 500000000]
c2 = [4e-3, 20e-3, 52e-3, 116e-3, 172e-3, 268e-3]
c2i = [0.2, 0.9,    3.0,   4.46,   7.11,  14.5]

x1=np.array(n1)
y1=np.array(c1)
y1i=np.array(c1i)
z1=y1/x1
z1i=y1i/x1
x2=np.array(n2)
y2=np.array(c2)
y2i=np.array(c2i)
z2=y2/x2
z2i=y2i/x2

In [7]:
plt.figure()
plt.plot(x1,y1,'+-',label='sum +=')
plt.plot(x2,y2,'+-',label='np.sum')
plt.scatter([x1,x2],[y1i,y2i],label='geta()')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('N')
plt.ylabel('CPU[sec]')
plt.legend(loc='upper left')
plt.show()



In [8]:
plt.figure()
plt.plot(x1,z1,'+-',label='sum +=')
plt.plot(x2,z2,'+-',label='np.sum')
plt.scatter([x1,x2],[z1i,z2i],label='geta()')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('N')
plt.ylabel('CPU[sec]/N')
plt.legend(loc='best')
plt.show()


As can be seen in this plot, the code eventually behaves linear in N, probably better seen in the 2nd plot. It is interesting to see that the code to initialize the array (geta()) is inbetween the slow sum() and the fast numpy.sum(), but also has errors (possibly human) in their measurements.

The first plot shows the whole dynamic range of the measurements, and shows better where we can trust the measurements, and how much patience we need to make the two curves have overlapping values of N.


In [9]:
grades = [36,36,35,26,25,28,24,23,31,21]
gh=plt.hist(grades,cumulative=False)



In [10]:
gh=plt.hist(grades,bins=10,range=(0,40),cumulative=False)



In [ ]: