CS1001.py

Extended Introduction to Computer Science with Python, Tel-Aviv University, Spring 2013

Recitation 9 - 9-13.5.2013

Last update: 13.5.2013

Data structures

Python's:

  • list
  • dict
  • string
  • tuple
  • set

Course implementations:

  • tree
  • matrix
  • hashtable
  • linked list

Linked list

A dynamic structure, location in memory is not consecutive.

Some operations become $O(1)$, on the expence of others (especially access).


In [3]:
class Node:
    def __init__(self, val):
        self.value = val
        self.next = None
        
    def __repr__(self):
        return str(self.value)

In [4]:
node1 = Node(1)
node2 = Node(2)
node1.next = node2
node3 = Node(3)
node2.next = node3

In [5]:
class LinkedList:
    def __init__(self):
        self.first = None

    def __repr__(self):
       out = "["
       curr = self.first
       while curr:
           out += str(curr) + ", "
           curr = curr.next
       return out[:-2] + "]"

In [6]:
lst = LinkedList()
lst.first = node1
lst


Out[6]:
[1, 2, 3]

Iterating the list, as done is __repr__, can be done by advancing the current node to the next untill reaching a None:

Now we will implement some more methods:


In [7]:
class LinkedList:
    def __init__(self):
        self.first = None

    def __repr__(self):
       out = "["
       curr = self.first
       while curr:
           out += str(curr) + ", "
           curr = curr.next
       return out[:-2] + "]"
        
    def length(self):
        curr = self.first
        i = 0
        while curr:
            i += 1
            curr = curr.next
        return i
            
    def prepend(self, value):
        node = Node(value)
        if not self.first:
            self.first = node
        else:
            self.first, node.next = node, self.first
    
    def append(self, value):
        node = Node(value)
        curr = self.first
        if not curr:
            self.first = node
        else:
            while curr.next:
                curr = curr.next
            curr.next = node

    def insert(self, index, value):
        curr = self.first
        for i in range(index - 1):
            if not curr or not curr.next:
                raise IndexError()
            curr = curr.next
        node = Node(value)
        curr.next, node.next = node, curr.next

    def remove(self, index):
        curr = self.first
        # iterate to the one before the one to be removed
        for i in range(index - 1):
            if not curr or not curr.next:
                raise IndexError
            curr = curr.next
        curr.next = curr.next.next
            
    def get(self, index):
        curr = self.first
        for i in range(index):
            if not curr or not curr.next:
                raise IndexError
            curr = curr.next
        return curr.value
            
    def index(self, value):
        curr = self.first
        i = 0
        while curr:
            if curr.value == value:
                return i
            curr = curr.next
            i += 1
        raise ValueError()

In [8]:
node1 = Node(1)
node2 = Node(2)
node1.next = node2
node3 = Node(3)
node2.next = node3
lst = LinkedList()
lst.first = node1

print(lst)
print(lst.length())
lst.prepend(0)
print(lst)
lst.append(4)
print(lst)
lst.insert(2, 1.5)
print(lst)
lst.remove(2)
print(lst)
print(lst.get(2))
print(lst.index(2))


[1, 2, 3]
3
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 1.5, 2, 3, 4]
[0, 1, 2, 3, 4]
2
2

Reverse

We want to reverse a linked list. We implement this as a mehtod just so we don't copy-paste the entire code from above - this can and should be a method.


In [9]:
def reverse(lst):
    prev = None
    curr = lst.first
    while curr:
        curr.next, prev, curr = prev, curr, curr.next
    lst.first = prev

In [10]:
print(lst)
reverse(lst)
print(lst)


[0, 1, 2, 3, 4]
[4, 3, 2, 1, 0]

Queue

We want to have a first-in-first-out (FIFO) data structure so that we can:

  • push/enqueue: insert an element to the top of the queue
  • pop/dequeue: remove an element from the bottom of the queue

How do our current ordered data structure deal with these tasks?

  • hashtable
  • list
  • linked list

Lets try it.

With a list implementation push and pop must be $O(1)$ and $O(n)$ or vice versa, as append and pop() are $O(1)$ but insert(0) and pop(0) are $O(n)$.

We will implement it so that push is cheap:


In [11]:
class QueueList:
    def __init__(self):
        self.lst = []
    
    def __repr__(self):
        return self.lst.__repr__()

    def push(self, value):
        self.lst.append(Node(value))

    def pop(self):
        return self.lst.pop(0)

q1 = QueueList()
q1.push(1)
q1.push(2)
q1.push(3)
print(q1.pop())
q1


1
Out[11]:
[2, 3]

With a linked list it's the other way around - prepend and remove(0) are $O(1)$ but append and remove(-1) are $O(n)$. We will push to the head so that we get a cheap push:


In [12]:
class QueueLinkedList: 
  def __init__(self): 
    self.head = None 

  def __repr__(self):
        out = "["
        curr = self.head
        while curr:
            out += str(curr) + ", "
            curr = curr.next
        return out[:-2] + "]"

  def push(self, value): 
    node = Node(value)
    if self.head == None:
        self.head = node
    else:
        self.head, node.next = node, self.head
  
  def pop(self):
        if self.head != None:
            curr, prev = self.head, None
            while curr.next:
                curr, prev = curr.next, curr
            prev.next = None
            return curr.value

q2 = QueueLinkedList()
q2.push(1)
q2.push(2)
q2.push(3)
print(q2.pop())
q2


1
Out[12]:
[3, 2]

Lets check performence:


In [13]:
q1 = QueueList()
q2 = QueueLinkedList()
for i in range(10**5): 
    q1.push(1)
    q2.push(1)

In [20]:
%timeit -n 100 q1.push(1)
%timeit -n 100 q2.push(1)


100 loops, best of 3: 1.12 us per loop
100 loops, best of 3: 1.37 us per loop

In [21]:
%timeit -n 100 q1.pop()
%timeit -n 100 q2.pop()


100 loops, best of 3: 68.3 us per loop
100 loops, best of 3: 23.1 ms per loop

So indeed push is cheap and pop is expensive, much more so in the linked list because list is implemented in C.

We want a data structure where both push and pop are $O(1)$.

  • linked list with tail reference!

We will keep track of the head and the tail and so both push and pop can be $O(1)$:


In [14]:
class TailedQueueLinkedList: 
  def __init__(self): 
    self.head   = None 
    self.tail   = None 

  def __repr__(self):
    out = "["
    curr = self.head
    while curr:
        out += str(curr) + ", "
        curr = curr.next
    if len(out) > 2:
        out = out[:-2]
    return out + "]"

  def push(self, value): 
    node = Node(value)
    if self.head == None:
        # empty queue
        self.head, self.tail = node, node
    else:
        # push to the tail. can you think of a way to push from the head? where will you pop from?
        self.tail.next, self.tail = node, node
  
  def pop(self):
        # pop from the head
        if self.head != None:
            node = self.head
            self.head = self.head.next    
            return node.value

q3 = TailedQueueLinkedList()
q3.push(1)
q3.push(2)
q3.push(3)
print(q3.pop())
q3


1
Out[14]:
[2, 3]

In [41]:
q1 = QueueList()
q2 = QueueLinkedList()
q3 = TailedQueueLinkedList()
for i in range(10**5): 
    q1.push(1)
    q2.push(1)
    q3.push(1)

In [42]:
%timeit -n 100 q1.push(1)
%timeit -n 100 q2.push(1)
%timeit -n 100 q3.push(1)


100 loops, best of 3: 1.13 us per loop
100 loops, best of 3: 1.38 us per loop
100 loops, best of 3: 1.37 us per loop

In [43]:
%timeit -n 100 q1.pop()
%timeit -n 100 q2.pop()
%timeit -n 100 q3.pop()


100 loops, best of 3: 68.5 us per loop
100 loops, best of 3: 24.9 ms per loop
100 loops, best of 3: 1.04 us per loop

And indeed the tailed linked list keeps a low time for both push and pop.

Iterators

We want to be able to do:

for element in seq:
    ...

seq must be iterable -> must have a method named __iter__.


In [66]:
class A:
    def __init__(self):
        self.values = [1,2,3,4]

    def __repr__(self):
        return str(self.values)

In [67]:
a = A()
print(a)


[1, 2, 3, 4]

In [68]:
for x in a:
    print(x)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-966e99e76e39> in <module>()
----> 1 for x in a:
      2     print(x)

TypeError: 'A' object is not iterable

In [69]:
class A:
    ''' using the built-in iterator of list '''
    def __init__(self):
        self.values = [1,2,3,4]

    def __repr__(self):
        return str(self.values)

    def __iter__(self): #now A is iterable
        return iter(self.values) 	#returns the iterator class of a list

In [70]:
a = A()
for x in a:
    print(x)


1
2
3
4

We can write our own iterator:


In [75]:
class B:
    ''' using our own iterator class: BIterator '''
    def __init__(self):
        self.values = [1,2,3,4]
        self.counts = [3,4,0,1]

    def __repr__(self):
        return str(self.values) + "\n" + str(self.count)
    
    def __iter__(self):
        return BIterator(self.values, self.counts)

class BIterator():
    ''' iterator of B, must have __next__ '''
    def __init__(self, values, counts):
        self.index = 0
        self.lst = [values[i] for i in range(len(values)) for j in range(counts[i])]
        
    def __next__(self):
        if self.index < len(self.lst):
            res = self.lst[self.index]
            self.index += 1
            return res
        raise StopIteration

In [74]:
b = B()
for x in b:
    print(x, end=", ")


1, 1, 1, 2, 2, 2, 2, 4, 

Generators

Instead a separate class for the iterator, __iter__ can use yield.

This is called a generator (as it generates an iterator):


In [76]:
class C:
    ''' a generator example using yield '''
    
    def __init__(self):
        self.values = [1,2,3,4]
        self.counts = [3,4,0,1]

    def __repr__(self):
        return str(self.values) + "\n" + str(self.count)
    
    def __iter__(self):
        for i in range(len(self.values)):
            for j in range(self.counts[i]):
                yield self.values[i]

In [78]:
c = C()
for x in c:
    print(x, end=", ")


1, 1, 1, 2, 2, 2, 2, 4, 

Generators involve lazy evaluation – one element is created at a time.

range is a generator we already know. Lets compare range with list(range):


In [3]:
for i in list(range(10**8)):
    pass

In [4]:
for i in range(10**8):
    pass

If you want to try this at home, you need to run this while having a process monitor tool open (Windows: Task Manager, Linux: top):

You can also use generators to run an "infinite" for loop.

Suppose you want to iterate the natural numbers:


In [6]:
def natural_numbers():
    n = 0
    while True:
        n += 1
        yield n

In [9]:
for n in natural_numbers():
    if n % 667 == 0 and n**2 % 766 == 0:
        break
print(n)


510922

You can also use "generator compehension" instead of "list comprehension" to save memory and to perform lazy init to save runtime because you know you won't use all of them:


In [21]:
evens_less_than_1e6_list = [x for x in range(1, 10**6) if x % 2 == 0]
%timeit -n 3 [x for x in range(1, 10**6) if x % 2 == 0]


3 loops, best of 3: 637 ms per loop

In [22]:
evens_less_than_1e6_generator = (x for x in range(1, 10**6) if x % 2 == 0)
%timeit -n 3 (x for x in range(1, 10**6) if x % 2 == 0)


3 loops, best of 3: 6.64 us per loop

In [32]:
import sys
sys.getsizeof(evens_less_than_1e6_list), sys.getsizeof(evens_less_than_1e6_generator)


Out[32]:
(4290016, 72)

Fin

This notebook is part of the Extended introduction to computer science course at Tel-Aviv University.

The notebook was written using Python 3.2 and IPython 0.13.1.

The code is available at https://raw.github.com/yoavram/CS1001.py/master/recitation9.ipynb.

The notebook can be viewed online at http://nbviewer.ipython.org/urls/raw.github.com/yoavram/CS1001.py/master/recitation9.ipynb.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.