This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/).

Advanced Data Structures

There are four types of collections in Python

(known as "Sequence objects")

list : a mutable ordered array of data
tuple : an immutable ordered array of data
dict : an unordered mapping from keys to values
set : an unordered collection of unique elements

The values in any of these collections can be arbitrary Python objects, and mixing content types is OK.

Note that strings are also sequence objects.

Tuples

Tuples are denoted with parentheses



In [1]:

    
t = (12, -1)
print type(t)









    



<type 'tuple'>



In [2]:

    
print isinstance(t,tuple)
print len(t)









    



True
2

Can mix types in a tuple



In [10]:

    
t = (12, "monty", True, -1.23e6)
print t[1]









    



monty

Indexing works the same way as for strings:



In [4]:

    
print t[-1]



In [5]:

    
t[-2:]  # get the last two elements, return as a tuple









    Out[5]:





(True, -1230000.0)



In [6]:

    
x = (True) ; print type(x)
x = (True,) ; print type(x)









    



<type 'bool'>
<type 'tuple'>



In [8]:

    
x = ()
type(x), len(x)









    Out[8]:





(tuple, 0)



In [9]:

    
x = (,)









    



  File "<ipython-input-9-bd05d59e2976>", line 1
    x = (,)
         ^
SyntaxError: invalid syntax

single-element tuples look like (element,)

tuples cannot be modified. but you can create new one with concatenation



In [9]:

    
t[2] = False









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-9-9365ccccf007> in <module>()
----> 1 t[2] = False

TypeError: 'tuple' object does not support item assignment



In [5]:

    
t[0:2], False, t[3:]









    Out[5]:





((), False, ())



In [6]:

    
## the above it 
## not what we wanted... need to concatenate
t[0:2] + False + t[3:]









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-73d4c94ec2bf> in <module>()
      1 ## the above it
      2 ## not what we wanted... need to concatenate
----> 3 t[0:2] + False + t[3:]

TypeError: can only concatenate tuple (not "bool") to tuple



In [7]:

    
y = t[0:2] + (False,) + t[3:]
print y









    



(False,)



In [13]:

    
t * 2









    Out[13]:





(12, 'monty', True, -1230000.0, 12, 'monty', True, -1230000.0)

Tuples are most commonly used in functions which return multiple arguments.

Lists

Lists are denoted with square brackets



In [34]:

    
v = [1,2,3]
print len(v)
print type(v)









    



3
<type 'list'>



In [12]:

    
v[0:2]









    Out[12]:





[1, 2]



In [16]:

    
v = ["eggs", "spam", -1, ("monty","python"), [-1.2,-3.5]]
len(v)









    Out[16]:





5



In [17]:

    
v[0] ="green egg"
v[1] += ",love it."
v[-1]









    Out[17]:





[-1.2, -3.5]



In [18]:

    
v[-1][1] = None
print v









    



['green egg', 'spam,love it.', -1, ('monty', 'python'), [-1.2, None]]



In [19]:

    
v = v[2:]
print v









    



[-1, ('monty', 'python'), [-1.2, None]]



In [14]:

    
# let's make a proto-array out of nested lists
vv = [ [1,2], [3,4] ]



In [15]:

    
print len(vv)



In [16]:

    
determinant = vv[0][0] * vv[1][1] - vv[0][1] * vv[1][0]
print determinant

-2

the main point here: lists are mutable

Lists can be Extended and Appended



In [24]:

    
v = [1,2,3]
v.append(4)
v.append([-5])
print v









    



[1, 2, 3, 4, [-5]]

Note: lists can be considered objects. Objects are collections of data and associated methods. In the case of a list, append is a method: it is a function associated with the object.



In [25]:

    
v = v[:4]
w = ['elderberries', 'eggs']
print v + w









    



[1, 2, 3, 4, 'elderberries', 'eggs']



In [26]:

    
print v









    



[1, 2, 3, 4]



In [27]:

    
v.extend(w)
print v









    



[1, 2, 3, 4, 'elderberries', 'eggs']



In [28]:

    
v.pop()









    Out[28]:





'eggs'



In [29]:

    
print v









    



[1, 2, 3, 4, 'elderberries']



In [30]:

    
v.pop(0) ## pop the first element









    Out[30]:





1



In [31]:

    
print v









    



[2, 3, 4, 'elderberries']

Useful list methods:

.append(): adds a new element
.extend(): concatenates a list/element
.pop(): remove an element

Lists can be searched, sorted, & counted



In [32]:

    
v = [1, 3, 2, 3, 4]
v.sort() ; print v









    



[1, 2, 3, 3, 4]

reverse is a keyword of the .sort() method



In [33]:

    
v.sort(reverse=True) ; print v









    



[4, 3, 3, 2, 1]

.sort() changes the the list in place



In [31]:

    
v.index(4)   ## lookup the index of the entry 4









    Out[31]:





1



In [32]:

    
v.index(3)









    Out[32]:





2



In [33]:

    
v.count(3)









    Out[33]:





2



In [34]:

    
v.insert(0, "it's full of stars") ; print v









    



["it's full of stars", 'elderberries', 4, 3, 3, 2, 1]



In [35]:

    
v.remove(1) ; print v









    



["it's full of stars", 'elderberries', 4, 3, 3, 2]

Using IPython to learn more

IPython is your new best friend: it's tab-completion allows you to explore all methods available to an object. (This only works in IPython)

Type

v.

and then the tab key to see all the available methods:



In [ ]:

    
v.

Once you find a method, type (for example)

v.index?

and press shift-enter: you'll see the documentation of the method



In [ ]:

    
v.index?

This is probably the most important thing you'll learn today

Iterating over Lists



In [37]:

    
a = ['cat', 'window', 'defenestrate']
for x in a:
    print x, len(x)









    



cat 3
window 6
defenestrate 12



In [38]:

    
for i,x in enumerate(a):
    print i, x, len(x)









    



0 cat 3
1 window 6
2 defenestrate 12



In [39]:

    
for x in a:
    print x,









    



cat window defenestrate

The syntax for iteration is...

for variable_name in iterable:
   # do something with variable_name

The `range()` function

The range() function creates a list of integers



In [37]:

    
x = range(4)
print x









    



[0, 1, 2, 3]



In [38]:

    
total = 0
for val in range(4):
    total += val
    print "By adding " + str(val) + \
          " the total is now " + str(total)









    



By adding 0 the total is now 0
By adding 1 the total is now 1
By adding 2 the total is now 3
By adding 3 the total is now 6

range([start,] stop[, step]) → list of integers



In [39]:

    
total = 0
for val in range(1, 10, 2):
    total += val
    print "By adding " + str(val) + \
          " the total is now " + str(total)









    



By adding 1 the total is now 1
By adding 3 the total is now 4
By adding 5 the total is now 9
By adding 7 the total is now 16
By adding 9 the total is now 25



In [43]:

    
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print i, a[i]









    



0 Mary
1 had
2 a
3 little
4 lamb

Quick Exercise:

Write a loop over the words in this list and print the words longer than three characters in length:



In [ ]:

    
L = ["Oh", "Say", "does", "that", "star",
     "spangled", "banner", "yet", "wave"]

Sets

Sets are denoted with a curly braces



In [45]:

    
{1,2,3,"bingo"}









    Out[45]:





set(['bingo', 1, 2, 3])



In [47]:

    
print type({1,2,3,"bingo"})









    



<type 'set'>



In [48]:

    
print  type({})









    



<type 'dict'>



In [50]:

    
print type(set())









    



<type 'set'>



In [51]:

    
set("spamIam")









    Out[51]:





set(['a', 'p', 's', 'm', 'I'])

sets have unique elements. They can be compared, differenced, unionized, etc.



In [52]:

    
a = set("sp"); b = set("am"); print a ; print b









    



set(['p', 's'])
set(['a', 'm'])



In [53]:

    
c = set(["a","m"])
c == b









    Out[53]:





True



In [54]:

    
"p" in a









    Out[54]:





True



In [55]:

    
"ps" in a









    Out[55]:





False



In [57]:

    
q = set("spamIam")
a.issubset(q)









    Out[57]:





True



In [58]:

    
a | b









    Out[58]:





set(['a', 'p', 's', 'm'])



In [59]:

    
q - (a | b)









    Out[59]:





set(['I'])



In [60]:

    
q & (a | b)









    Out[60]:





set(['a', 'p', 's', 'm'])

Like lists, we can use as (unordered) buckets .pop() gives us elements in an unspecified order:



In [62]:

    
# this is pretty volitile...wont be the same
# order on all machines
for i in q & (a | b):
    print i,









    



a p s m



In [63]:

    
q.remove("a")



In [64]:

    
q.pop()









    Out[64]:





'p'



In [65]:

    
print q.pop()
print q.pop()

s
m



In [66]:

    
print q.pop()



In [68]:

    
q.pop()









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-68-16da542f89c5> in <module>()
----> 1 q.pop()

KeyError: 'pop from an empty set'

Dictionaries

We'll show four ways to make a Dictionary



In [1]:

    
# number 1... curly braces & colons
d = {"favorite cat": None,
     "favorite spam": "all"}
print d









    



{'favorite spam': 'all', 'favorite cat': None}



In [70]:

    
# number 2
d = dict(one = 1, two=2, cat='dog')
print d









    



{'cat': 'dog', 'two': 2, 'one': 1}



In [71]:

    
# number 3 ... just start filling in items/keys
d = {}  # empty dictionary
d['cat'] = 'dog'
d['one'] = 1
d['two'] = 2
print d









    Out[71]:





{'cat': 'dog', 'one': 1, 'two': 2}



In [72]:

    
# number 4... start with a list of tuples
mylist = [("cat","dog"), ("one",1),("two",2)]
print dict(mylist)









    



{'one': 1, 'two': 2, 'cat': 'dog'}



In [73]:

    
dict(mylist) == d









    Out[73]:





True

Dictionaries can be complicated (in a good way)

Note that there is no guaranteed order in a dictionary!



In [75]:

    
d = {"favorite cat": None, "favorite spam": "all"}



In [43]:

    
print d[0]  # this breaks!  Dictionaries have no order









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-43-21ddf1fce1c6> in <module>()
----> 1 print d[0]  # this breaks!  Dictionaries have no order

KeyError: 0



In [45]:

    
d["favorite spam"]









    Out[45]:





'all'



In [2]:

    
d[0] = "this is a zero"
print d









    



{0: 'this is a zero', 'favorite spam': 'all', 'favorite cat': None}

Dictionaries can contain dictionaries!



In [46]:

    
d = {'favorites': {'cat': None, 'spam': 'all'},\
     'least favorite': {'cat': 'all', 'spam': None}}
print d['least favorite']['cat']

all

remember: the backslash ('\') allows you to across break lines. Not technically needed when defining a dictionary or list



In [47]:

    
phone_numbers = {'family': [('mom','642-2322'),('dad','534-2311')],
                 'friends': [('Billy','652-2212')]}



In [80]:

    
for group_type in ['friends', 'family']:
    print "Group " + group_type + ":"
    
    for info in phone_numbers[group_type]:
        print " ",info[0], info[1]









    



Group friends:
  Billy 652-2212
Group family:
  mom 642-2322
  dad 534-2311



In [81]:

    
# this will return a list, but you dont know in what order! 
phone_numbers.keys()









    Out[81]:





['friends', 'family']



In [82]:

    
phone_numbers.values()









    Out[82]:





[[('Billy', '652-2212')], [('mom', '642-2322'), ('dad', '534-2311')]]

.keys() and .values(): are methods on dictionaries (just like .append() and .extend() on lists)



In [83]:

    
for group_type in phone_numbers.keys():
    print "Group " + group_type + ":"
    for info in phone_numbers[group_type]:
        print " ",info[0], info[1]









    



Group friends:
  Billy 652-2212
Group family:
  mom 642-2322
  dad 534-2311

we cannot ensure ordering here of the groups



In [84]:

    
groups = phone_numbers.keys()
groups.sort()
for group_type in groups:
    print "Group " + group_type + ":"
    for info in phone_numbers[group_type]:
        print " ",info[0], info[1]









    



Group family:
  mom 642-2322
  dad 534-2311
Group friends:
  Billy 652-2212

.iteritems() is a handy method, returning key,value pairs with each iteration



In [48]:

    
for group_type, vals in phone_numbers.iteritems():
    print "Group " + group_type + ":"
    for info in vals:
        print " ",info[0], info[1]









    



Group friends:
  Billy 652-2212
Group family:
  mom 642-2322
  dad 534-2311

Some examples of getting values:



In [86]:

    
phone_numbers['co-workers']









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-86-92d1a5b9b960> in <module>()
----> 1 phone_numbers['co-workers']

KeyError: 'co-workers'



In [87]:

    
phone_numbers.has_key('co-workers')









    Out[87]:





False



In [88]:

    
print phone_numbers.get('co-workers')









    



None



In [50]:

    
phone_numbers.get('friends') == phone_numbers['friends']









    Out[50]:





True



In [90]:

    
print phone_numbers.get('co-workers', "all alone")









    



all alone

Setting values

you can edit the values of keys and also .pop() & del to remove certain keys



In [91]:

    
# add to the friends list
phone_numbers['friends'].append(("Marsha","232-1121"))
print phone_numbers









    



{'friends': [('Billy', '652-2212'), ('Marsha', '232-1121')], 'family': [('mom', '642-2322'), ('dad', '534-2311')]}



In [92]:

    
## billy's number changed
phone_numbers['friends'][0][1] = "532-1521"









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-92-564c1535cc4d> in <module>()
      1 ## billy's number changed

----> 2 phone_numbers['friends'][0][1] = "532-1521"

TypeError: 'tuple' object does not support item assignment



In [93]:

    
phone_numbers['friends'][0] = ("Billy", "532-1521")



In [94]:

    
## I lost all my friends preparing for this Python class
phone_numbers['friends'] = [] # sets this to an empty list



In [95]:

    
## remove the friends key altogether
print phone_numbers.pop('friends')

[]



In [96]:

    
print phone_numbers









    



{'family': [('mom', '642-2322'), ('dad', '534-2311')]}



In [97]:

    
del phone_numbers['family']



In [98]:

    
print phone_numbers

{}

.update() method is very handy, like .append() for lists



In [99]:

    
phone_numbers.update({"friends":
                      [("Billy's Brother, Bob", "532-1521")]})
print phone_numbers









    



{'friends': [("Billy's Brother, Bob", '532-1521')]}

Dictionaries are used everywhere within Python...



In [2]:

    
# globals() and locals() store all global and local variables
print globals().keys()[:6]









    



['_dh', '__', 'help', 'quit', '__builtins__', '_ih']

List Comprehensions

A pythonic way of creating lists on-the-fly

Example: imagine you want a list of all numbers from 0 to 100 which are divisible by 7 or 11



In [69]:

    
L = []
for num in range(100):
    if (num % 7 == 0) or (num % 11 == 0):
        L.append(num)
print L









    



[0, 7, 11, 14, 21, 22, 28, 33, 35, 42, 44, 49, 55, 56, 63, 66, 70, 77, 84, 88, 91, 98, 99]

We can also do this with a list comprehension:



In [68]:

    
L = [num for num in range(100)\
     if (num % 7 == 0) or (num % 11 == 0)]
print L









    



[0, 7, 11, 14, 21, 22, 28, 33, 35, 42, 44, 49, 55, 56, 63, 66, 70, 77, 84, 88, 91, 98, 99]



In [70]:

    
# Can also operate on each element:
L = [2 * num for num in range(100)\
     if (num % 7 == 0) or (num % 11 == 0)]
print L









    



[0, 14, 22, 28, 42, 44, 56, 66, 70, 84, 88, 98, 110, 112, 126, 132, 140, 154, 168, 176, 182, 196, 198]

Example: Below is a list of information on 50 of the largest near-earth asteroids. Given this list of asteroid information, let's find all asteroids with semi-major axis within 0.2AU of earth, and with eccentricities less than 0.5



In [82]:

    
# Each element is (name, semi-major axis (AU), eccentricity, orbit class)
# source: http://ssd.jpl.nasa.gov/sbdb_query.cgi

Asteroids = [('Eros', 1.457916888347732, 0.2226769029627053, 'AMO'),
             ('Albert', 2.629584157344544, 0.551788195302116, 'AMO'),
             ('Alinda', 2.477642943521562, 0.5675993715753302, 'AMO'),
             ('Ganymed', 2.662242764279804, 0.5339300994578989, 'AMO'),
             ('Amor', 1.918987277620309, 0.4354863345648127, 'AMO'),
             ('Icarus', 1.077941311539208, 0.826950446001521, 'APO'),
             ('Betulia', 2.196489260519891, 0.4876246891992282, 'AMO'),
             ('Geographos', 1.245477192797457, 0.3355407124897842, 'APO'),
             ('Ivar', 1.862724540418448, 0.3968541470639658, 'AMO'),
             ('Toro', 1.367247622946547, 0.4358829575017499, 'APO'),
             ('Apollo', 1.470694262588244, 0.5598306817483757, 'APO'),
             ('Antinous', 2.258479598510079, 0.6070051516585434, 'APO'),
             ('Daedalus', 1.460912865705988, 0.6144629118218898, 'APO'),
             ('Cerberus', 1.079965807367047, 0.4668134997419173, 'APO'),
             ('Sisyphus', 1.893726635847921, 0.5383319204425762, 'APO'),
             ('Quetzalcoatl', 2.544270656955212, 0.5704591861565643, 'AMO'),
             ('Boreas', 2.271958775354725, 0.4499332278634067, 'AMO'),
             ('Cuyo', 2.150453953345012, 0.5041719257675564, 'AMO'),
             ('Anteros', 1.430262719980132, 0.2558054402785934, 'AMO'),
             ('Tezcatlipoca', 1.709753263222791, 0.3647772103513082, 'AMO'),
             ('Midas', 1.775954494579457, 0.6503697243919138, 'APO'),
             ('Baboquivari', 2.646202507670927, 0.5295611095751231, 'AMO'),
             ('Anza', 2.26415089613359, 0.5371603112900858, 'AMO'),
             ('Aten', 0.9668828078092987, 0.1827831025175614, 'ATE'),
             ('Bacchus', 1.078135348117527, 0.3495569270441645, 'APO'),
             ('Ra-Shalom', 0.8320425524852308, 0.4364726062545577, 'ATE'),
             ('Adonis', 1.874315684524321, 0.763949321566, 'APO'),
             ('Tantalus', 1.289997492877751, 0.2990853014998932, 'APO'),
             ('Aristaeus', 1.599511990737142, 0.5030618532252225, 'APO'),
             ('Oljato', 2.172056090036035, 0.7125729402616418, 'APO'),
             ('Pele', 2.291471988746353, 0.5115484924883255, 'AMO'),
             ('Hephaistos', 2.159619960333728, 0.8374146846143349, 'APO'),
             ('Orthos', 2.404988778495748, 0.6569133796135244, 'APO'),
             ('Hathor', 0.8442121506103012, 0.4498204013480316, 'ATE'),
             ('Beltrovata', 2.104690977122337, 0.413731105995413, 'AMO'),
             ('Seneca', 2.516402574514213, 0.5708728441169761, 'AMO'),
             ('Krok', 2.152545170235639, 0.4478259793515817, 'AMO'),
             ('Eger', 1.404478323548423, 0.3542971360331806, 'APO'),
             ('Florence', 1.768227407864309, 0.4227761019048867, 'AMO'),
             ('Nefertiti', 1.574493139339916, 0.283902719273878, 'AMO'),
             ('Phaethon', 1.271195939723604, 0.8898716672181355, 'APO'),
             ('Ul', 2.102493486378346, 0.3951143067760007, 'AMO'),
             ('Seleucus', 2.033331705805067, 0.4559159977082651, 'AMO'),
             ('McAuliffe', 1.878722427225527, 0.3691521497610656, 'AMO'),
             ('Syrinx', 2.469752836845105, 0.7441934504192601, 'APO'),
             ('Orpheus', 1.209727780883745, 0.3229034563257626, 'APO'),
             ('Khufu', 0.989473784873371, 0.468479627898914, 'ATE'),
             ('Verenia', 2.093231870619781, 0.4865133359612604, 'AMO'),
             ('"Don Quixote"', 4.221712367193639, 0.7130894892477316, 'AMO'),
             ('Mera', 1.644476057737928, 0.3201425983025733, 'AMO')]

orbit_class = {'AMO':'Amor', 'APO':'Apollo', 'ATE':'Aten'}



In [83]:

    
# first we'll build the list using loops.
L = []
for data in Asteroids:
    name, a, e, t = data
    if abs(a - 1) < 0.2 and e < 0.5:
        L.append(name)
print L









    



['Cerberus', 'Aten', 'Bacchus', 'Ra-Shalom', 'Hathor', 'Khufu']



In [73]:

    
# now with a list comprehension...
L = [name for (name, a, e, t) in Asteroids
     if abs(a - 1) < 0.2 and e < 0.5]
print L









    



['Cerberus', 'Aten', 'Bacchus', 'Ra-Shalom']

Here is how we could create a dictionary from the list



In [81]:

    
D = dict([(name, (a, e, t)) for (name, a, e, t) in Asteroids])
print D['Eros']
print D['Amor']









    



(1.457837275957122, 0.2226677322184548)
(1.918872951565462, 0.4355437855993322)

Breakout #2: Sorting Asteroids

Using the above Asteroid list,

print the list sorted in alphabetical order by asteroid name (hint: how does sorting handle a list of tuples?)
print the list sorted by semi-major axis
print the list sorted by name, but replace the class code with the class name

The output should be formatted like this:

Asteroid name    a (AU)    e        class
-----------------------------------------
Eros             1.4578    0.2226   Amor
Albert           2.6292    0.5518   Amor
.
.
.

Bonus points if you can get the columns to line up nicely!