This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/).

Advanced Data Structures

There are four types of collections in Python

(known as "Sequence objects")

  • list : a mutable ordered array of data
  • tuple : an immutable ordered array of data
  • dict : an unordered mapping from keys to values
  • set : an unordered collection of unique elements

The values in any of these collections can be arbitrary Python objects, and mixing content types is OK.

Note that strings are also sequence objects.

Tuples

Tuples are denoted with parentheses


In [1]:
t = (12, -1)
print type(t)


<type 'tuple'>

In [2]:
print isinstance(t,tuple)
print len(t)


True
2

Can mix types in a tuple


In [10]:
t = (12, "monty", True, -1.23e6)
print t[1]


monty

Indexing works the same way as for strings:


In [4]:
print t[-1]


-1230000.0

In [5]:
t[-2:]  # get the last two elements, return as a tuple


Out[5]:
(True, -1230000.0)

In [6]:
x = (True) ; print type(x)
x = (True,) ; print type(x)


<type 'bool'>
<type 'tuple'>

In [8]:
x = ()
type(x), len(x)


Out[8]:
(tuple, 0)

In [9]:
x = (,)


  File "<ipython-input-9-bd05d59e2976>", line 1
    x = (,)
         ^
SyntaxError: invalid syntax

single-element tuples look like (element,)

tuples cannot be modified. but you can create new one with concatenation


In [9]:
t[2] = False


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-9-9365ccccf007> in <module>()
----> 1 t[2] = False

TypeError: 'tuple' object does not support item assignment

In [5]:
t[0:2], False, t[3:]


Out[5]:
((), False, ())

In [6]:
## the above it 
## not what we wanted... need to concatenate
t[0:2] + False + t[3:]


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-73d4c94ec2bf> in <module>()
      1 ## the above it
      2 ## not what we wanted... need to concatenate
----> 3 t[0:2] + False + t[3:]

TypeError: can only concatenate tuple (not "bool") to tuple

In [7]:
y = t[0:2] + (False,) + t[3:]
print y


(False,)

In [13]:
t * 2


Out[13]:
(12, 'monty', True, -1230000.0, 12, 'monty', True, -1230000.0)

Tuples are most commonly used in functions which return multiple arguments.

Lists

Lists are denoted with square brackets


In [34]:
v = [1,2,3]
print len(v)
print type(v)


3
<type 'list'>

In [12]:
v[0:2]


Out[12]:
[1, 2]

In [16]:
v = ["eggs", "spam", -1, ("monty","python"), [-1.2,-3.5]]
len(v)


Out[16]:
5

In [17]:
v[0] ="green egg"
v[1] += ",love it."
v[-1]


Out[17]:
[-1.2, -3.5]

In [18]:
v[-1][1] = None
print v


['green egg', 'spam,love it.', -1, ('monty', 'python'), [-1.2, None]]

In [19]:
v = v[2:]
print v


[-1, ('monty', 'python'), [-1.2, None]]

In [14]:
# let's make a proto-array out of nested lists
vv = [ [1,2], [3,4] ]

In [15]:
print len(vv)


2

In [16]:
determinant = vv[0][0] * vv[1][1] - vv[0][1] * vv[1][0]
print determinant


-2

the main point here: lists are mutable

Lists can be Extended and Appended


In [24]:
v = [1,2,3]
v.append(4)
v.append([-5])
print v


[1, 2, 3, 4, [-5]]

Note: lists can be considered objects. Objects are collections of data and associated methods. In the case of a list, append is a method: it is a function associated with the object.


In [25]:
v = v[:4]
w = ['elderberries', 'eggs']
print v + w


[1, 2, 3, 4, 'elderberries', 'eggs']

In [26]:
print v


[1, 2, 3, 4]

In [27]:
v.extend(w)
print v


[1, 2, 3, 4, 'elderberries', 'eggs']

In [28]:
v.pop()


Out[28]:
'eggs'

In [29]:
print v


[1, 2, 3, 4, 'elderberries']

In [30]:
v.pop(0) ## pop the first element


Out[30]:
1

In [31]:
print v


[2, 3, 4, 'elderberries']

Useful list methods:

  • .append(): adds a new element
  • .extend(): concatenates a list/element
  • .pop(): remove an element

Lists can be searched, sorted, & counted


In [32]:
v = [1, 3, 2, 3, 4]
v.sort() ; print v


[1, 2, 3, 3, 4]

reverse is a keyword of the .sort() method


In [33]:
v.sort(reverse=True) ; print v


[4, 3, 3, 2, 1]

.sort() changes the the list in place


In [31]:
v.index(4)   ## lookup the index of the entry 4


Out[31]:
1

In [32]:
v.index(3)


Out[32]:
2

In [33]:
v.count(3)


Out[33]:
2

In [34]:
v.insert(0, "it's full of stars") ; print v


["it's full of stars", 'elderberries', 4, 3, 3, 2, 1]

In [35]:
v.remove(1) ; print v


["it's full of stars", 'elderberries', 4, 3, 3, 2]

Using IPython to learn more

IPython is your new best friend: it's tab-completion allows you to explore all methods available to an object. (This only works in IPython)

Type

v.

and then the tab key to see all the available methods:


In [ ]:
v.

Once you find a method, type (for example)

v.index?

and press shift-enter: you'll see the documentation of the method


In [ ]:
v.index?

This is probably the most important thing you'll learn today

Iterating over Lists


In [37]:
a = ['cat', 'window', 'defenestrate']
for x in a:
    print x, len(x)


cat 3
window 6
defenestrate 12

In [38]:
for i,x in enumerate(a):
    print i, x, len(x)


0 cat 3
1 window 6
2 defenestrate 12

In [39]:
for x in a:
    print x,


cat window defenestrate

The syntax for iteration is...

for variable_name in iterable:
   # do something with variable_name

The range() function

The range() function creates a list of integers


In [37]:
x = range(4)
print x


[0, 1, 2, 3]

In [38]:
total = 0
for val in range(4):
    total += val
    print "By adding " + str(val) + \
          " the total is now " + str(total)


By adding 0 the total is now 0
By adding 1 the total is now 1
By adding 2 the total is now 3
By adding 3 the total is now 6

range([start,] stop[, step]) → list of integers


In [39]:
total = 0
for val in range(1, 10, 2):
    total += val
    print "By adding " + str(val) + \
          " the total is now " + str(total)


By adding 1 the total is now 1
By adding 3 the total is now 4
By adding 5 the total is now 9
By adding 7 the total is now 16
By adding 9 the total is now 25

In [43]:
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print i, a[i]


0 Mary
1 had
2 a
3 little
4 lamb

Quick Exercise:

Write a loop over the words in this list and print the words longer than three characters in length:


In [ ]:
L = ["Oh", "Say", "does", "that", "star",
     "spangled", "banner", "yet", "wave"]

Sets

Sets are denoted with a curly braces


In [45]:
{1,2,3,"bingo"}


Out[45]:
set(['bingo', 1, 2, 3])

In [47]:
print type({1,2,3,"bingo"})


<type 'set'>

In [48]:
print  type({})


<type 'dict'>

In [50]:
print type(set())


<type 'set'>

In [51]:
set("spamIam")


Out[51]:
set(['a', 'p', 's', 'm', 'I'])

sets have unique elements. They can be compared, differenced, unionized, etc.


In [52]:
a = set("sp"); b = set("am"); print a ; print b


set(['p', 's'])
set(['a', 'm'])

In [53]:
c = set(["a","m"])
c == b


Out[53]:
True

In [54]:
"p" in a


Out[54]:
True

In [55]:
"ps" in a


Out[55]:
False

In [57]:
q = set("spamIam")
a.issubset(q)


Out[57]:
True

In [58]:
a | b


Out[58]:
set(['a', 'p', 's', 'm'])

In [59]:
q - (a | b)


Out[59]:
set(['I'])

In [60]:
q & (a | b)


Out[60]:
set(['a', 'p', 's', 'm'])

Like lists, we can use as (unordered) buckets .pop() gives us elements in an unspecified order:


In [62]:
# this is pretty volitile...wont be the same
# order on all machines
for i in q & (a | b):
    print i,


a p s m

In [63]:
q.remove("a")

In [64]:
q.pop()


Out[64]:
'p'

In [65]:
print q.pop()
print q.pop()


s
m

In [66]:
print q.pop()


I

In [68]:
q.pop()


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-68-16da542f89c5> in <module>()
----> 1 q.pop()

KeyError: 'pop from an empty set'

Dictionaries

We'll show four ways to make a Dictionary


In [1]:
# number 1... curly braces & colons
d = {"favorite cat": None,
     "favorite spam": "all"}
print d


{'favorite spam': 'all', 'favorite cat': None}

In [70]:
# number 2
d = dict(one = 1, two=2, cat='dog')
print d


{'cat': 'dog', 'two': 2, 'one': 1}

In [71]:
# number 3 ... just start filling in items/keys
d = {}  # empty dictionary
d['cat'] = 'dog'
d['one'] = 1
d['two'] = 2
print d


Out[71]:
{'cat': 'dog', 'one': 1, 'two': 2}

In [72]:
# number 4... start with a list of tuples
mylist = [("cat","dog"), ("one",1),("two",2)]
print dict(mylist)


{'one': 1, 'two': 2, 'cat': 'dog'}

In [73]:
dict(mylist) == d


Out[73]:
True

Dictionaries can be complicated (in a good way)

Note that there is no guaranteed order in a dictionary!


In [75]:
d = {"favorite cat": None, "favorite spam": "all"}

In [43]:
print d[0]  # this breaks!  Dictionaries have no order


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-43-21ddf1fce1c6> in <module>()
----> 1 print d[0]  # this breaks!  Dictionaries have no order

KeyError: 0

In [45]:
d["favorite spam"]


Out[45]:
'all'

In [2]:
d[0] = "this is a zero"
print d


{0: 'this is a zero', 'favorite spam': 'all', 'favorite cat': None}

Dictionaries can contain dictionaries!


In [46]:
d = {'favorites': {'cat': None, 'spam': 'all'},\
     'least favorite': {'cat': 'all', 'spam': None}}
print d['least favorite']['cat']


all

remember: the backslash ('\') allows you to across break lines. Not technically needed when defining a dictionary or list


In [47]:
phone_numbers = {'family': [('mom','642-2322'),('dad','534-2311')],
                 'friends': [('Billy','652-2212')]}

In [80]:
for group_type in ['friends', 'family']:
    print "Group " + group_type + ":"
    
    for info in phone_numbers[group_type]:
        print " ",info[0], info[1]


Group friends:
  Billy 652-2212
Group family:
  mom 642-2322
  dad 534-2311

In [81]:
# this will return a list, but you dont know in what order! 
phone_numbers.keys()


Out[81]:
['friends', 'family']

In [82]:
phone_numbers.values()


Out[82]:
[[('Billy', '652-2212')], [('mom', '642-2322'), ('dad', '534-2311')]]

.keys() and .values(): are methods on dictionaries (just like .append() and .extend() on lists)


In [83]:
for group_type in phone_numbers.keys():
    print "Group " + group_type + ":"
    for info in phone_numbers[group_type]:
        print " ",info[0], info[1]


Group friends:
  Billy 652-2212
Group family:
  mom 642-2322
  dad 534-2311

we cannot ensure ordering here of the groups


In [84]:
groups = phone_numbers.keys()
groups.sort()
for group_type in groups:
    print "Group " + group_type + ":"
    for info in phone_numbers[group_type]:
        print " ",info[0], info[1]


Group family:
  mom 642-2322
  dad 534-2311
Group friends:
  Billy 652-2212

.iteritems() is a handy method, returning key,value pairs with each iteration


In [48]:
for group_type, vals in phone_numbers.iteritems():
    print "Group " + group_type + ":"
    for info in vals:
        print " ",info[0], info[1]


Group friends:
  Billy 652-2212
Group family:
  mom 642-2322
  dad 534-2311

Some examples of getting values:


In [86]:
phone_numbers['co-workers']


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-86-92d1a5b9b960> in <module>()
----> 1 phone_numbers['co-workers']

KeyError: 'co-workers'

In [87]:
phone_numbers.has_key('co-workers')


Out[87]:
False

In [88]:
print phone_numbers.get('co-workers')


None

In [50]:
phone_numbers.get('friends') == phone_numbers['friends']


Out[50]:
True

In [90]:
print phone_numbers.get('co-workers', "all alone")


all alone

Setting values

you can edit the values of keys and also .pop() & del to remove certain keys


In [91]:
# add to the friends list
phone_numbers['friends'].append(("Marsha","232-1121"))
print phone_numbers


{'friends': [('Billy', '652-2212'), ('Marsha', '232-1121')], 'family': [('mom', '642-2322'), ('dad', '534-2311')]}

In [92]:
## billy's number changed
phone_numbers['friends'][0][1] = "532-1521"


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jbloom/Classes/python-bootcamp/DataFiles_and_Notebooks/02_AdvancedDataStructures/<ipython-input-92-564c1535cc4d> in <module>()
      1 ## billy's number changed

----> 2 phone_numbers['friends'][0][1] = "532-1521"

TypeError: 'tuple' object does not support item assignment

In [93]:
phone_numbers['friends'][0] = ("Billy", "532-1521")

In [94]:
## I lost all my friends preparing for this Python class
phone_numbers['friends'] = [] # sets this to an empty list

In [95]:
## remove the friends key altogether
print phone_numbers.pop('friends')


[]

In [96]:
print phone_numbers


{'family': [('mom', '642-2322'), ('dad', '534-2311')]}

In [97]:
del phone_numbers['family']

In [98]:
print phone_numbers


{}

.update() method is very handy, like .append() for lists


In [99]:
phone_numbers.update({"friends":
                      [("Billy's Brother, Bob", "532-1521")]})
print phone_numbers


{'friends': [("Billy's Brother, Bob", '532-1521')]}

Dictionaries are used everywhere within Python...


In [2]:
# globals() and locals() store all global and local variables
print globals().keys()[:6]


['_dh', '__', 'help', 'quit', '__builtins__', '_ih']

List Comprehensions

A pythonic way of creating lists on-the-fly

Example: imagine you want a list of all numbers from 0 to 100 which are divisible by 7 or 11


In [69]:
L = []
for num in range(100):
    if (num % 7 == 0) or (num % 11 == 0):
        L.append(num)
print L


[0, 7, 11, 14, 21, 22, 28, 33, 35, 42, 44, 49, 55, 56, 63, 66, 70, 77, 84, 88, 91, 98, 99]

We can also do this with a list comprehension:


In [68]:
L = [num for num in range(100)\
     if (num % 7 == 0) or (num % 11 == 0)]
print L


[0, 7, 11, 14, 21, 22, 28, 33, 35, 42, 44, 49, 55, 56, 63, 66, 70, 77, 84, 88, 91, 98, 99]

In [70]:
# Can also operate on each element:
L = [2 * num for num in range(100)\
     if (num % 7 == 0) or (num % 11 == 0)]
print L


[0, 14, 22, 28, 42, 44, 56, 66, 70, 84, 88, 98, 110, 112, 126, 132, 140, 154, 168, 176, 182, 196, 198]

Example: Below is a list of information on 50 of the largest near-earth asteroids. Given this list of asteroid information, let's find all asteroids with semi-major axis within 0.2AU of earth, and with eccentricities less than 0.5


In [82]:
# Each element is (name, semi-major axis (AU), eccentricity, orbit class)
# source: http://ssd.jpl.nasa.gov/sbdb_query.cgi

Asteroids = [('Eros', 1.457916888347732, 0.2226769029627053, 'AMO'),
             ('Albert', 2.629584157344544, 0.551788195302116, 'AMO'),
             ('Alinda', 2.477642943521562, 0.5675993715753302, 'AMO'),
             ('Ganymed', 2.662242764279804, 0.5339300994578989, 'AMO'),
             ('Amor', 1.918987277620309, 0.4354863345648127, 'AMO'),
             ('Icarus', 1.077941311539208, 0.826950446001521, 'APO'),
             ('Betulia', 2.196489260519891, 0.4876246891992282, 'AMO'),
             ('Geographos', 1.245477192797457, 0.3355407124897842, 'APO'),
             ('Ivar', 1.862724540418448, 0.3968541470639658, 'AMO'),
             ('Toro', 1.367247622946547, 0.4358829575017499, 'APO'),
             ('Apollo', 1.470694262588244, 0.5598306817483757, 'APO'),
             ('Antinous', 2.258479598510079, 0.6070051516585434, 'APO'),
             ('Daedalus', 1.460912865705988, 0.6144629118218898, 'APO'),
             ('Cerberus', 1.079965807367047, 0.4668134997419173, 'APO'),
             ('Sisyphus', 1.893726635847921, 0.5383319204425762, 'APO'),
             ('Quetzalcoatl', 2.544270656955212, 0.5704591861565643, 'AMO'),
             ('Boreas', 2.271958775354725, 0.4499332278634067, 'AMO'),
             ('Cuyo', 2.150453953345012, 0.5041719257675564, 'AMO'),
             ('Anteros', 1.430262719980132, 0.2558054402785934, 'AMO'),
             ('Tezcatlipoca', 1.709753263222791, 0.3647772103513082, 'AMO'),
             ('Midas', 1.775954494579457, 0.6503697243919138, 'APO'),
             ('Baboquivari', 2.646202507670927, 0.5295611095751231, 'AMO'),
             ('Anza', 2.26415089613359, 0.5371603112900858, 'AMO'),
             ('Aten', 0.9668828078092987, 0.1827831025175614, 'ATE'),
             ('Bacchus', 1.078135348117527, 0.3495569270441645, 'APO'),
             ('Ra-Shalom', 0.8320425524852308, 0.4364726062545577, 'ATE'),
             ('Adonis', 1.874315684524321, 0.763949321566, 'APO'),
             ('Tantalus', 1.289997492877751, 0.2990853014998932, 'APO'),
             ('Aristaeus', 1.599511990737142, 0.5030618532252225, 'APO'),
             ('Oljato', 2.172056090036035, 0.7125729402616418, 'APO'),
             ('Pele', 2.291471988746353, 0.5115484924883255, 'AMO'),
             ('Hephaistos', 2.159619960333728, 0.8374146846143349, 'APO'),
             ('Orthos', 2.404988778495748, 0.6569133796135244, 'APO'),
             ('Hathor', 0.8442121506103012, 0.4498204013480316, 'ATE'),
             ('Beltrovata', 2.104690977122337, 0.413731105995413, 'AMO'),
             ('Seneca', 2.516402574514213, 0.5708728441169761, 'AMO'),
             ('Krok', 2.152545170235639, 0.4478259793515817, 'AMO'),
             ('Eger', 1.404478323548423, 0.3542971360331806, 'APO'),
             ('Florence', 1.768227407864309, 0.4227761019048867, 'AMO'),
             ('Nefertiti', 1.574493139339916, 0.283902719273878, 'AMO'),
             ('Phaethon', 1.271195939723604, 0.8898716672181355, 'APO'),
             ('Ul', 2.102493486378346, 0.3951143067760007, 'AMO'),
             ('Seleucus', 2.033331705805067, 0.4559159977082651, 'AMO'),
             ('McAuliffe', 1.878722427225527, 0.3691521497610656, 'AMO'),
             ('Syrinx', 2.469752836845105, 0.7441934504192601, 'APO'),
             ('Orpheus', 1.209727780883745, 0.3229034563257626, 'APO'),
             ('Khufu', 0.989473784873371, 0.468479627898914, 'ATE'),
             ('Verenia', 2.093231870619781, 0.4865133359612604, 'AMO'),
             ('"Don Quixote"', 4.221712367193639, 0.7130894892477316, 'AMO'),
             ('Mera', 1.644476057737928, 0.3201425983025733, 'AMO')]

orbit_class = {'AMO':'Amor', 'APO':'Apollo', 'ATE':'Aten'}

In [83]:
# first we'll build the list using loops.
L = []
for data in Asteroids:
    name, a, e, t = data
    if abs(a - 1) < 0.2 and e < 0.5:
        L.append(name)
print L


['Cerberus', 'Aten', 'Bacchus', 'Ra-Shalom', 'Hathor', 'Khufu']

In [73]:
# now with a list comprehension...
L = [name for (name, a, e, t) in Asteroids
     if abs(a - 1) < 0.2 and e < 0.5]
print L


['Cerberus', 'Aten', 'Bacchus', 'Ra-Shalom']

Here is how we could create a dictionary from the list


In [81]:
D = dict([(name, (a, e, t)) for (name, a, e, t) in Asteroids])
print D['Eros']
print D['Amor']


(1.457837275957122, 0.2226677322184548)
(1.918872951565462, 0.4355437855993322)

Breakout #2: Sorting Asteroids

Using the above Asteroid list,

  • print the list sorted in alphabetical order by asteroid name (hint: how does sorting handle a list of tuples?)
  • print the list sorted by semi-major axis
  • print the list sorted by name, but replace the class code with the class name

The output should be formatted like this:

Asteroid name    a (AU)    e        class
-----------------------------------------
Eros             1.4578    0.2226   Amor
Albert           2.6292    0.5518   Amor
.
.
.

Bonus points if you can get the columns to line up nicely!