This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599_2014/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2014_fall_ASTR599/).



In [1]:

    
%run talktools.py

Advanced Data Structures

There are four built-in collection types in Python

(list and tuple are "sequence objects"; dict is a mapping and set is an unordered collection)

list : a mutable ordered array of data
tuple : an immutable ordered array of data
dict : a mapping from keys to values (not indexable by position; insertion order is preserved only since Python 3.7)
set : an unordered collection of unique elements

The values in any of these collections can be arbitrary Python objects, and mixing content types is OK.

Note that strings are also sequence objects.
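As a quick side-by-side, here is a minimal sketch of the four collection types. The variable names are illustrative only and are not used elsewhere in this notebook.

```python
# Illustrative names only; none of these appear elsewhere in the notebook.
a_list = [1, "two", 3.0]          # mutable, ordered
a_tuple = (1, "two", 3.0)         # immutable, ordered
a_dict = {"one": 1, "two": 2}     # maps keys to values
a_set = {1, 2, 2, 3}              # duplicates are dropped

print(a_list[0], a_tuple[-1])     # sequences support positional indexing
print(a_dict["two"])              # dicts are indexed by key
print(len(a_set))                 # the repeated 2 is stored once
print("two"[0])                   # strings are sequences too
```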

Tuples

Tuples are denoted with parentheses



In [2]:

    
t = (12, -1)
print(type(t))









    



<class 'tuple'>



In [3]:

    
print(isinstance(t,tuple))
print(len(t))









    



True
2

You can mix types in a tuple



In [4]:

    
t = (12, "monty", True, -1.23e6)
print(t[1])









    



monty

Indexing works the same way as for strings:



In [5]:

    
print(t[-1])

-1230000.0



In [6]:

    
t[-2:]  # get the last two elements, return as a tuple









    Out[6]:





(True, -1230000.0)



In [7]:

    
x = (True) ; print(type(x))
x = (True,) ; print(type(x))









    



<class 'bool'>
<class 'tuple'>



In [8]:

    
x = ()
type(x), len(x)









    Out[8]:





(tuple, 0)



In [9]:

    
x = (,)









    



  File "<ipython-input-9-bd05d59e2976>", line 1
    x = (,)
         ^
SyntaxError: invalid syntax

Single-element tuples look like (element,)

Tuples cannot be modified, but you can create a new one with concatenation



In [10]:

    
t[2] = False









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-9365ccccf007> in <module>()
----> 1 t[2] = False

TypeError: 'tuple' object does not support item assignment



In [11]:

    
t[0:2], False, t[3:]









    Out[11]:





((12, 'monty'), False, (-1230000.0,))



In [12]:

    
## the above is
## not what we wanted... need to concatenate
t[0:2] + False + t[3:]









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-aaeb0198f3bf> in <module>()
      1 ## the above is
      2 ## not what we wanted... need to concatenate
----> 3 t[0:2] + False + t[3:]

TypeError: can only concatenate tuple (not "bool") to tuple



In [13]:

    
y = t[0:2] + (False,) + t[3:]
y









    Out[13]:





(12, 'monty', False, -1230000.0)



In [14]:

    
t * 2









    Out[14]:





(12, 'monty', True, -1230000.0, 12, 'monty', True, -1230000.0)

Tuples are most commonly used in functions which return multiple values.
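A minimal sketch of that pattern follows. The function `min_and_max` is a made-up example, not something defined in this notebook.

```python
def min_and_max(values):
    """Return the smallest and largest item as a 2-tuple."""
    return min(values), max(values)   # the comma builds a tuple

result = min_and_max([3, 1, 4, 1, 5])
print(type(result), result)

# Tuple unpacking assigns each element to its own name
lo, hi = min_and_max([3, 1, 4, 1, 5])
print(lo, hi)
```

Unpacking on the left-hand side is what makes a tuple return value feel like "multiple return values".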

Lists

Lists are denoted with square brackets



In [15]:

    
v = [1,2,3]
print(len(v))
print(type(v))









    



3
<class 'list'>



In [16]:

    
v[0:2]









    Out[16]:





[1, 2]



In [17]:

    
v = ["eggs", "spam", -1, ("monty","python"), [-1.2,-3.5]]
len(v)









    Out[17]:





5



In [18]:

    
v[0] ="green egg"
v[1] += ",love it."
v[-1]









    Out[18]:





[-1.2, -3.5]



In [19]:

    
v[-1][1] = None
print(v)









    



['green egg', 'spam,love it.', -1, ('monty', 'python'), [-1.2, None]]



In [20]:

    
v = v[2:]
print(v)









    



[-1, ('monty', 'python'), [-1.2, None]]



In [21]:

    
# let's make a proto-array out of nested lists
vv = [ [1,2], [3,4] ]



In [22]:

    
len(vv)









    Out[22]:





2



In [23]:

    
determinant = vv[0][0] * vv[1][1] - vv[0][1] * vv[1][0]
determinant









    Out[23]:





-2

The main point here: lists are mutable
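One consequence of mutability worth seeing early is that assignment does not copy a list. The sketch below uses made-up names (`first`, `alias`, `copy_`) so it does not disturb the variables used in the cells around it.

```python
first = [1, 2, 3]
alias = first          # a second name for the same list, not a copy
alias[0] = 99
print(first)           # the change is visible through both names

copy_ = first[:]       # a full slice makes a (shallow) copy
copy_[0] = -1
print(first, copy_)    # first is unaffected by changes to copy_
```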

Lists can be Extended and Appended



In [24]:

    
v = [1,2,3]
v.append(4)
v.append([-5])
v









    Out[24]:





[1, 2, 3, 4, [-5]]

Note: lists can be considered objects. Objects are collections of data and associated methods. In the case of a list, append is a method: it is a function associated with the object.



In [25]:

    
v = v[:4]
w = ['elderberries', 'eggs']
v + w









    Out[25]:





[1, 2, 3, 4, 'elderberries', 'eggs']



In [26]:

    
v









    Out[26]:





[1, 2, 3, 4]



In [27]:

    
v.extend(w)
v









    Out[27]:





[1, 2, 3, 4, 'elderberries', 'eggs']



In [28]:

    
v.pop()









    Out[28]:





'eggs'



In [29]:

    
v









    Out[29]:





[1, 2, 3, 4, 'elderberries']



In [30]:

    
v.pop(0) ## pop the first element









    Out[30]:





1



In [31]:

    
v









    Out[31]:





[2, 3, 4, 'elderberries']

Useful list methods:

.append(): adds a single new element to the end
.extend(): appends every item of another list (or any iterable)
.pop(): removes and returns an element (the last one by default)
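The difference between .append() and .extend() is easy to mix up, so here is a small sketch. The names `items_a` and `items_b` are made up for illustration.

```python
items_a = [1, 2]
items_a.append([3, 4])     # adds the list itself as one element
print(items_a)             # [1, 2, [3, 4]]

items_b = [1, 2]
items_b.extend([3, 4])     # adds each item of the iterable
print(items_b)             # [1, 2, 3, 4]

items_b.extend("ab")       # any iterable works; a string contributes its characters
print(items_b)             # [1, 2, 3, 4, 'a', 'b']

last = items_b.pop()       # removes and returns the last element
print(last, items_b)
```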

Lists can be searched, sorted, & counted



In [32]:

    
v = [1, 3, 2, 3, 4]
v.sort()
v









    Out[32]:





[1, 2, 3, 3, 4]

reverse is a keyword argument of the .sort() method



In [33]:

    
v.sort(reverse=True)
v









    Out[33]:





[4, 3, 3, 2, 1]

.sort() changes the list in place
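If you want to keep the original order, the built-in sorted() function returns a new list instead. A short sketch, using a made-up list `nums` so the notebook's `v` is left alone:

```python
nums = [1, 3, 2, 3, 4]
ordered = sorted(nums)        # returns a new sorted list
print(nums, ordered)          # nums keeps its original order

result = nums.sort()          # sorts nums in place and returns None
print(result, nums)
```

A common mistake is writing `nums = nums.sort()`, which replaces the list with None.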



In [34]:

    
v.index(4)   ## lookup the index of the entry 4









    Out[34]:





0



In [35]:

    
v.index(3)









    Out[35]:





1



In [36]:

    
v.count(3)









    Out[36]:





2



In [37]:

    
v.insert(0, "it's full of stars")
v









    Out[37]:





["it's full of stars", 4, 3, 3, 2, 1]



In [38]:

    
v.remove(1)
v









    Out[38]:





["it's full of stars", 4, 3, 3, 2]

Using IPython to learn more

IPython is your new best friend: its tab-completion allows you to explore all methods available to an object. (This only works in IPython.)

Type

v.

and then the tab key to see all the available methods:



In [ ]:

    
v.

Once you find a method, type (for example)

v.index?

and press shift-enter: you'll see the documentation of the method



In [40]:

    
v.index?

This is probably the most important thing you'll learn today
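Outside IPython, the built-in dir() and help() functions give similar information. The sketch below is one way to do it in plain Python; `example` is a made-up list.

```python
example = [4, 3, 3, 2]

# dir() lists attribute names; skip the ones starting with an underscore
public_methods = []
for name in dir(example):
    if not name.startswith("_"):
        public_methods.append(name)
print(public_methods)

# The docstring is the text that `example.index?` shows in IPython;
# help(example.index) displays the same text interactively
print(list.index.__doc__)
```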

Iterating over Lists



In [41]:

    
a = ['cat', 'window', 'defenestrate']
for x in a:
    print(x, len(x))









    



cat 3
window 6
defenestrate 12



In [42]:

    
for i,x in enumerate(a):
    print(i, x, len(x))









    



0 cat 3
1 window 6
2 defenestrate 12



In [43]:

    
for x in a:
    print(x, end=' ')









    



cat window defenestrate

The syntax for iteration is...

for variable_name in iterable:
   # do something with variable_name
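The same syntax works for any iterable, not just lists. A brief sketch with made-up data:

```python
# Strings yield one character at a time
letters = []
for ch in "spam":
    letters.append(ch)
print(letters)

# Tuples work just like lists
for item in (12, "monty", True):
    print(item)

# zip() walks two iterables in step (both lists here are made up)
words = ["cat", "window", "defenestrate"]
lengths = [3, 6, 12]
pairs = []
for word, n in zip(words, lengths):
    pairs.append((word, n))
print(pairs)
```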

The `range()` function

The range() function produces a sequence of integers

(in Python 3 it returns a lazy range object rather than a list, but you can think of it as a list)



In [44]:

    
x = range(4)
x









    Out[44]:





range(0, 4)



In [45]:

    
total = 0
for val in range(4):
    total += val
    print("By adding " + str(val) + \
          " the total is now " + str(total))









    



By adding 0 the total is now 0
By adding 1 the total is now 1
By adding 2 the total is now 3
By adding 3 the total is now 6

range([start,] stop[, step]) → sequence of integers (stop itself is excluded)



In [46]:

    
total = 0
for val in range(1, 10, 2):
    total += val
    print("By adding " + str(val) + \
          " the total is now " + str(total))









    



By adding 1 the total is now 1
By adding 3 the total is now 4
By adding 5 the total is now 9
By adding 7 the total is now 16
By adding 9 the total is now 25
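In Python 3, range() does not build a list up front, but it still behaves like a sequence. A small sketch of what you can do with it directly (`r` is just an illustrative name):

```python
r = range(1, 10, 2)
print(list(r))                 # materialize as a list: [1, 3, 5, 7, 9]
print(len(r), r[0], r[-1])     # len() and indexing work without building a list
print(7 in r)                  # membership test: True
print(list(range(5, 0, -1)))   # a negative step counts down: [5, 4, 3, 2, 1]
```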

Quick Exercise:

Write a loop over the words in this list and print the words longer than three characters in length:



In [47]:

    
L = ["Oh", "Say", "does", "that", "star",
     "spangled", "banner", "yet", "wave"]

Sets

Sets can be thought of as unordered lists of unique items

Sets are denoted with curly braces



In [48]:

    
{1,2,3,"bingo"}









    Out[48]:





{1, 2, 3, 'bingo'}



In [49]:

    
type({1,2,3,"bingo"})









    Out[49]:





set



In [50]:

    
type({})









    Out[50]:





dict



In [51]:

    
type(set())









    Out[51]:





set



In [52]:

    
set("spamIam")









    Out[52]:





{'I', 'a', 'm', 'p', 's'}

Sets have unique elements. They can be compared, differenced, unioned, etc.



In [53]:

    
a = set("sp")
b = set("am")
print(a, b)









    



{'p', 's'} {'a', 'm'}



In [54]:

    
c = set(["a","m"])
c == b









    Out[54]:





True



In [55]:

    
"p" in a









    Out[55]:





True



In [56]:

    
a | b









    Out[56]:





{'a', 'm', 'p', 's'}
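Union is only one of the set operators. The sketch below shows a few others on two made-up sets; sorted() is used only to make the printed order predictable.

```python
p = set("spam")
q = set("ham")

print(sorted(p & q))   # intersection: ['a', 'm']
print(sorted(p - q))   # difference: ['p', 's']
print(sorted(p ^ q))   # symmetric difference: ['h', 'p', 's']
print({"a"} <= p)      # subset test: True
```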

Dictionaries

Dictionaries are mappings from unique keys to values.

We'll show four ways to make a Dictionary



In [57]:

    
# number 1... curly braces & colons
d = {"favorite cat": None,
     "favorite spam": "all"}
d









    Out[57]:





{'favorite spam': 'all', 'favorite cat': None}



In [58]:

    
# number 2
d = dict(one = 1, two=2, cat='dog')
d









    Out[58]:





{'cat': 'dog', 'two': 2, 'one': 1}



In [59]:

    
# number 3 ... just start filling in items/keys
d = {}  # empty dictionary
d['cat'] = 'dog'
d['one'] = 1
d['two'] = 2
d









    Out[59]:





{'cat': 'dog', 'two': 2, 'one': 1}



In [60]:

    
# number 4... start with a list of tuples
mylist = [("cat","dog"), ("one",1), ("two",2)]
dict(mylist)









    Out[60]:





{'cat': 'dog', 'two': 2, 'one': 1}



In [61]:

    
dict(mylist) == d









    Out[61]:





True

Dictionaries can be complicated (in a good way)

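Note that dictionaries cannot be indexed by position! (Before Python 3.7 the order of items was not guaranteed, as in the outputs shown here; newer versions preserve insertion order.)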



In [62]:

    
d = {"favorite cat": None, "favorite spam": "all"}



In [63]:

    
d[0]  # this breaks!  Dictionaries have no order









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-63-d1fc1eca4ebb> in <module>()
----> 1 d[0]  # this breaks!  Dictionaries have no order

KeyError: 0



In [64]:

    
d["favorite spam"]









    Out[64]:





'all'



In [65]:

    
d[0] = "this is a zero"
d









    Out[65]:





{0: 'this is a zero', 'favorite spam': 'all', 'favorite cat': None}
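To avoid a KeyError when a key might be missing, you can test with `in` or use the .get() method. A minimal sketch, with `prefs` as a made-up dictionary:

```python
prefs = {"favorite cat": None, "favorite spam": "all"}

print("favorite spam" in prefs)      # True: `in` tests the keys
print(0 in prefs)                    # False

print(prefs.get(0))                  # None instead of a KeyError
print(prefs.get(0, "missing"))       # or supply your own default

# A key can exist with the value None, so use `in` to tell the cases apart
print("favorite cat" in prefs, prefs.get("favorite cat"))
```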

Dictionaries can contain dictionaries!



In [66]:

    
d = {'favorites': {'cat': None, 'spam': 'all'},\
     'least favorite': {'cat': 'all', 'spam': None}}
d['least favorite']['cat']









    Out[66]:





'all'

Remember: the backslash ('\') allows you to break a statement across lines. It is not technically needed when defining a dictionary or list.
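Inside any open bracket, brace, or parenthesis, Python continues the statement onto the next line by itself. A short sketch (the names `nested` and `long_sum` are illustrative):

```python
# No backslash needed: the open brace keeps the expression going
nested = {'favorites': {'cat': None, 'spam': 'all'},
          'least favorite': {'cat': 'all', 'spam': None}}

# Parentheses work the same way for long expressions
long_sum = (1 + 2 + 3 +
            4 + 5 + 6)

print(nested['favorites']['spam'], long_sum)
```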

Dictionaries are used everywhere within Python...



In [67]:

    
# globals() and locals() store all global and local variables
globals().keys()









    Out[67]:





dict_keys(['_i62', 'b', 'mylist', 'd', '_i56', 'L', '_i10', '__builtins__', '_i47', '_i11', '_i25', '_i', '_ih', '_ii', '__doc__', '_25', 'v', '_i22', 'x', '_57', '_18', '_i13', '_i29', '_i28', '_50', '_51', 'print_function', '_29', '_28', '_i18', '_i19', '__', '_i14', '_i15', '_i16', '_i17', '_27', '_26', '_i12', '_24', 'vv', '___', '_58', 'i', '_i64', '_8', '_i58', '_i8', '_i9', 'display', 'determinant', '_i2', '_i3', '_i1', '_i6', '_i7', '_i4', '_i5', '_i50', '_i51', '_i52', '_i53', '_i54', '_i55', 't', '_i57', '_i26', '_i59', '_38', '_55', '_i24', '_30', '_31', '_32', '_33', '_13', '_i27', '_36', '_37', '_dh', '_11', '_i21', '_i66', '_16', '_i20', 'get_ipython', 'a', 'c', '_14', '__builtin__', 'HTML', '_34', 'Out', '_i44', '_35', '_i43', '_i42', '_i41', '_i40', '_oh', 'total', '_i45', 'w', 'y', '_i49', '_i48', '_54', '_i38', '_i39', '_i61', '_i32', '_i33', '_i30', '_i31', '_i36', '_i37', '__name__', '__nonzero__', '_i46', '_i65', '_i23', '_i67', '_44', '_52', 'val', '_60', '_', 'quit', '_i60', '_49', '_48', 'In', '_6', '_i34', '_59', '_64', '_iii', '_i63', '_17', '_66', '_65', 'exit', '_sh', '_23', '__warningregistry__', '_i35', '_56', '_22', '_61'])
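Since dictionaries show up so often, it helps to know how to loop over one. A short sketch with a made-up dictionary (`pets` is hypothetical, not from this notebook):

```python
pets = {"cat": "dog", "one": 1, "two": 2}

# Looping over a dict yields its keys
for key in pets:
    print(key, pets[key])

# .items() yields (key, value) tuples, which unpack neatly
for key, value in pets.items():
    print(key, "->", value)

key_list = sorted(pets.keys())     # .keys() is a view; sorted() returns a list
value_list = list(pets.values())   # .values() is a view as well
print(key_list, value_list)
```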

List Comprehensions

A pythonic way of creating lists on-the-fly

Example: imagine you want a list of all numbers from 0 to 99 which are divisible by 7 or 11



In [68]:

    
L = []
for num in range(100):
    if (num % 7 == 0) or (num % 11 == 0):
        L.append(num)
print(L)









    



[0, 7, 11, 14, 21, 22, 28, 33, 35, 42, 44, 49, 55, 56, 63, 66, 70, 77, 84, 88, 91, 98, 99]

We can also do this with a list comprehension:



In [69]:

    
L = [num for num in range(100)\
     if (num % 7 == 0) or (num % 11 == 0)]
print(L)









    



[0, 7, 11, 14, 21, 22, 28, 33, 35, 42, 44, 49, 55, 56, 63, 66, 70, 77, 84, 88, 91, 98, 99]



In [70]:

    
# Can also operate on each element:
L = [2 * num for num in range(100)\
     if (num % 7 == 0) or (num % 11 == 0)]
print(L)









    



[0, 14, 22, 28, 42, 44, 56, 66, 70, 84, 88, 98, 110, 112, 126, 132, 140, 154, 168, 176, 182, 196, 198]
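The same comprehension syntax also builds sets and dictionaries when written with curly braces. A minimal sketch with made-up names:

```python
squares = [n ** 2 for n in range(5)]          # list: [0, 1, 4, 9, 16]
remainders = {n % 3 for n in range(10)}       # set: {0, 1, 2}
square_of = {n: n ** 2 for n in range(5)}     # dict mapping n to n squared

print(squares)
print(sorted(remainders))
print(square_of[4])
```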

Example: Below is a list of information on 50 of the largest near-Earth asteroids. Given this list, let's find all asteroids with a semi-major axis within 0.2 AU of Earth's (1 AU) and an eccentricity less than 0.5.



In [71]:

    
# Each element is (name, semi-major axis (AU), eccentricity, orbit class)
# source: http://ssd.jpl.nasa.gov/sbdb_query.cgi

Asteroids = [('Eros', 1.457916888347732, 0.2226769029627053, 'AMO'),
             ('Albert', 2.629584157344544, 0.551788195302116, 'AMO'),
             ('Alinda', 2.477642943521562, 0.5675993715753302, 'AMO'),
             ('Ganymed', 2.662242764279804, 0.5339300994578989, 'AMO'),
             ('Amor', 1.918987277620309, 0.4354863345648127, 'AMO'),
             ('Icarus', 1.077941311539208, 0.826950446001521, 'APO'),
             ('Betulia', 2.196489260519891, 0.4876246891992282, 'AMO'),
             ('Geographos', 1.245477192797457, 0.3355407124897842, 'APO'),
             ('Ivar', 1.862724540418448, 0.3968541470639658, 'AMO'),
             ('Toro', 1.367247622946547, 0.4358829575017499, 'APO'),
             ('Apollo', 1.470694262588244, 0.5598306817483757, 'APO'),
             ('Antinous', 2.258479598510079, 0.6070051516585434, 'APO'),
             ('Daedalus', 1.460912865705988, 0.6144629118218898, 'APO'),
             ('Cerberus', 1.079965807367047, 0.4668134997419173, 'APO'),
             ('Sisyphus', 1.893726635847921, 0.5383319204425762, 'APO'),
             ('Quetzalcoatl', 2.544270656955212, 0.5704591861565643, 'AMO'),
             ('Boreas', 2.271958775354725, 0.4499332278634067, 'AMO'),
             ('Cuyo', 2.150453953345012, 0.5041719257675564, 'AMO'),
             ('Anteros', 1.430262719980132, 0.2558054402785934, 'AMO'),
             ('Tezcatlipoca', 1.709753263222791, 0.3647772103513082, 'AMO'),
             ('Midas', 1.775954494579457, 0.6503697243919138, 'APO'),
             ('Baboquivari', 2.646202507670927, 0.5295611095751231, 'AMO'),
             ('Anza', 2.26415089613359, 0.5371603112900858, 'AMO'),
             ('Aten', 0.9668828078092987, 0.1827831025175614, 'ATE'),
             ('Bacchus', 1.078135348117527, 0.3495569270441645, 'APO'),
             ('Ra-Shalom', 0.8320425524852308, 0.4364726062545577, 'ATE'),
             ('Adonis', 1.874315684524321, 0.763949321566, 'APO'),
             ('Tantalus', 1.289997492877751, 0.2990853014998932, 'APO'),
             ('Aristaeus', 1.599511990737142, 0.5030618532252225, 'APO'),
             ('Oljato', 2.172056090036035, 0.7125729402616418, 'APO'),
             ('Pele', 2.291471988746353, 0.5115484924883255, 'AMO'),
             ('Hephaistos', 2.159619960333728, 0.8374146846143349, 'APO'),
             ('Orthos', 2.404988778495748, 0.6569133796135244, 'APO'),
             ('Hathor', 0.8442121506103012, 0.4498204013480316, 'ATE'),
             ('Beltrovata', 2.104690977122337, 0.413731105995413, 'AMO'),
             ('Seneca', 2.516402574514213, 0.5708728441169761, 'AMO'),
             ('Krok', 2.152545170235639, 0.4478259793515817, 'AMO'),
             ('Eger', 1.404478323548423, 0.3542971360331806, 'APO'),
             ('Florence', 1.768227407864309, 0.4227761019048867, 'AMO'),
             ('Nefertiti', 1.574493139339916, 0.283902719273878, 'AMO'),
             ('Phaethon', 1.271195939723604, 0.8898716672181355, 'APO'),
             ('Ul', 2.102493486378346, 0.3951143067760007, 'AMO'),
             ('Seleucus', 2.033331705805067, 0.4559159977082651, 'AMO'),
             ('McAuliffe', 1.878722427225527, 0.3691521497610656, 'AMO'),
             ('Syrinx', 2.469752836845105, 0.7441934504192601, 'APO'),
             ('Orpheus', 1.209727780883745, 0.3229034563257626, 'APO'),
             ('Khufu', 0.989473784873371, 0.468479627898914, 'ATE'),
             ('Verenia', 2.093231870619781, 0.4865133359612604, 'AMO'),
             ('"Don Quixote"', 4.221712367193639, 0.7130894892477316, 'AMO'),
             ('Mera', 1.644476057737928, 0.3201425983025733, 'AMO')]

orbit_class = {'AMO':'Amor', 'APO':'Apollo', 'ATE':'Aten'}



In [72]:

    
# first we'll build the list using loops.
L = []
for data in Asteroids:
    name, a, e, t = data
    if abs(a - 1) < 0.2 and e < 0.5:
        L.append(name)
print(L)









    



['Cerberus', 'Aten', 'Bacchus', 'Ra-Shalom', 'Hathor', 'Khufu']



In [73]:

    
# now with a list comprehension...
L = [name for (name, a, e, t) in Asteroids
     if abs(a - 1) < 0.2 and e < 0.5]
print(L)









    



['Cerberus', 'Aten', 'Bacchus', 'Ra-Shalom', 'Hathor', 'Khufu']

Here is how we could create a dictionary from the list



In [74]:

    
D = dict([(name, (a, e, t)) for (name, a, e, t) in Asteroids])
print(D['Eros'])
print(D['Amor'])









    



(1.457916888347732, 0.2226769029627053, 'AMO')
(1.918987277620309, 0.4354863345648127, 'AMO')
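An alternative to dict() around a list comprehension is a dictionary comprehension. The sketch below uses a three-row excerpt of the Asteroids list so that it runs on its own; `sample`, `D_sample`, and `low_ecc` are illustrative names.

```python
# A three-row excerpt of the Asteroids list, to keep the sketch self-contained
sample = [('Eros', 1.457916888347732, 0.2226769029627053, 'AMO'),
          ('Icarus', 1.077941311539208, 0.826950446001521, 'APO'),
          ('Aten', 0.9668828078092987, 0.1827831025175614, 'ATE')]

# Same mapping as before, written as a dict comprehension
D_sample = {name: (a, e, t) for (name, a, e, t) in sample}
print(D_sample['Aten'])

# A condition can be added, just as in a list comprehension
low_ecc = {name: e for (name, a, e, t) in sample if e < 0.5}
print(sorted(low_ecc))
```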

Breakout #2: Sorting Asteroids

Using the above Asteroid list,

print the list sorted in alphabetical order by asteroid name (hint: how does sorting handle a list of tuples?)
print the list sorted by semi-major axis
print the list sorted by name, but replace the class code with the class name

The output should be formatted like this:

Asteroid name    a (AU)    e        class
-----------------------------------------
Eros             1.4579    0.2227   Amor
Albert           2.6296    0.5518   Amor
.
.
.

Bonus points if you can get the columns to line up nicely!