Python Data Structures

Data structures in computing

Data structures are how computer programs store information. This information can then be processed, analyzed and visualized easily from within the program. Scientific data can be large and complex and may require data structures suited to scientific programming. In astronomy, the FITS file is one of the most widely used data storage formats; it can hold a lot of information, including coordinates, precise times, very large catalogue tables, multi-dimensional data cubes, etc. When such data are opened by a program, they should be recognised and easily managed by it.

In Python, there are several built-in data structures to choose from, depending on the kind of data you wish to store. You will have to choose the data structures that best meet the requirements of the problem you are trying to solve. In this section, I will go through lists, tuples, sets, and dictionaries, and then NumPy arrays.

Lists

A Python list is an ordered sequence of values (elements), usually of the same kind. Lists are mutable: they can be changed after they are created, which also means you can reorder the elements inside them. This is a Python list of the prime numbers smaller than 100:


In [1]:
x = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
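Because lists are mutable, an element can be reassigned or swapped in place. The following is a minimal sketch on a short illustrative list (the name `primes` is an assumption for this example and is not used elsewhere in the notebook); it uses print() calls so that it also runs under Python 3:

```python
# Illustrative list; `primes` is a hypothetical name used only in this sketch
primes = [2, 3, 5, 7]

# Reassign a single element in place
primes[0] = 1
print(primes)

# Swap the first and last elements using tuple assignment
primes[0], primes[-1] = primes[-1], primes[0]
print(primes)
```

Both operations change the existing list object rather than creating a new one.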

Definition

It is defined with square brackets : [xx,xx,xx].

Get Element

An element is accessed using square brackets with an index starting from zero : x[y], where y runs from 0 to N-1 for a list of N elements.

Slice (sub-array)

You can slice the list using a colon; a[start:end] means the items from start up to end-1.


In [2]:
print x
print x[0]


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
2

A single colon a[:] returns a copy of the whole list.

a[start:] returns a list of the items from start through the end of the list.

a[:end] returns a list of the items from the beginning through end-1.


In [3]:
print x[1:2]
print x[:]
print x[:2]
print x[1:]


[3]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[2, 3]
[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

More interestingly, indices can be negative, in which case they count from the end of the list:

a[-1] means the last item in the list

a[-2:] means the last two items in the list

a[:-2] means everything except the last two items


In [4]:
print x[-1]
print x[-2]
print x[-2:]
print x[:-2]


97
89
[89, 97]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83]

You may reverse a list with x[::-1].


In [5]:
print x[::-1]


[97, 89, 83, 79, 73, 71, 67, 61, 59, 53, 47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]

Concatenate

You may add two lists together (concatenation), and multiply a list by an integer to repeat its items.


In [6]:
print x + [0,1]
print [0,1] + x
print [0,1] * 5


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 0, 1]
[0, 1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

Sorting

You may sort a list with sorted(x). Note that it returns a new list and leaves the original unchanged.


In [7]:
y = [97, 89, 83, 79, 73, 71, 67, 61, 59, 53, 47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]
print y
z = sorted(y)
print y
print z


[97, 89, 83, 79, 73, 71, 67, 61, 59, 53, 47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]
[97, 89, 83, 79, 73, 71, 67, 61, 59, 53, 47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
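For comparison, the list method sort() reorders the list itself instead of returning a new one. The sketch below contrasts the two on a short illustrative list (the names `values`, `ordered` and `descending` are assumptions for this example):

```python
# Short illustrative list; the names here are hypothetical
values = [5, 2, 3]

# sorted() returns a new list and leaves `values` untouched
ordered = sorted(values)
print(values)
print(ordered)

# list.sort() sorts `values` in place and returns None
result = values.sort()
print(values)
print(result)

# Both accept reverse=True for descending order
descending = sorted(values, reverse=True)
print(descending)
```

Use sorted() when the original order is still needed, and sort() when it is not.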

Add element (append); Remove element (pop); Insert element (insert)

These methods modify the list in place, i.e. the original list is changed.


In [8]:
print x
x.append('A')
print x


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 'A']

In [9]:
print x
x.insert(5,'B') # insert 'B' between x[4] and x[5], results in x[5] = 'B'
print x


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 'A']
[2, 3, 5, 7, 11, 'B', 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 'A']

In [10]:
print x
x.pop(5) # Remove the x[5] item and return it
print x
x.pop(-1) # Remove the last item and return it
print x


[2, 3, 5, 7, 11, 'B', 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 'A']
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 'A']
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Tuple

A Python tuple is similar to a list. The elements are in order but fixed once they are created; in other words, tuples are immutable. A tuple can store elements of different types.

Definition

It is defined with parentheses : (xx,xx,xx).

Get Element

An element is accessed using square brackets with an index starting from zero : x[y], where y runs from 0 to N-1 for a tuple of N elements.

Slice (sub-array)

You can slice the tuple using a colon; a[start:end] means the items from start up to end-1.


In [11]:
corr = (22.28552, 114.15769)
print corr


(22.28552, 114.15769)

In [12]:
corr[0] = 10



TypeError                                 Traceback (most recent call last)
<ipython-input-12-7dbbdf2a13d7> in <module>()
----> 1 corr[0] = 10

TypeError: 'tuple' object does not support item assignment
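Since a tuple cannot be modified, the usual way to "change" one is to build a new tuple from pieces of the old one. A minimal sketch, assuming a separate illustrative tuple `position` (a hypothetical name, so the notebook's own variables are left alone):

```python
# Illustrative coordinate pair; `position` is a hypothetical name
position = (22.28552, 114.15769)

# Item assignment raises TypeError, which can be caught like any exception
try:
    position[0] = 10
except TypeError as err:
    print("cannot modify a tuple: {0}".format(err))

# Build a new tuple instead: a one-element tuple plus a slice of the old one
updated = (10,) + position[1:]
print(updated)
print(position)
```

The original tuple is unchanged; `updated` is a separate object.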

Useful Functions for Lists and Tuples


In [13]:
# Length of the list/tuple
print len(x)
# Return the minimum and maximum in the list/tuple
print min(x), max(x)


25
2 97

In [14]:
# Multiple assignment
lat,lon = corr
print lat, lon


22.28552 114.15769

In [15]:
# String formatting with tuple unpacking using `*`
print 'lat {0}, lon {1}'.format(*corr)


lat 22.28552, lon 114.15769

In [16]:
# tuple to list, list to tuple
print list(corr)
print tuple(x)


[22.28552, 114.15769]
(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)

More on slicing in list and tuple


In [17]:
### More on slicing in list and tuple
start=2
end=5
step=2

print "Original:",x
print x[start:end] # items start through end-1
print x[start:]    # items start through the rest of the array
print x[:end]      # items from the beginning through end-1
print x[:]         # a copy of the whole array
print x[-1]    # last item in the array
print x[-2:]   # last two items in the array
print x[:-2]   # everything except the last two items

print x[start:end:step] # start through not past end, by step


Original: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[5, 7, 11]
[5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[2, 3, 5, 7, 11]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
97
[89, 97]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83]
[5, 11]

Set
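A set is an unordered collection of unique elements, defined with curly brackets or with the set() function. The following is a minimal sketch of the basic operations; the variable names are illustrative assumptions, and the results are printed through sorted() because a set has no guaranteed order:

```python
# Illustrative sets; all names here are hypothetical
primes = {2, 3, 5, 7}
odds = set([1, 3, 5, 7, 9, 3])   # the duplicate 3 is stored only once

print(len(odds))
print(3 in primes)               # membership test

common = primes & odds           # intersection: elements in both sets
combined = primes | odds         # union: elements in either set
only_primes = primes - odds      # difference: in primes but not in odds
print(sorted(common))
print(sorted(combined))
print(sorted(only_primes))

# Sets are mutable: add() inserts an element, discard() removes it if present
primes.add(11)
primes.discard(2)
print(sorted(primes))
```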

Dictionary

A dictionary is more flexible than a list: its elements are indexed by keys (often strings) rather than by position. It is defined with curly brackets:

data = {'k1' : y1 , 'k2' : y2 , 'k3' : y3 }

k1, k2 and k3 are called keys, while y1, y2 and y3 are the values.

Creating an empty dictionary

It is defined with a pair of curly brackets or the dict() function: data = {} or data = dict()

Creating a dictionary with initial values

  1. It can be defined with curly brackets containing key:value pairs : data = {'k1' : y1 , 'k2' : y2 , 'k3' : y3 }.
  2. It can also be defined with the dict() function : data = dict(k1=y1, k2=y2, k3=y3).
  3. It can also be built from tuples : data = {k: v for k, v in (('k1', y1),('k2',y2),('k3',y3))}.
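The forms listed above all produce the same dictionary. A minimal sketch, assuming placeholder values 1, 2 and 3 for y1, y2 and y3 (the variable names are illustrative):

```python
# Placeholder values standing in for y1, y2, y3
y1, y2, y3 = 1, 2, 3

by_literal = {'k1': y1, 'k2': y2, 'k3': y3}
by_keywords = dict(k1=y1, k2=y2, k3=y3)
by_comprehension = {k: v for k, v in (('k1', y1), ('k2', y2), ('k3', y3))}

# dict() also accepts a sequence of (key, value) pairs directly
by_pairs = dict([('k1', y1), ('k2', y2), ('k3', y3)])

# Dictionaries compare equal when they hold the same key/value pairs
print(by_literal == by_keywords == by_comprehension == by_pairs)
```

Note that the keyword form only works when the keys are valid Python identifiers.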

Get Element

A value is accessed using square brackets with its key : data[key].

Inserting/Updating a single value / multiple values

  1. data['k1']=1 # Updates the value if 'k1' exists, else adds the key 'k1' with this value

  2. data.update({'k1':1})

  3. data.update(dict(k1=1))

  4. data.update(k1=1)

  5. Multiple values : data.update({'k3':3,'k4':4}) # Updates 'k3' and adds 'k4'

Merging dictionaries without modifying the originals

  1. data3 = {}
  2. data3.update(data) # Modifies data3, not data
  3. data3.update(data2) # Modifies data3, not data2
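The three steps above can be sketched as follows, with small illustrative dictionaries (the contents of `data` and `data2` are assumptions for this example). When both inputs share a key, the value from the dictionary applied last wins:

```python
# Illustrative inputs; 'k2' appears in both on purpose
data = {'k1': 1, 'k2': 2}
data2 = {'k2': 20, 'k3': 30}

data3 = {}
data3.update(data)    # copies the pairs of data into data3
data3.update(data2)   # adds the pairs of data2; overwrites 'k2'

print(sorted(data3.items()))
print(sorted(data.items()))    # unchanged
print(sorted(data2.items()))   # unchanged

# An equivalent shorter form: copy one dictionary, then update the copy
merged = data.copy()
merged.update(data2)
print(merged == data3)
```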

Delete an item

  1. del data[key] # Removes specific element in a dictionary
  2. data.pop(key) # Removes the key & returns the value
  3. data.clear() # Clears entire dictionary

Check if a key exists

  1. key in data # Returns a boolean
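Accessing or deleting a missing key with data[key] or del data[key] raises a KeyError, so the membership test is often combined with the lookup. A minimal sketch, assuming a small illustrative dictionary and a key 'k9' that is deliberately absent:

```python
# Illustrative dictionary; 'k9' is deliberately missing
data = {'k1': 1, 'k2': 2}

# Guard the lookup with a membership test
if 'k1' in data:
    print(data['k1'])

# get() returns None, or a supplied default, instead of raising KeyError
missing = data.get('k9')
fallback = data.get('k9', 0)
print(missing)
print(fallback)

# pop() with a default removes the key if present and never raises
removed = data.pop('k2', None)
absent = data.pop('k9', None)
print(removed)
print(absent)
print(data)
```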

Iterate through pairs

  1. for key in data: # Iterates just through the keys, ignoring the values
  2. for key, value in data.items(): # Iterates through the pairs

In [18]:
# Creating an empty dictionary
location = {}
print location


{}

In [19]:
# Defined with a curly bracket
location = {
            'Berlin': (52.5170365, 13.3888599),
            'London': (51.5073219, -0.1276474),
            'Sydney': (-33.8548157, 151.2164539),
            'Tokyo': (34.2255804, 139.294774527387),
            'Paris': (48.8566101, 2.3514992),
            'Moscow': (46.7323875, -117.0001651)
           }
print location


{'Tokyo': (34.2255804, 139.294774527387), 'Paris': (48.8566101, 2.3514992), 'Moscow': (46.7323875, -117.0001651), 'Berlin': (52.5170365, 13.3888599), 'London': (51.5073219, -0.1276474), 'Sydney': (-33.8548157, 151.2164539)}

In [20]:
# Update
location.update({'Hong Kong': (22.2793278, 114.1628131)})
print location


{'Tokyo': (34.2255804, 139.294774527387), 'Paris': (48.8566101, 2.3514992), 'Moscow': (46.7323875, -117.0001651), 'Berlin': (52.5170365, 13.3888599), 'London': (51.5073219, -0.1276474), 'Sydney': (-33.8548157, 151.2164539), 'Hong Kong': (22.2793278, 114.1628131)}

In [21]:
# Call element
print location['Tokyo']


(34.2255804, 139.294774527387)

In [22]:
# Delete element
del location['Hong Kong']
print location


{'Tokyo': (34.2255804, 139.294774527387), 'Paris': (48.8566101, 2.3514992), 'Moscow': (46.7323875, -117.0001651), 'Berlin': (52.5170365, 13.3888599), 'London': (51.5073219, -0.1276474), 'Sydney': (-33.8548157, 151.2164539)}

In [23]:
for key, value in location.items():
    print value


(34.2255804, 139.294774527387)
(48.8566101, 2.3514992)
(46.7323875, -117.0001651)
(52.5170365, 13.3888599)
(51.5073219, -0.1276474)
(-33.8548157, 151.2164539)

Numpy

Numpy is a numerical package used extensively in Python coding. You can install the numpy package with

pip install numpy

When you import a module, you can choose to bind the package to an alias. In the Python community, the numpy module is usually imported like this:


In [24]:
import numpy as np

When you import a module via

import numpy

the numpy package is bound to the local variable numpy. The syntax

import numpy as np

allows you to bind the import to the local variable name of your choice (usually to avoid name collisions, shorten verbose module names, or standardize access to modules with compatible APIs). The whole command is equivalent to:

import numpy
np = numpy
del numpy
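The same mechanism applies to any module. As a small sketch, the example below uses the standard-library math module instead of numpy, chosen only so that it runs without extra packages; the alias `m` is an illustrative assumption:

```python
# Bind the math module to the alias `m`
import math as m

# A plain import binds the very same module object to the name `math`
import math

print(m is math)       # both names refer to one module object
print(m.sqrt(16.0))
```

The alias is just another name for the module, so attributes reached through either name are identical.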

numpy array

A numpy array is a grid of values, all of the same type. The number of dimensions gives the rank of the array. To initialize a 1D array, we can do:


In [25]:
a = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97])

The shape of an array is a tuple of integers giving the size of the array along each dimension.


In [26]:
print a.shape
print a


(25,)
[ 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

To initialize a 2D array, we can do:


In [27]:
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [28]:
print b.shape
print b


(3, 3)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
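Besides shape, an array carries a few other attributes that describe its layout. A minimal sketch, assuming numpy is installed and using an illustrative 2x3 array named `grid` (a hypothetical name, so the notebook's `a` and `b` are left alone):

```python
import numpy as np

# Illustrative 2x3 array
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.ndim)    # number of dimensions (the rank)
print(grid.shape)   # size along each dimension
print(grid.size)    # total number of elements

# All elements share one type; mixing ints and floats promotes to float
mixed = np.array([1, 2.5])
print(mixed.dtype)
```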

To access or change an element of the array, we can use indexing similar to that of a list:


In [29]:
print a[0], a[3], a[5]


2 7 13

In [30]:
print b[0,0],b[1,1],b[2,2]


1 5 9

In [31]:
a[0] = 5
print a[0]
a[0] = 2


5

In [32]:
print b
b[0,0] = 2
print b


[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[2 2 3]
 [4 5 6]
 [7 8 9]]

Numpy also provides some alternative functions to initialize arrays:


In [33]:
# Create an array of all zeros
a = np.zeros((2,2))   
print(a)


[[ 0.  0.]
 [ 0.  0.]]

In [34]:
# Create an array of all ones
b = np.ones((5,5))   
print(b)


[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]

In [35]:
# Create a constant array
c = np.full((3,3), 7)  
print(c)


[[7 7 7]
 [7 7 7]
 [7 7 7]]

In [36]:
# Create a 3x3 identity matrix
d = np.eye(3)
print(d)


[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

In [37]:
# Create an array filled with random values in the interval [0, 1)
e = np.random.random((6,6))
print(e)


[[ 0.56180486  0.11009023  0.65569103  0.36894147  0.50407966  0.92312902]
 [ 0.39547475  0.2901209   0.21039092  0.21562841  0.09260257  0.9610061 ]
 [ 0.89873423  0.33189449  0.08212571  0.93276282  0.97222272  0.90359426]
 [ 0.09031404  0.97150668  0.88675735  0.91413619  0.62444539  0.58887615]
 [ 0.4567415   0.97222898  0.52123559  0.38602818  0.59476854  0.78949176]
 [ 0.6481244   0.44626018  0.12230136  0.84572818  0.13646841  0.20346184]]

In [47]:
# The arange function in numpy provides a convenient way to produce an array of evenly spaced numbers; the inputs
# are start, stop, step (stop itself is excluded). Run %pdoc np.arange to check
print np.arange(-10,10,1)


[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
   8   9]

In [50]:
# np.pi = pi; this line outputs evenly spaced values from -pi/2 up to (but not including) pi/2, with a 0.1 interval
print np.arange(-np.pi/2,np.pi/2,0.1)


[-1.57079633 -1.47079633 -1.37079633 -1.27079633 -1.17079633 -1.07079633
 -0.97079633 -0.87079633 -0.77079633 -0.67079633 -0.57079633 -0.47079633
 -0.37079633 -0.27079633 -0.17079633 -0.07079633  0.02920367  0.12920367
  0.22920367  0.32920367  0.42920367  0.52920367  0.62920367  0.72920367
  0.82920367  0.92920367  1.02920367  1.12920367  1.22920367  1.32920367
  1.42920367  1.52920367]
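Note that arange stops before the end value, which is why pi/2 itself is missing from the output above. When both endpoints are wanted, np.linspace takes the number of points instead of a step. A minimal sketch, assuming numpy is installed; the names `steps` and `points` are illustrative:

```python
import numpy as np

# arange: start, stop, step; the stop value is excluded
steps = np.arange(0, 1, 0.25)
print(steps)

# linspace: start, stop, number of points; both endpoints are included
points = np.linspace(0, 1, 5)
print(points)

print(len(steps))
print(len(points))
```

With a non-integer step, linspace is usually the safer choice because the number of elements is stated explicitly rather than derived from floating-point arithmetic.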

Array indexing

Slicing


In [39]:
e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print e
# Each dimension is sliced like a list; here the 1st dimension selects everything and
# the 2nd dimension selects from index 1 to the end.
print e[:,1:]
# Here both the 1st and the 2nd dimension select from the start up to (not including) index 2,
# i.e. the upper left part of the array.
print e[:2,:2]


[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[2 3]
 [5 6]
 [8 9]]
[[1 2]
 [4 5]]

Boolean array indexing


In [40]:
# A comparison such as numpy array > value returns a boolean array
e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print e > 5


[[False False False]
 [False False  True]
 [ True  True  True]]

In [41]:
# If we index the array with the boolean array, it will select all elements that satisfy the condition
e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print e[e > 5]


[6 7 8 9]
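Boolean arrays can also be combined, and used on the left-hand side of an assignment. A minimal sketch, assuming numpy is installed and using an illustrative array `grid` (hypothetical names throughout); conditions are joined with `&` and each one needs its own parentheses:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Combine two conditions element-wise with & (and); | works for "or"
mask = (grid > 2) & (grid < 8)
selected = grid[mask]
print(selected)

# Assigning through a boolean index changes only the matching elements
clipped = grid.copy()          # work on a copy so grid stays intact
clipped[clipped > 5] = 0
print(clipped)
```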

Array Operation


In [42]:
# numpy sum can sum up all the values in an array
e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print e
print np.sum(e)


[[1 2 3]
 [4 5 6]
 [7 8 9]]
45

In [43]:
# Or along a dimension; here the sum runs along the 2nd dimension (axis 1), giving one total per row
print np.sum(e,1)


[ 6 15 24]

In [44]:
# Or along a dimension; here the sum runs along the 1st dimension (axis 0), giving one total per column
print np.sum(e,0)


[12 15 18]
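Passing the dimension by name makes the direction of the sum easier to read. A minimal sketch, assuming numpy is installed; `grid`, `row_totals` and `column_totals` are illustrative names:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=1 collapses the 2nd dimension: one total per row
row_totals = np.sum(grid, axis=1)
print(row_totals)

# axis=0 collapses the 1st dimension: one total per column
column_totals = np.sum(grid, axis=0)
print(column_totals)

# keepdims=True keeps the collapsed dimension with length 1
kept = np.sum(grid, axis=1, keepdims=True)
print(kept.shape)
```

The axis given is the one that disappears from the result, which is a useful way to remember which direction is being summed.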

In [45]:
# The .T returns the transpose of an array
print e.T


[[1 4 7]
 [2 5 8]
 [3 6 9]]
