Python Data Structures

Data structure in computing

Data structures are how computer programs store information. This information can then be processed, analyzed, and visualized easily from within the program. Scientific data can be large and complex and may require data structures appropriate for scientific programming. In astronomy, the FITS file is one of the most widely used data storage formats; it can store a lot of information, including coordinates, precise times, very large catalogue tables, multi-dimensional data cubes, etc. When such data are opened by a program, they should be recognised and easily managed by that program.

In Python, there are predefined advanced data structures for the different kinds of data you may wish to store. You will have to choose the data structures that best meet the requirements of the problem you are trying to solve. In this section, I will examine three Python data structures in detail: lists, tuples, and dictionaries. Sets appear only in the summary table.

Built-In Types

Python's simple types are summarized in the following table:

Type	Example	Description
`int`	`x = 1`	integers (i.e., whole numbers)
`float`	`x = 1.0`	floating-point numbers (i.e., real numbers)
`complex`	`x = 1 + 2j`	complex numbers (i.e., numbers with real and imaginary parts)
`bool`	`x = True`	Boolean: True/False values
`str`	`x = 'abc'`	String: characters or text
`NoneType`	`x = None`	Special object indicating nulls

The following cell gives a quick example of several of these types.



In [1]:

    
a = 1 # integer
b = 1.1 #floating point numbers
c = True; d = False # Boolean (logical expression)
e = "Hello" # Strings
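As a small illustrative sketch (the names `examples` and `type_names` are made up for this example), the built-in `type()` function can be used to check which simple type a value has:

```python
# One example value per simple type listed in the table above.
examples = [1, 1.0, 1 + 2j, True, 'abc', None]

# type(value).__name__ gives the type's name as a string.
type_names = [type(value).__name__ for value in examples]

for value, name in zip(examples, type_names):
    print(repr(value), '->', name)
```
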

Arithmetic Operations

Python implements seven basic binary arithmetic operators, two of which can double as unary operators. They are summarized in the following table:

Operator	Name	Description
`a + b`	Addition	Sum of `a` and `b`
`a - b`	Subtraction	Difference of `a` and `b`
`a * b`	Multiplication	Product of `a` and `b`
`a / b`	True division	Quotient of `a` and `b`
`a // b`	Floor division	Quotient of `a` and `b`, removing fractional parts
`a % b`	Modulus	Remainder after division of `a` by `b`
`a ** b`	Exponentiation	`a` raised to the power of `b`
`-a`	Negation	The negative of `a`
`+a`	Unary plus	`a` unchanged (rarely used)

These operators can be used and combined in intuitive ways, using standard parentheses to group operations. For example:



In [2]:

    
# addition, subtraction, multiplication
(4 + 8) * (6.5 - 3)









    Out[2]:





42.0
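The difference between true division, floor division and modulus is easiest to see side by side. The sketch below uses arbitrary illustrative values `a = 11` and `b = 3`; the variable names are assumptions for this example only.

```python
a, b = 11, 3

true_div = a / b      # true division always returns a float
floor_div = a // b    # 3: the quotient with the fractional part removed
remainder = a % b     # 2: what is left over after floor division
power = a ** b        # 1331: a raised to the power b

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = (-a) // b   # -4

print(true_div, floor_div, remainder, power, negative_floor)
```

Note that `floor_div * b + remainder` always gives back `a`, which is how the two operators are related.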

Strings in Python 2 and 3

# Python 2

print type("Hello World!")
<type 'str'> 
# this is a byte string

print type(u"Hello World!")
<type 'unicode'>
# this is a Unicode string

# Python 3

print(type("Hello World!"))
<class 'str'>
# this is a Unicode string

print(type(b"Hello World!"))
<class 'bytes'>
# this is a byte string
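In Python 3, converting between the two string types is done with `encode` and `decode`. The following is a minimal sketch, assuming UTF-8 as the encoding (the variable names are illustrative):

```python
text = "Hello World!"                  # Unicode string (str)

data = text.encode("utf-8")            # str -> bytes
round_trip = data.decode("utf-8")      # bytes -> str

print(type(text).__name__, type(data).__name__)
print(round_trip == text)
```
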

Built-In Data Structures

Type Name	Example	Add Element	Get Element	Set Element	Description
`list`	`[1, 2, 3]`	x.append(1)	x[0]	x[0]=2	Ordered collection
`tuple`	`(1, 2, 3)`	no altering	x[0]	no altering	Immutable ordered collection
`dict`	`{'a':1, 'b':2, 'c':3}`	x['new_key'] = 4 or x.update({'new_key': 4})	x['a']	x['a']=2	(key, value) mapping (keeps insertion order since Python 3.7)
`set`	`{1, 2, 3}`	x.add(4)	no indexing	no indexing	Unordered collection of unique values
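The operations in the table can be tried out together. This is only a sketch with made-up variable names; it also shows the one structure not covered in detail later, the set, which silently ignores duplicate values.

```python
# list: ordered and mutable, duplicates allowed
numbers_list = [1, 2, 3]
numbers_list.append(1)      # add an element at the end
numbers_list[0] = 2         # set an element by index

# tuple: ordered but immutable, so only reading is possible
numbers_tuple = (1, 2, 3)
first = numbers_tuple[0]

# dict: values are looked up by key
mapping = {'a': 1, 'b': 2, 'c': 3}
mapping['new_key'] = 4      # add a new key
mapping['a'] = 2            # update an existing key

# set: unique values only, no indexing
unique = {1, 2, 3}
unique.add(4)
unique.add(1)               # 1 is already present, so nothing changes

print(numbers_list, first, mapping, unique)
```
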

list

A Python list is a sequence of values (elements), usually of the same kind. Lists are ordered and mutable. Mutable means they can be changed after they are created, which also means you can reorder the elements inside them. This is a Python list of the prime numbers smaller than 50:



In [3]:

    
x = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Definition

It is defined with square brackets: [xx, xx, xx].

Get Element

Elements are accessed using square brackets with an index starting from zero: x[y], where y runs from 0 to N-1 for a list of N elements.

Slice (sub-array)

You can slice the list using a colon: a[start:end] means the items from start up to end-1.



In [4]:

    
print(x)
print(x[0])









    



[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
2

A single colon, a[:], makes a (shallow) copy of the whole list.

a[start:] returns a list of the items from start through the rest of the list.

a[:end] returns a list of the items from the beginning through end-1.



In [5]:

    
print(x[1:2])
print(x[:])
print(x[:2])
print(x[1:])









    



[3]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
[2, 3]
[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

More interestingly, indices can be negative, in which case they count from the end of the list:

a[-1] means the last item in the list

a[-2:] means the last two items in the list

a[:-2] means everything except the last two items



In [6]:

    
print(x[-1])
print(x[-2])
print(x[-2:])
print(x[:-2])









    



47
43
[43, 47]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]

You may reverse a list with x[::-1].



In [7]:

    
print(x[::-1])









    



[47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]

Concatenate

You may join two lists with +, which is called concatenation, and repeat the items of a list with *.



In [8]:

    
print(x + [0,1])
print([0,1] + x)
print([0,1] * 5)









    



[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 0, 1]
[0, 1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

Sorting

You may sort a list with sorted(x). Note that it returns a new list and leaves the original unchanged.



In [9]:

    
print(x[::-1])
y = sorted(x[::-1])
print(y)









    



[47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
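For comparison, a list can also be sorted in place with its `sort()` method. The sketch below contrasts the two approaches and shows the optional `reverse` and `key` arguments; the example lists and variable names are illustrative only.

```python
original = [47, 2, 31, 5]

ascending = sorted(original)                   # new list; original is unchanged
descending = sorted(original, reverse=True)    # largest first

# key= sorts by a computed value, here the length of each word.
words = ['tuple', 'set', 'dictionary', 'list']
by_length = sorted(words, key=len)

# list.sort() sorts in place and returns None.
in_place = [47, 2, 31, 5]
result = in_place.sort()

print(ascending, descending, by_length, in_place, result)
```

A common mistake is to write `y = x.sort()`, which leaves `y` set to `None`.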

Add element (append); Remove element (pop); Insert element (insert)

These methods work in place, i.e. the original list is changed.



In [10]:

    
print(x)
x.append('A')
print(x)









    



[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 'A']



In [11]:

    
print(x)
x.insert(5,'B') # insert 'B' between x[4] and x[5], results in x[5] = 'B'
print(x)









    



[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 'A']
[2, 3, 5, 7, 11, 'B', 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 'A']



In [12]:

    
print(x)
x.pop(5)   # Remove the item x[5] and return it
print(x)
x.pop(-1)  # Remove the last item and return it
print(x)









    



[2, 3, 5, 7, 11, 'B', 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 'A']
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 'A']
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Tuple

A Python tuple is similar to a list. The elements are ordered but fixed once the tuple is created. In other words, tuples are immutable. A tuple can store elements of different types.

Definition

It is defined with parentheses : (xx,xx,xx).

Get Element

Elements are accessed using square brackets with an index starting from zero: x[y], where y runs from 0 to N-1 for a tuple of N elements.

Slice (sub-array)

You can slice the tuple using a colon: a[start:end] means the items from start up to end-1.



In [13]:

    
corr = (22.28552, 114.15769)
print(corr)









    



(22.28552, 114.15769)



In [14]:

    
corr[0] = 10









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-9c711ec5bb9f> in <module>
----> 1 corr[0] = 10

TypeError: 'tuple' object does not support item assignment
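Because a tuple cannot be modified, the usual workaround is to build a new tuple from pieces of the old one. The sketch below also shows tuple unpacking; the names `coords`, `latitude`, `longitude` and `updated` are chosen for this example only.

```python
coords = (22.28552, 114.15769)

# Unpacking assigns each element to its own name.
latitude, longitude = coords

# "Changing" the first element means creating a new tuple;
# (10,) is a one-element tuple, and coords[1:] is the rest.
updated = (10,) + coords[1:]

print(latitude, longitude)
print(updated)
print(coords)   # the original tuple is untouched
```
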

Dictionary

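A dictionary is more flexible than a list: its elements are indexed by keys (often strings, though any hashable object such as a number or a tuple works) rather than by position. It is defined with curly brackets: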

data = {'k1' : y1 , 'k2' : y2 , 'k3' : y3 }

k1, k2 and k3 are called keys, while y1, y2 and y3 are the values.

Creating an empty dictionary

It is defined with a pair of curly brackets or the dict() function: data = {} or data = dict()

Creating a dictionary with initial values

It can be defined with curly brackets enclosing key: value pairs: data = {'k1': y1, 'k2': y2, 'k3': y3}.
It can also be defined with the dict() function: data = dict(k1=y1, k2=y2, k3=y3).
It can also be built from tuples with a dictionary comprehension: data = {k: v for k, v in (('k1', y1), ('k2', y2), ('k3', y3))}.
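A quick way to convince yourself that these forms are equivalent is to build the same dictionary each way and compare the results. In this sketch `y1`, `y2` and `y3` are given arbitrary placeholder values, and `dict()` is additionally called on a list of pairs.

```python
y1, y2, y3 = 1, 2, 3   # placeholder values for illustration

literal = {'k1': y1, 'k2': y2, 'k3': y3}
keywords = dict(k1=y1, k2=y2, k3=y3)
pairs = dict([('k1', y1), ('k2', y2), ('k3', y3)])
comprehension = {k: v for k, v in (('k1', y1), ('k2', y2), ('k3', y3))}

# All four hold the same keys and values.
print(literal == keywords == pairs == comprehension)
```

The keyword form `dict(k1=y1)` only works when the keys are valid Python identifiers; the other forms accept any hashable key.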

Get Element

Values are accessed using square brackets with the key: data[key].

Inserting/Updating a single value / multiple values

data['k1'] = 1 # Updates the value if 'k1' exists, otherwise adds the key 'k1'
data.update({'k1': 1})
data.update(dict(k1=1))
data.update(k1=1)
Multiple values: data.update({'k3': 3, 'k4': 4}) # Updates 'k3' and adds 'k4'

Merging dictionaries without modifying the originals

data3 = {}
data3.update(data) # Modifies data3, not data
data3.update(data2) # Modifies data3, not data2
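The same merge can be written in one expression with dictionary unpacking, available since Python 3.5. This is a sketch with made-up contents for `data` and `data2`; where a key appears in both, the value from the right-hand dictionary wins.

```python
data = {'k1': 1, 'k2': 2}
data2 = {'k2': 20, 'k3': 30}

# Unpack both dictionaries into a new one; later entries override earlier ones.
merged = {**data, **data2}

print(merged)
print(data, data2)   # the originals are unchanged
```

On Python 3.9 and later, `data | data2` gives the same result.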

Delete an item

del data[key] # Removes specific element in a dictionary
data.pop(key) # Removes the key & returns the value
data.clear() # Clears entire dictionary

Check whether a key exists

key in data # Returns a boolean
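Accessing or deleting a missing key with `data[key]`, `del data[key]` or `data.pop(key)` raises a `KeyError`. Besides checking with `in` first, the `get` and `pop` methods accept a default value that is returned instead. The dictionary contents below are illustrative.

```python
data = {'k1': 1, 'k2': 2}

has_k1 = 'k1' in data            # True
has_k9 = 'k9' in data            # False

# get() returns the default (here 0) when the key is missing.
fallback = data.get('k9', 0)

# pop() with a default does not raise if the key is missing.
removed = data.pop('k9', None)

print(has_k1, has_k9, fallback, removed)
```
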

Iterate through pairs

for key in data: # Iterates just through the keys, ignoring the values
for key, value in data.items(): # Iterates through the pairs



In [15]:

    
# Creating an empty dictionary
location = {}
print(location)

{}



In [16]:

    
# Defined with a curly bracket
location = {
            'Berlin': (52.5170365, 13.3888599),
            'London': (51.5073219, -0.1276474),
            'Sydney': (-33.8548157, 151.2164539),
            'Tokyo': (34.2255804, 139.294774527387),
            'Paris': (48.8566101, 2.3514992),
            'Moscow': (46.7323875, -117.0001651)
           }
print(location)









    



{'Berlin': (52.5170365, 13.3888599), 'London': (51.5073219, -0.1276474), 'Sydney': (-33.8548157, 151.2164539), 'Tokyo': (34.2255804, 139.294774527387), 'Paris': (48.8566101, 2.3514992), 'Moscow': (46.7323875, -117.0001651)}



In [17]:

    
# Update
location.update({'Hong Kong': (22.2793278, 114.1628131)})
print(location)









    



{'Berlin': (52.5170365, 13.3888599), 'London': (51.5073219, -0.1276474), 'Sydney': (-33.8548157, 151.2164539), 'Tokyo': (34.2255804, 139.294774527387), 'Paris': (48.8566101, 2.3514992), 'Moscow': (46.7323875, -117.0001651), 'Hong Kong': (22.2793278, 114.1628131)}



In [18]:

    
# Call element
location['Tokyo']









    Out[18]:





(34.2255804, 139.294774527387)



In [19]:

    
# Delete element
del location['Hong Kong']
location









    Out[19]:





{'Berlin': (52.5170365, 13.3888599),
 'London': (51.5073219, -0.1276474),
 'Sydney': (-33.8548157, 151.2164539),
 'Tokyo': (34.2255804, 139.294774527387),
 'Paris': (48.8566101, 2.3514992),
 'Moscow': (46.7323875, -117.0001651)}



In [20]:

    
for key, value in location.items():
    print(key, value)









    



Berlin (52.5170365, 13.3888599)
London (51.5073219, -0.1276474)
Sydney (-33.8548157, 151.2164539)
Tokyo (34.2255804, 139.294774527387)
Paris (48.8566101, 2.3514992)
Moscow (46.7323875, -117.0001651)

Extra reading:



In [21]:

    
### More on slicing in list and tuple
start=2
end=5
step=2

print("Original:", x)
print("items start through end-1 :", x[start:end]) # items start through end-1
print("items start through the rest of the array :", x[start:])    # items start through the rest of the array
print("items from the beginning through end-1 :", x[:end])      # items from the beginning through end-1
print("whole array :", x[:])         # whole array
print("last item in the array :", x[-1])    # last item in the array
print("last two items in the array :", x[-2:])   # last two items in the array
print("everything except the last two items :", x[:-2])   # everything except the last two items

print("start through not past end, by step", x[start:end:step]) # start through not past end, by step









    



Original: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
items start through end-1 : [5, 7, 11]
items start through the rest of the array : [5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
items from the beginning through end-1 : [2, 3, 5, 7, 11]
whole array : [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
last item in the array : 47
last two items in the array : [43, 47]
everything except the last two items : [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]
start through not past end, by step [5, 11]
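The same slicing syntax works on tuples, where each slice is itself a tuple. The sketch below also uses the built-in `slice` object, which lets a start/end/step combination be stored and reused; the tuple `primes` and the other names are illustrative.

```python
primes = (2, 3, 5, 7, 11, 13)

middle = primes[2:5]            # (5, 7, 11)
every_other = primes[::2]       # (2, 5, 11)
reversed_primes = primes[::-1]  # a reversed copy

# slice(start, end, step) is what the colon notation creates internally.
window = slice(2, 5, 2)
picked = primes[window]         # same as primes[2:5:2]

print(middle, every_other, reversed_primes, picked)
```
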

Bitwise Operations

In addition to the standard numerical operations, Python includes operators to perform bitwise logical operations on integers. These are much less commonly used than the standard arithmetic operations, but it's useful to know that they exist. The six bitwise operators are summarized in the following table:

Operator	Name	Description
`a & b`	Bitwise AND	Bits defined in both `a` and `b`
`a | b`	Bitwise OR	Bits defined in `a` or `b` or both
`a ^ b`	Bitwise XOR	Bits defined in `a` or `b` but not both
`a << b`	Bit shift left	Shift bits of `a` left by `b` units
`a >> b`	Bit shift right	Shift bits of `a` right by `b` units
`~a`	Bitwise NOT	Bitwise negation of `a`
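A short sketch of these operators on two small integers follows. The values are written as binary literals so the bit patterns are visible, and `bin()` prints the results in binary; the variable names are assumptions for this example.

```python
a, b = 0b1100, 0b1010   # 12 and 10

bit_and = a & b      # 0b1000 = 8
bit_or = a | b       # 0b1110 = 14
bit_xor = a ^ b      # 0b0110 = 6
shift_left = a << 2  # 0b110000 = 48
shift_right = a >> 2 # 0b11 = 3
bit_not = ~a         # -13, because ~a equals -a - 1 for Python integers

print(bin(bit_and), bin(bit_or), bin(bit_xor))
print(shift_left, shift_right, bit_not)
```
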

Summary

These operations show how easy Python is to use compared with lower-level languages such as C. In C, we would need to manually construct a loop over the list and check each value for equality. In Python, you just type what you want to know: easy to type but hard to debug, just like English grammar.
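As a rough illustration of that contrast, the sketch below checks whether a value is in a list twice: once with an explicit loop, written the way it would be structured in C, and once with Python's `in` operator. The helper name `contains` is hypothetical.

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def contains(values, target):
    """Loop over the list and compare each value, as one would in C."""
    for value in values:
        if value == target:
            return True
    return False

manual = contains(primes, 13)   # explicit loop
builtin = 13 in primes          # the same question, asked directly

print(manual, builtin)
```
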