DS Data manipulation, analysis and visualisation in Python
December, 2019
© 2016, Joris Van den Bossche and Stijn Van Hoey (mailto:jorisvandenbossche@gmail.com, mailto:stijnvanhoey@gmail.com). Licensed under CC BY 4.0 Creative Commons
This notebook is largely based on material from the Python Scientific Lecture Notes (https://scipy-lectures.github.io/), adapted with some exercises.
Importing packages is usually the first thing you do in Python, since packages provide the functionality you work with.
Different options are available:
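import package_name: importing all functionalities as such
from package_name import specific_function: importing a specific function or subset of the package
from package_name import *: importing all definitions and actions of the package (sometimes better than option 1)
import package_name as short_package_name: a very good way to keep track of where you use which package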
import all functionalities as such
In [1]:
# Two general packages
import os
import sys
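As a small illustration of these import styles, the sketch below uses the standard-library `math` module. The choice of module and the alias `m` are only assumptions made for the example.

```python
# Whole package: names are accessed with the package prefix
import math

# Only specific names from the package
from math import sqrt, pi

# Alias: a short prefix that still shows where a name comes from
import math as m

print(math.sqrt(16.0))
print(sqrt(16.0))
print(m.pi == pi)
```

All three spellings refer to the same underlying module, so the last line prints `True`.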
Python supports the following numerical, scalar types: integer, float, complex and boolean.
In [2]:
an_integer = 3
print(type(an_integer))
In [3]:
an_integer
Out[3]:
In [4]:
# type casting: converting the integer to a float type
float(an_integer)
Out[4]:
In [5]:
a_float = 0.2
type(a_float)
Out[5]:
In [6]:
a_complex = 1.5 + 0.5j
# get the real or imaginary part of the complex number by using the functions
# real and imag on the variable
print(type(a_complex), a_complex.real, a_complex.imag)
In [7]:
a_boolean = (3 > 4)
a_boolean
Out[7]:
A Python shell can therefore replace your pocket calculator: the basic arithmetic operations (addition, subtraction, multiplication, division, modulo, exponentiation) are natively implemented:
| operation | python implementation |
|---|---|
| addition | + |
| subtraction | - |
| multiplication | * |
| division | / |
| modulo | % |
| exponentiation | ** |
In [8]:
print(7 * 3.)
print(2**10)
print(8 % 3)
Attention! In Python 3, / always performs true division, even on two integers; use // for floor division.
In [9]:
print(3/2)
print(3/2.)
print(3.//2.) # floor division: the result is rounded down, but stays a float here
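The difference between the two division operators can be sketched as follows, assuming Python 3. The numbers are arbitrary, and `divmod` is a built-in that returns the floor quotient and the remainder together.

```python
print(7 / 2)     # true division always returns a float: 3.5
print(7 // 2)    # floor division on integers returns an integer: 3
print(7. // 2)   # floor division with a float operand returns a float: 3.0
print(-7 // 2)   # the result is rounded towards minus infinity: -4

# divmod combines floor division and modulo in one call
quotient, remainder = divmod(7, 2)
print(quotient, remainder)  # 3 1
```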
A list is an ordered collection of objects that may have different types. The list container supports slicing, appending, sorting ...
Indexing starts at 0 (as in C, C++ or Java), not at 1 (as in Fortran or Matlab)!
In [10]:
a_list = [2.,'aa', 0.2]
a_list
Out[10]:
In [11]:
# accessing an individual object in the list
a_list[1]
Out[11]:
In [12]:
# negative indices are used to count from the back
a_list[-1]
Out[12]:
Slicing: obtaining sublists of regularly-spaced elements
In [13]:
another_list = ['first', 'second', 'third', 'fourth', 'fifth']
print(another_list[3:])
print(another_list[:2])
print(another_list[::2])
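The general slicing syntax is `a_list[start:stop:step]`, where `start` is included, `stop` is excluded and every part is optional. A minimal sketch, using a hypothetical list `letters`:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

middle = letters[1:4]          # from index 1 up to, but not including, index 4
every_other = letters[::2]     # whole list, in steps of 2
last_two = letters[-2:]        # negative indices count from the back
reversed_copy = letters[::-1]  # a step of -1 walks the list backwards

print(middle)
print(every_other)
print(last_two)
print(reversed_copy)
```

A slice always returns a new list, so `letters` itself is left untouched.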
Lists are mutable objects and can be modified
In [14]:
another_list[3] = 'newFourth'
print(another_list)
another_list[1:3] = ['newSecond', 'newThird']
print(another_list)
Warning: assigning a list to a second name does not copy it. Both names point to the same object in memory, so changing one of them also changes the other!
In [15]:
a = ['a', 'b']
b = a
b[0] = 1
print(a)
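When an independent list is needed, one option is to make an explicit copy. The sketch below contrasts a second name for the same list with a shallow copy; the variable names are only illustrative.

```python
original = ['a', 'b']
alias = original                 # second name for the same list object
shallow_copy = original.copy()   # new list; list(original) or original[:] also work

alias[0] = 1         # also visible through 'original'
shallow_copy[1] = 2  # only changes the copy

print(original)      # [1, 'b']
print(shallow_copy)  # ['a', 2]
print(alias is original, shallow_copy is original)  # True False
```

Note that a shallow copy only duplicates the outer list: nested mutable objects are still shared between the copy and the original.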
List methods:
You can always list the available methods in the namespace by using the dir()-command:
In [16]:
#dir(list)
In [17]:
a_third_list = ['red', 'blue', 'green', 'black', 'white']
In [18]:
# Appending
a_third_list.append('pink')
a_third_list
Out[18]:
In [19]:
# Removes and returns the last element
a_third_list.pop()
a_third_list
Out[19]:
In [20]:
# Extends the list in-place
a_third_list.extend(['pink', 'purple'])
a_third_list
Out[20]:
In [21]:
# Reverse the list
a_third_list.reverse()
a_third_list
Out[21]:
In [22]:
# Remove the first occurrence of an element
a_third_list.remove('white')
a_third_list
Out[22]:
In [23]:
# Sort list
a_third_list.sort()
a_third_list
Out[23]:
In [24]:
a_third_list.count?
In [25]:
a_third_list.index?
In [26]:
a_third_list = ['red', 'blue', 'green', 'black', 'white']
In [27]:
# remove the last two elements
a_third_list = a_third_list[:-2]
a_third_list
Out[27]:
In [28]:
a_third_list[::-1]
Out[28]:
Concatenating lists is just the same as summing both lists:
In [29]:
a_list = ['pink', 'orange']
a_concatenated_list = a_third_list + a_list
a_concatenated_list
Out[29]:
In [30]:
reverted = a_third_list.reverse()
## uncomment the next lines to test the error:
#a_concatenated_list = a_third_list + reverted
#a_concatenated_list
The list itself is reversed in place and nothing is returned, so reverted is None, which cannot be added to a list.
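To get a reversed or sorted list back as a value instead of modifying the list in place, slicing or the built-in `sorted` can be used. A minimal sketch with a hypothetical list `colors`:

```python
colors = ['red', 'blue', 'green']

result = colors.reverse()  # reverses in place and returns None
print(result)              # None
print(colors)              # ['green', 'blue', 'red']

reversed_colors = colors[::-1]  # new list, 'colors' is unchanged
sorted_colors = sorted(colors)  # new list, 'colors' is unchanged

# Both are real lists, so concatenation works
combined = colors + reversed_colors
print(combined)
```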
In [31]:
# Repeating lists
a_repeated_list = a_concatenated_list*10
print(a_repeated_list)
List comprehensions
List comprehensions are a very powerful feature. A list comprehension is a for-loop written inside a list: it loops through all the elements of a list and applies an action to each of them, in a single, readable line.
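To make the link with a regular for-loop explicit, the sketch below builds the same list twice, once with a loop and once with a comprehension. The variable names are only illustrative.

```python
values = [1, 2, 3, 4]

# Explicit for-loop with a condition
squares_loop = []
for value in values:
    if value > 1:
        squares_loop.append(value**2)

# Equivalent list comprehension
squares_comprehension = [value**2 for value in values if value > 1]

print(squares_loop)
print(squares_loop == squares_comprehension)  # True
```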
In [32]:
number_list = [1, 2, 3, 4]
[i**2 for i in number_list]
Out[32]:
and with conditional options:
In [33]:
[i**2 for i in number_list if i>1]
Out[33]:
In [35]:
# Let's try multiplying by two on a list of strings:
print([i*2 for i in a_repeated_list])
Cool, this works! Let's check more about strings:
Different string syntaxes (single, double or triple quotes)
In [36]:
s = 'Never gonna give you up'
print(s)
s = "never gonna let you down"
print(s)
s = '''Never gonna run around
and desert you'''
print(s)
s = """Never gonna make you cry,
never gonna say goodbye"""
print(s)
In [37]:
## pay attention when using apostrophes! - test out the next two lines one at a time
#print('Hi, what's up?')
#print("Hi, what's up?")
The newline character is \n, and the tab character is \t.
In [38]:
print('''Never gonna tell a lie and hurt you.
Never gonna give you up,\tnever gonna let you down
Never \ngonna\n run around and\t desert\t you''')
Strings are sequences, like lists. Hence they can be indexed and sliced, using the same syntax and rules.
In [39]:
a_string = "hello"
print(a_string[0])
print(a_string[1:5])
print(a_string[-4:-1:2])
Accents and special characters can also be handled in Unicode strings (see http://docs.python.org/tutorial/introduction.html#unicode-strings). In Python 3, every string is a Unicode string, so the u prefix is optional.
In [40]:
print(u'Hello\u0020World !')
A string is an immutable object and it is not possible to modify its contents. One may however create new strings from the original one.
In [41]:
#a_string[3] = 'q' # uncomment this cell
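The sketch below shows what happens when assigning to a character of a string, and one possible way to build a modified string from slices of the original. The variable names are only illustrative.

```python
greeting = "hello"

try:
    greeting[3] = 'q'  # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# Build a new string from pieces of the original instead
modified = greeting[:3] + 'q' + greeting[4:]
print(modified)  # helqo
print(greeting)  # hello, the original is unchanged
```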
We won't introduce all methods on strings, but let's check the namespace and apply a few of them:
In [42]:
#dir(str) # uncomment this cell
In [43]:
another_string = "Strawberry-raspBerry pAstry package party"
another_string.lower().replace('r', 'l', 7)
Out[43]:
String formatting, to control how values appear in the output, can be done as follows:
In [44]:
print('An integer: %i; a float: %f; another string: %s' % (1, 0.1, 'string'))
The format method of strings in Python 3 infers the conversion from the type of each argument:
In [45]:
print('An integer: {}; a float: {}; another string: {}'.format(1, 0.1, 'string'))
In [46]:
n_dataset_number = 20
sFilename = 'processing_of_dataset_%d.txt' % n_dataset_number
print(sFilename)
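For comparison, the same file name can be built with the three common formatting styles. This is only a sketch: f-strings assume Python 3.6 or later, and the padding and rounding specifiers are extra illustrations, not part of the original example.

```python
dataset_number = 20

with_percent = 'processing_of_dataset_%d.txt' % dataset_number
with_format = 'processing_of_dataset_{}.txt'.format(dataset_number)
with_fstring = f'processing_of_dataset_{dataset_number}.txt'
print(with_percent == with_format == with_fstring)  # True

# Format specifiers control padding and precision
padded = 'processing_of_dataset_{:03d}.txt'.format(dataset_number)
rounded = '{:.2f}'.format(0.12345)
print(padded)   # processing_of_dataset_020.txt
print(rounded)  # 0.12
```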
In [47]:
[el for el in dir(list) if not el.startswith('_')]
Out[47]:
In [48]:
sentence = "the quick brown fox jumps over the lazy dog"
In [49]:
#split in words and get word lengths
[len(word) for word in sentence.split()]
Out[49]:
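A dictionary is basically an efficient table that maps keys to values. Its elements are accessed by key rather than by position (since Python 3.7, a dictionary does remember the order in which keys were inserted).
It can be used to conveniently store and retrieve values associated with a name.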
In [50]:
# Always key : value combinations, datatypes can be mixed
hourly_wage = {'Jos':10, 'Frida': 9, 'Gaspard': '13', 23 : 3}
hourly_wage
Out[50]:
In [51]:
hourly_wage['Jos']
Out[51]:
Adding an extra element:
In [52]:
hourly_wage['Antoinette'] = 15
hourly_wage
Out[52]:
You can get the keys and values separately:
In [53]:
hourly_wage.keys()
Out[53]:
In [54]:
hourly_wage.values()
Out[54]:
In [55]:
hourly_wage.items() # all (key, value) pairs, as a view object
Out[55]:
In [56]:
# ignore this loop for now, this will be explained later
for key, value in hourly_wage.items():
    print(key,' earns ', value, '€/hour')
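In Python 3, `keys()`, `values()` and `items()` return view objects rather than lists: they follow later changes to the dictionary and can be turned into a list with `list()`. The sketch below uses a hypothetical dictionary `wages`, and also shows membership testing with `in` and the `get` method for keys that may be missing.

```python
wages = {'Jos': 10, 'Frida': 9}

keys_view = wages.keys()
wages['Antoinette'] = 15  # the view reflects this new key
print(list(keys_view))    # ['Jos', 'Frida', 'Antoinette']

print('Jos' in wages)           # membership is tested on the keys
print(wages.get('Gaspard'))     # None instead of a KeyError
print(wages.get('Gaspard', 0))  # or a default value of your choice
```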
In [57]:
hourly_wage = {'Jos':10, 'Frida': 9, 'Gaspard': '13', 23 : 3}
In [58]:
str_key = []
for key in hourly_wage.keys():
    str_key.append(str(key))
str_key
Out[58]:
Tuples are basically immutable lists. The elements of a tuple are written between parentheses, or just separated by commas.
In [59]:
a_tuple = (2, 3, 'aa', [1, 2])
a_tuple
Out[59]:
In [60]:
a_second_tuple = 2, 3, 'aa', [1,2]
a_second_tuple
Out[60]:
The key concept here is mutable vs. immutable: the elements of a list can be reassigned, whereas those of a tuple cannot.
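A minimal sketch of what immutability means in practice, using a hypothetical tuple `point`. Note that a tuple only fixes which objects it holds: a mutable object stored inside a tuple, such as a list, can still be changed.

```python
point = (2, 3, 'aa', [1, 2])

try:
    point[0] = 5  # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# The list stored inside the tuple is itself still mutable
point[3].append(3)
print(point)  # (2, 3, 'aa', [1, 2, 3])
```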