Python tutorial

These first steps are not very methodical. Programming language classes take their time: they start with basic concepts and gradually expand on them. University courses can afford this slower pace because students have the time for it. For the rest of us, working people, there are two or perhaps three steps to learning a new language or a new library, and we learn by doing. The tutorial is a short exposure that takes you through some key aspects. Next comes the programming guide (or any decent book), where concepts are explained in greater detail. Last comes the library reference, where every detail is supposed to be documented in a concise format. Good programmers learn to read the tutorial, read from the guide the key aspects that make their library stand out, and consult the reference only when needed. Beginners have a hard time knowing where to begin and are easily overwhelmed.

No matter how good I become at teaching basic Python in one or two hours, it cannot compare with reading the official Python tutorial, which takes much longer than the time we have at our disposal. So please understand that there is a trade-off between time and quality. Since we learn by doing, I invite you to check the documentation for Python and for the libraries we are using whenever possible.

Basics

Variables and comments

Variable vs type. 'Native' datatypes. Console output.


In [3]:
# This is a line comment.
"""
A multi-line string, often used as a
multi-line comment.
"""
a = None  # a is bound to None, Python's "no value" object
print(a)
a = 1
print(a)
b = 'abc'
c = [a, 2, b, 1., 1.2e-5, True]  # This is a list.
print(a, b, c)


None
1
1 abc [1, 2, 'abc', 1.0, 1.2e-05, True]

In [1]:
## Python is a dynamic language
a = 1
print(type(a))
print(a)
a = "spam"
print(type(a))
print(a)


<class 'int'>
1
<class 'str'>
spam

In [1]:
a = 1
a
b = 'abc'
print(b)
#b


abc

In [12]:
b


Out[12]:
'abc'

Now let us switch the values of two variables.


In [4]:
print(a, b, c)
t = c
c = b
b = t
print(a, b, c)


1 abc [1, 2, 'abc', 1.0, 1.2e-05, True]
1 [1, 2, 'abc', 1.0, 1.2e-05, True] abc
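
The temporary variable t is not strictly needed in Python. The right-hand side of an assignment is evaluated first, packed into a tuple, and then unpacked into the names on the left. A minimal sketch, where first and second are illustrative names that do not appear in the cells above:

```python
# Swap two values without a temporary variable.
first = 'abc'
second = [1, 2, 'abc']
first, second = second, first  # the right side is evaluated before any name is rebound
print(first, second)
```

The same pattern works for more than two names, for example a, b, c = c, a, b.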

Math operations

Arithmetic


In [4]:
a = 2
b = 1
b = a*(5 + b) + 1/0.5
print(b)
d = 1/a
print(d)


14.0
0.5

Logical operations:


In [11]:
a = True
b = 3
print(b == 5)
print(a == False)
print(b < 6 and not a)
print(b < 6 or not a)
print(b < 6 and (not a or not b == 3))


False
False
False
True
False

In [14]:
print(False and True)


False

In [7]:
True == 1


Out[7]:
True

Functions

Functions are a great way to separate code into readable chunks. The size and number of the functions used to solve a problem affect how readable the code is.

New concepts: indentation, namespaces, global and local scope, default parameters, why the distinction between passing arguments by value and by reference does not apply in Python, and what mutable and immutable types are.


In [17]:
## Indentation and function declaration, parameters of a function
def operation(a, b):
    c = 2*(5 + b) + 1/0.5
    a = 1
    return a, c

a = None
mu = 2
operation(mu, 1)
a, op = operation(a, 1)
print(a, op)


1 14.0

In [20]:
# Function scope, program workflow
def f(a):
    print("inside the scope of f():")
    a = 4
    print("a =", a)
    return a
a = 1
print("f is called")
f(a)
print("outside the scope of f, a=", a)
print("also outside the scope of f, f returns", f(a))


f is called
inside the scope of f():
a = 4
outside the scope of f, a= 1
inside the scope of f():
a = 4
also outside the scope of f, f returns 4

In [9]:
## Defining default parameters for a function
def f2(a, b=1):
    return a + b

print(f2(5))
print(f2(5, b=2))


6
7

Task:

  • Define three functions, f, g and h. Call g and h from inside f. Run f on some value v.
  • You can also have functions that are defined inside the namespace of another function. Try it!

Mutable and immutable data types

Data in Python is either mutable or immutable. Confusing the two is a frequent source of errors. Always know whether you are operating on a mutable or an immutable data type.

Question:

  • Why weren't all data types made mutable only, or immutable only?

In [37]:
## Strings of characters are immutable, x did not change its value
x = 'foo'
y = x
print(x, y) # foo foo
y += 'bar'
print(x, y) # foo foobar


foo foo
foo foobar

In [52]:
## A list, however, is a mutable datatype in Python
x = [1, 2, 3]
y = x
print(x, y) # [1, 2, 3] [1, 2, 3]
y += [3, 2, 1]
print(x, y) # both names show [1, 2, 3, 3, 2, 1]


[1, 2, 3] [1, 2, 3]
[1, 2, 3, 3, 2, 1] [1, 2, 3, 3, 2, 1]
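
The difference between the two cells above can be made visible with the identity operator is, which tells whether two names refer to the same object. The sketch below uses illustrative names (s1, s2, l1, l2, l3) and also shows one way to get an independent list, assuming a copy of the outer list is all that is needed:

```python
# Strings: += builds a new object, so the two names part ways.
s1 = 'foo'
s2 = s1
same_before = s1 is s2   # both names refer to one string object
s2 += 'bar'
same_after = s1 is s2    # s2 now names a new string, s1 is untouched

# Lists: += extends the existing object in place.
l1 = [1, 2, 3]
l2 = l1
l2 += [4]
still_same = l1 is l2    # one list, two names

# An explicit copy gives an independent list.
l3 = list(l1)
l3.append(5)
print(same_before, same_after, still_same, l1, l3)
```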

In [10]:
## String mutable? No
def func(val):
    val += 'bar'
    return val

x = 'foo'
print(x) # foo
print(func(x))
print(x) # foo


foo
foobar
foo

In [13]:
## List mutable? Yes.
def func(val):
    val += [3, 2, 1]
    return val

x = [1, 2, 3]
print(x) # [1, 2, 3]
print(func(x))
print(x) # [1, 2, 3, 3, 2, 1]


[1, 2, 3]
[1, 2, 3, 3, 2, 1]
[1, 2, 3, 3, 2, 1]

In [14]:
## Globals. Never use it, always abuse it!
g = 0

def f1():
    # Comment out the line below to spot the difference
    global g # Needed to rebind the module-level g instead of creating a local g
    g = 1

def f2():
    print("f2:",g)

print(g)
f1()
print(g)
f2()


0
1
f2: 1

Control flow

Programming styles are often divided into two major families, procedural and functional. Python is mostly procedural, with some simple functional elements. Procedural languages typically have very explicit control flow: programmers spend their time specifying how a program should run. In functional languages the time is spent defining what the program computes, while how to run it is largely left to the computer. Scala is one of the functional languages used in bioinformatics.
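
To make the contrast concrete, here is the same small computation written in both styles. This is only a sketch with made-up data (numbers), not a statement about which style is preferable:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Procedural style: spell out each step of the loop.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Functional style: describe the result as a composition of functions.
squares_functional = list(map(lambda n: n * n,
                              filter(lambda n: n % 2 == 0, numbers)))

print(squares_loop, squares_functional)
```

Both variables end up holding the squares of the even numbers; only the way the computation is expressed differs.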


In [54]:
# for loops
for b in [1, 2, 3]:
    print(b)


1
2
3

In [12]:
# while, break and continue
b = 0
while b < 10:
    b += 1
    a = 2
    if b%a == 0:
        #break
        continue
    print(b)

# Now do the same, but using the for loop


1
3
5
7
9

In [56]:
## if else: use different logical operators and see if it makes sense
a = 1
if a == 3:
    print('3')
elif a == 4:
    print('4')
else:
    print('something else..')


something else..

In [64]:
## error handling - use sparingly!
## python culture: easier to ask forgiveness than permission (try first, handle the error after)
def divide(x, y):
    """catches an exception"""
    try:
        result = x / y
    except ZeroDivisionError:
        print("division by zero!")
        #raise ZeroDivisionError
        #pass
    else:
        print("result is", result)
    finally:
        print("executing finally code block..")
divide(1,0)


division by zero!
executing finally code block..
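
The two habits mentioned in the cell comment, verifying first versus trying and handling the failure, can be compared side by side. The sketch below uses a hypothetical dictionary counts and two hypothetical helper functions:

```python
counts = {'geneA': 3}  # hypothetical data

# "Look before you leap": verify first, then act.
def lookup_lbyl(key):
    if key in counts:
        return counts[key]
    return 0

# "Easier to ask forgiveness than permission": act, then handle the failure.
def lookup_eafp(key):
    try:
        return counts[key]
    except KeyError:
        return 0

print(lookup_lbyl('geneA'), lookup_eafp('geneB'))
```

Catching one specific exception type, as with KeyError here, keeps unrelated errors visible.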

Python modules

import xls
"How can you simply import Excel !?!"
  • How Python is structured:

Packages are the way code libraries are distributed. Libraries contain one or several modules. Each module can contain object classes, functions and submodules.

  • Object introspection.

It often happens that some Python code you need is not well documented. To understand how to use it, you can interrogate any object at runtime. Additionally, the code of an installed library is located somewhere on your computer.


In [13]:
import math
print(dir())
print(dir(math))
print(help(math.log))
a = 3
print(type(a))
import numpy
print(numpy.__version__)
import os
print(os.getcwd())


['In', 'Out', '_', '_1', '_7', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i10', '_i11', '_i12', '_i13', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7', '_i8', '_i9', '_ih', '_ii', '_iii', '_oh', '_sh', 'a', 'b', 'c', 'd', 'exit', 'func', 'get_ipython', 'math', 'quit', 't', 'x']
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
Help on built-in function log in module math:

log(...)
    log(x[, base])
    
    Return the logarithm of x to the given base.
    If the base not specified, returns the natural logarithm (base e) of x.

None
<class 'int'>
1.12.1
/home/sergiu/data/work/course/short/github/biopycourse/day1
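
Besides dir() and help(), the standard inspect module can show where a module lives and what a function looks like. The sketch below uses the json module as an arbitrary example because it is written in Python; for compiled modules such as math, getsource() cannot return any source text:

```python
import inspect
import json

print(json.__file__)                    # location of the module on disk
print(inspect.signature(json.dumps))    # how the function expects to be called
source = inspect.getsource(json.dumps)  # source text, available for pure Python code
print(source[:60])
```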

Task:

  • Compute the distance between 2D points.
  • d(p1, p2) = sqrt((x1-x2)**2 + (y1-y2)**2), where p1 = (x1, y1) and p2 = (x2, y2)
  • Define a module containing a function that computes the Euclidean distance. Use the Spyder code editor and save the module on your filesystem.
  • Import that module into a new code cell below.
  • Make the module location available to Jupyter.
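
A minimal sketch of what such a module could look like is given here. The file name distance.py and the function name euclidian follow the import statements used in the next cell; the argument order (x1, y1, x2, y2) is an assumption inferred from the printed result:

```python
# distance.py (hypothetical module file for the task above)
import math

def euclidian(x1, y1, x2, y2):
    """Return the Euclidean distance between points (x1, y1) and (x2, y2)."""
    return math.sqrt((x1 - x2)**2 + (y1 - y2)**2)

if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(euclidian(1, 2, 4.5, 6))
```

The module can be imported once its folder is the working directory or is listed in sys.path.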

In [32]:
"""
%run full(relative)path/distance.py

or
os.chdir(path)

"""
import distance
print(distance.euclidian(1, 2, 4.5 , 6))

from distance import euclidian
print(euclidian(1, 2, 4.5 , 6))

import distance as d
print(d.euclidian(1, 2, 4.5 , 6))


5.315072906367325
5.315072906367325
5.315072906367325

In [17]:
import sys
print(sys.path)
sys.path.append('/my/custom/path')
print(sys.path)


['', '/home/sergiu/programs/miniconda3/envs/lts/lib/python36.zip', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/lib-dynload', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/site-packages', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/site-packages/IPython/extensions', '/home/sergiu/.ipython']
['', '/home/sergiu/programs/miniconda3/envs/lts/lib/python36.zip', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/lib-dynload', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/site-packages', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg', '/home/sergiu/programs/miniconda3/envs/lts/lib/python3.6/site-packages/IPython/extensions', '/home/sergiu/.ipython', '/my/custom/path']

Data representation

Strings


In [33]:
#String declarations
statement = "Gene IDs are great. My favorite gene ID is"
name = "At5G001024"
statement = statement + " " + name
print(statement)

statement2 = 'Genes names \n \'are great. My favorite gene name is ' + 'Afldtjahd'
statement3 = """
Gene IDs are great.
My favorite genes are {} and {}.""".format(name, 'ksdyfngusy')

print(statement2)
print(statement3)
print('.\n'.join(statement.split(". ")))
print (statement.split(". "))


Gene IDs are great. My favorite gene ID is At5G001024
Genes names 
 'are great. My favorite gene name is Afldtjahd

Gene IDs are great.
My favorite genes are At5G001024 and ksdyfngusy.
Gene IDs are great.
My favorite gene ID is At5G001024
['Gene IDs are great', 'My favorite gene ID is At5G001024']

In [84]:
#String methods
name = "At5G001024"
print(name.lower())
print(name.index('G00'))
print(name.rstrip('402'))
print(name.strip('Add34'))


at5g001024
3
At5G001
t5G00102
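
The results of rstrip and strip above may look surprising: the argument is treated as a set of characters to remove from the ends, not as a substring. A short sketch with a made-up file name shows the pitfall and one way around it:

```python
filename = "test.txt"  # hypothetical file name

# rstrip removes any trailing characters found in the set {'.', 't', 'x'}.
stripped = filename.rstrip('.txt')
print(stripped)

# To remove one exact suffix, test for it and slice.
suffix = '.txt'
trimmed = filename[:-len(suffix)] if filename.endswith(suffix) else filename
print(trimmed)
```

Here stripped loses the final "t" of "test" as well, while trimmed keeps the full base name.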

In [85]:
#Splits, joins
statement = "Gene IDs are great. My favorite gene ID is At5G001024"
words = statement.split()
print("Splitting a string:", words)
print("Joining into a string:", "\t ".join(words))
import random
random.shuffle(words)
print("Fun:", " ".join(words))


Splitting a string: ['Gene', 'IDs', 'are', 'great.', 'My', 'favorite', 'gene', 'ID', 'is', 'At5G001024']
Joining into a string: Gene	 IDs	 are	 great.	 My	 favorite	 gene	 ID	 is	 At5G001024
Fun: great. favorite are IDs gene Gene At5G001024 My is ID

In [86]:
#Strings are sequences of characters: they can be indexed and sliced like lists, but not modified
print(statement)
print(statement[0:5] + " blabla " + statement[-10:-5])


Gene IDs are great. My favorite gene ID is At5G001024
Gene  blabla At5G0

Tuples

A few pros for tuples:

  • Tuples are typically slightly faster to create and use less memory than lists
  • Tuples can be keys to dictionaries (they are immutable, and hashable when all their elements are hashable)

In [19]:
#a tuple is an immutable sequence, similar to a list that cannot be changed
a = (1, "spam", 5)
#a.append("eggs")

print(a[1])
b = (1, "one")
c = (a, b, 3)
print(c)

#unpacking a collection into positional arguments
def add(a, b):  # named add to avoid shadowing the built-in sum()
    return a + b
values = (5, 2)
s = add(*values)
print(s)


spam
((1, 'spam', 5), (1, 'one'), 3)
7
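
The second point in the list of pros can be illustrated with a dictionary indexed by pairs. The data below is hypothetical:

```python
# Hypothetical data: expression values indexed by (gene, condition) pairs.
expression = {
    ('geneid1', 'control'): 20,
    ('geneid1', 'treated'): 35,
}
print(expression[('geneid1', 'treated')])

# A list cannot be used the same way because it is mutable and unhashable.
try:
    expression[['geneid2', 'control']] = 10
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)
```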

Lists


In [88]:
a = [1,"one",(2,"two")]
print(a[0])
print(a)
a.append(3)
print(a)
b =  a + a[:2]
print(b)


1
[1, 'one', (2, 'two')]
[1, 'one', (2, 'two'), 3]
[1, 'one', (2, 'two'), 3, 1, 'one']

In [89]:
## slicing and indexing
print(b[2:5])
del a[-1]
print(a)
print(a.index("one"))
print(len(a))


[(2, 'two'), 3, 1]
[1, 'one', (2, 'two')]
1
3

In [92]:
## changes to the elements of a list passed to a function are visible to the caller (lists are mutable); rebinding the parameter name is not
def f(a, b):
    a[1] = "changed"
    b = [1,2]
    return
a = [(2, 'two'), 3, 1]
b = [2, "two"]
f(a, b)
print(a, b)


[(2, 'two'), 'changed', 1] [2, 'two']

In [36]:
## matrix
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]]

print(matrix)
print(matrix[0][1])
print(list(range(2,10,3)))
for x in range(len(matrix)):
    for y in range(len(matrix[x])):
        print(x,y, matrix[x][y])


[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
2
[2, 5, 8]
0 0 1
0 1 2
0 2 3
0 3 4
1 0 5
1 1 6
1 2 7
1 3 8
2 0 9
2 1 10
2 2 11
2 3 12

In [94]:
## ranges
r = range(0, 5)
for i in r: print("step", i)


step 0
step 1
step 2
step 3
step 4

In [3]:
## list comprehensions
def f(i):
    return 2*i
a = [2*i for i in range(10)]
a = [f(i) for i in range(10)]
print(a)
b = [str(e) for e in a[4:] if e%3==0]
print(b)


[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
['12', '18']

In [2]:
## sorting a list of tuples
a = [(str(i), str(j)) for i in "spam" for j in range(3)]
print(a)
a.sort(key=lambda tup: tup[1])  # sort by the second element
a.sort(key=lambda tup: len(tup[1]), reverse = True)  # stable sort: equal lengths keep their order
print(a)


[('s', '0'), ('s', '1'), ('s', '2'), ('p', '0'), ('p', '1'), ('p', '2'), ('a', '0'), ('a', '1'), ('a', '2'), ('m', '0'), ('m', '1'), ('m', '2')]
[('s', '0'), ('p', '0'), ('a', '0'), ('m', '0'), ('s', '1'), ('p', '1'), ('a', '1'), ('m', '1'), ('s', '2'), ('p', '2'), ('a', '2'), ('m', '2')]

In [40]:
#zipping and enumerating

y = zip('abc', 'def')
print(list(y)) # y is an iterator and can be consumed only once
print(list(y)) # second conversion to list, content is empty!

print(list(zip(['one', 'two', 'three'], [1, 2, 3])))

x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
#print(type(zipped))
print(zipped)

x2, y2 = zip(*zipped)
print (x == list(x2) and y == list(y2))
print (x2, y2)

alist = ['a1', 'a2', 'a3']
for i, e in enumerate(alist): print (i, e) #this is called a one liner
for i in range(len(alist)):
    print(i, alist[i])
print(list(range(len(alist))))


[('a', 'd'), ('b', 'e'), ('c', 'f')]
[]
[('one', 1), ('two', 2), ('three', 3)]
<zip object at 0x7f56500f4fc8>
True
(1, 2, 3) (4, 5, 6)
0 a1
1 a2
2 a3
0 a1
1 a2
2 a3
[0, 1, 2]

In [108]:
# mapping
a = [1, 2, 3, 4, 5]
b = [2, 2, 9, 0, 9]
print(list(map(lambda x: max(x), zip(a, b))))
print(list(zip(a, b)))


[2, 2, 9, 4, 9]
[(1, 2), (2, 2), (3, 9), (4, 0), (5, 9)]

In [110]:
# deep and shallow copies on mutable objects or collections of mutable objects
lst1 = ['a','b',['ab','ba']]
lst2 = lst1 #not a copy: lst2 is a second name for the same list
lst2[0]='e'
print(lst1)

lst1 = ['a','b',['ab','ba']]
lst2 = lst1[:] #this is a shallow copy: a new outer list that shares the nested list
lst2[0] = 'e'
lst2[2][1] = 'd'
print(lst1)

from copy import deepcopy
lst1 = ['a','b',['ab','ba']]
lst2 = deepcopy(lst1) #this is a deep copy
lst2[2][1] = "d"
lst2[0] = "c"
print(lst2)
print(lst1)


['e', 'b', ['ab', 'ba']]
['a', 'b', ['ab', 'd']]
['c', 'b', ['ab', 'd']]
['a', 'b', ['ab', 'ba']]

Sets

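Sets have no order and cannot contain duplicate elements. Use them when the position of elements is not relevant. Finding an element is faster than in a list because sets use hashing, and set operations such as union and intersection are more straightforward. A frozenset is an immutable set; unlike a set it has a hash value, so it can be used as a dictionary key or as an element of another set.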

Task:

  • Find on the Internet the official reference documentation for the Python sets

In [111]:
# set vs. frozenset
s = set()
#s = frozenset()
s.add(1)
s = s | set([2,"three"])
s |= set([2,"three"])
s.add(2)
s.remove(1)
print(s)
print("three" in s)


{2, 'three'}
True

In [113]:
s1 = set(range(10))
s2 = set(range(5,15))
s3 = s1 & s2
print(s1, s2, s3)
s3 = s1 - s2
print(s1, s2, s3)
print(s3 <= s1)
s3 = s1 ^ s2
print(s1, s2, s3)


{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} {5, 6, 7, 8, 9, 10, 11, 12, 13, 14} {8, 9, 5, 6, 7}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} {5, 6, 7, 8, 9, 10, 11, 12, 13, 14} {0, 1, 2, 3, 4}
True
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} {5, 6, 7, 8, 9, 10, 11, 12, 13, 14} {0, 1, 2, 3, 4, 10, 11, 12, 13, 14}
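
The remark about frozen sets having a hash value can be tried out directly. The sketch below uses hypothetical gene names and scores:

```python
# Membership tests use hashing, so they do not scan the whole collection.
genes = {'geneid1', 'geneid2', 'geneid3'}
print('geneid2' in genes)

# A frozenset is immutable and hashable, so it can be a dictionary key
# or an element of another set; a plain set cannot.
pair_scores = {frozenset(['geneid1', 'geneid2']): 0.9}  # hypothetical scores
score = pair_scores[frozenset(['geneid2', 'geneid1'])]  # element order is irrelevant
print(score)

try:
    hash(genes)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print(set_is_hashable)
```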

Dictionary

A set of key: value pairs. Keys must be hashable elements, values can be any Python datatype.


In [114]:
d = {'geneid9': 100, 'geneid8': 90, 'geneid7': 80, 'geneid6': 70, 'geneid5': 60, 'geneid4': 50}
d


Out[114]:
{'geneid4': 50,
 'geneid5': 60,
 'geneid6': 70,
 'geneid7': 80,
 'geneid8': 90,
 'geneid9': 100}

In [116]:
d = {}
d['geneid10'] = 110
d


Out[116]:
{'geneid10': 110}

In [117]:
#Creation: dict(list)
genes = ['geneid1', 'geneid2', 'geneid3']
values = [20, 30, 40]
d = dict(zip(genes, values))
print(d)


{'geneid1': 20, 'geneid3': 40, 'geneid2': 30}

In [16]:
#Creation: dictionary comprehensions
d2 = { 'geneid'+str(i):10*(i+1) for i in range(4, 10) }
print(d2)

#Keys and values
print(d2.keys())
print(d2.values())
for k in d2.keys(): print(k, d2[k])


{'geneid4': 50, 'geneid5': 60, 'geneid6': 70, 'geneid7': 80, 'geneid8': 90, 'geneid9': 100}
dict_keys(['geneid4', 'geneid5', 'geneid6', 'geneid7', 'geneid8', 'geneid9'])
dict_values([50, 60, 70, 80, 90, 100])
geneid4 50
geneid5 60
geneid6 70
geneid7 80
geneid8 90
geneid9 100
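
A few other dictionary methods are worth knowing alongside keys() and values(). The sketch below uses a shortened, illustrative version of d2:

```python
d2 = {'geneid4': 50, 'geneid5': 60}

# items() yields key and value together, avoiding a lookup per key.
for key, value in d2.items():
    print(key, value)

# get() returns a default instead of raising KeyError for a missing key.
missing = d2.get('geneid99', 0)
print(missing)

# Sort the keys by their values, largest first.
by_value = sorted(d2, key=d2.get, reverse=True)
print(by_value)
```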

Task:

Find the dictionary key corresponding to a certain value. Why is Python not offering a native method for this?


In [122]:
d = {'geneid9': 100, 'geneid8': 90, 'geneid7': 90, 'geneid6': 70, 'geneid5': 60, 'geneid4': 50}

def getkey(value):
    ks = set()
    # .. your code here
    return ks

print(getkey(90))


set()

Objects and Classes

Everything is an object in Python, and every variable is a reference to an object. A reference records the address in memory where the object lies, but Python keeps this hidden. In C, memory allocated for data structures is not cleaned up automatically; forgetting to free it causes memory leaks that make a program occupy more and more RAM. Many modern languages reclaim memory automatically once an object is no longer referenced, something called "garbage collection". This convenience comes at some cost in computation speed.
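
References and automatic cleanup can be observed with a weak reference, which watches an object without keeping it alive. This is a sketch only: Record is a hypothetical placeholder class, and the exact moment an object is freed depends on the interpreter, which is why gc.collect() is called explicitly:

```python
import gc
import weakref

class Record:
    """Hypothetical placeholder class; a plain object() cannot be weakly referenced."""

first = Record()
second = first                 # a second reference to the same object
print(first is second)         # identity, not equality

watcher = weakref.ref(first)   # observes the object without keeping it alive
del first
alive_with_one_ref = watcher() is not None   # 'second' still refers to the object
del second
gc.collect()                   # CPython frees it at once; this covers other interpreters
freed = watcher() is None
print(alive_with_one_ref, freed)
```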

New concepts:

  • Instantiation, Fields, Methods, Decomposition into classes, Inheritance

In [4]:
class Dog(object):
    
    def __init__(self, name):
        self.name = name
        return
    
    def bark_if_called(self, call):
        if call[:-1]==self.name:
            print("Woof Woof!")
        else:
            print("*sniffs..")
        return
    
    def get_ball(self):
        print(self.name + " brings back ball")

d = Dog("Buffy")
print(d.name, "was created from Ether!") #name is an attribute

d.bark_if_called("Bambi!") #bark_if_called is a method
#d.bark_if_called("Buffy!")


Buffy was created from Ether!
*sniffs..

In [135]:
class PitBull(Dog):
    
    def get_ball(self):
        super().get_ball()
        print("*hates you")
        return
    
    def chew_boots(self):
        print("*drools")
        return

d2 = PitBull("Georgie")

d2.bark_if_called("Loopie!")
d2.bark_if_called("Georgie!")

d2.chew_boots()
#d.chew_boots()

d2.get_ball()
print(d2.name)


*sniffs..
Woof Woof!
*drools
Georgie brings back ball
*hates you
Georgie

Decorators


In [17]:
from time import sleep


def sleep_decorator(function):

    """
    Limits how fast the function is
    called.
    """

    def wrapper(*args, **kwargs):
        sleep(2)
        return function(*args, **kwargs)
    return wrapper


@sleep_decorator
def print_number(num):
    return num

print(print_number(222))

for num in range(1, 6):
    print(print_number(num))


222
1
2
3
4
5
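
A decorator is a function that takes a function and returns a replacement for it; the @ line is shorthand for rebinding the name to the wrapped version. The sketch below is a variation on the cell above, with hypothetical names timed and square. It also uses functools.wraps, which the cell above omits, so that the wrapped function keeps its own name and docstring:

```python
import functools
import time

def timed(function):
    """Report how long each call to the wrapped function takes."""
    @functools.wraps(function)          # keep the original name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(function.__name__, "took", round(elapsed, 6), "s")
        return result
    return wrapper

@timed
def square(num):
    """Return num squared."""
    return num * num

# The decorator syntax above is shorthand for: square = timed(square)
print(square(12))
print(square.__name__)
```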

Standard library modules

https://docs.python.org/3/library/

  • sys - system-specific parameters and functions
  • os - operating system interface
  • shutil - shell utilities
  • math - mathematical functions and constants
  • random - pseudorandom number generator
  • timeit - timing of small code snippets
  • string - string constants and text formatting
  • zlib - data compression
  • ... etc ...

Recommendation: Take time to explore the Python Module of the Week. It is a very good way to learn why Python comes "with batteries included".

The sys module. Command line arguments.


In [137]:
import sys
print(sys.argv)
sys.exit()

##getopt, sys.exit()
##getopt.getopt(args, options[, long_options])
# import getopt
# inputfile, outputfile = None, None
# try:
#     opts, args = getopt.getopt(sys.argv[1:],"hi:o:",["ifile=","ofile="])
# except getopt.GetoptError:
#     print('test.py -i <inputfile> -o <outputfile>')
#     sys.exit(2)
# for opt, arg in opts:
#     if opt == '-h':
#         print('test.py -i <inputfile> -o <outputfile>')
#         sys.exit()
#     elif opt in ("-i", "--ifile"):
#         inputfile = arg
#     elif opt in ("-o", "--ofile"):
#         outputfile = arg
# print(inputfile, outputfile)


['/home/sergiun/programs/anaconda3/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py', '-f', '/run/user/1000/jupyter/kernel-13c582f7-e031-4ca3-8c2e-ec3cc87d2d2c.json']
An exception has occurred, use %tb to see the full traceback.

SystemExit
To exit: use 'exit', 'quit', or Ctrl-D.

Task:

  • Create a second script that accepts command line arguments and imports the distance module above. If -n 8 is provided in the arguments, it must generate 8 random points and compute a matrix of all pairwise distances.
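
For the argument parsing part of this task, the standard argparse module is an alternative to getopt. The sketch below covers only the -n option; the function name parse_args and the default of None are assumptions, and the random points and the distance matrix are left to you:

```python
import argparse

def parse_args(argv=None):
    """Parse the command line; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Pairwise distances of random points")
    parser.add_argument("-n", type=int, default=None,
                        help="number of random points to generate")
    return parser.parse_args(argv)

# Passing a list explicitly makes the sketch usable inside a notebook,
# where sys.argv holds the kernel's own arguments.
args = parse_args(["-n", "8"])
print(args.n)
```

In a script run from the shell, calling parse_args() without arguments picks up the real command line.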

os module: File operations

The working directory, file IO, copy, rename and delete


In [139]:
import os
print(os.getcwd())
#os.chdir(newpath)
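os.makedirs('testdir', exist_ok=True)  # portable, unlike os.system('mkdir testdir')

# the with block closes the file even if a write fails
with open('testfile.txt','wt') as f:
    f.write('One line of text\n')
    f.write('Another line of text\n')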

import shutil
#shutil.copy('testfile.txt', 'testdir/')
shutil.copyfile('testfile.txt', 'testdir/testfile1.txt')
shutil.copyfile('testfile.txt', 'testdir/testfile2.txt')

with open('testdir/testfile1.txt','rt') as f:
    for l in f: print(l)

for fn in os.listdir("testdir/"):
    print(fn)
    #fpath = os.path.join(dirpath,filename)
    os.rename('testdir/'+fn, 'testdir/file'+fn[-5]+'.txt')

import glob
print (glob.glob('testdir/*'))

os.remove('testdir/file2.txt')
#os.rmdir('testdir')
#shutil.rmtree(path)


/home/sergiun/projects/work/course
One line of text

Another line of text

testfile2.txt
testfile1.txt
['testdir/file1.txt', 'testdir/file2.txt']

Task:

  • Add a function to save the random vectors and the generated matrix into a file.

Timing


In [43]:
from datetime import datetime
startTime = datetime.now()

n = 10**8
for i in range(n):
    continue

print(datetime.now() - startTime)


0:00:06.661880
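
For short pieces of code, the timeit module from the list of standard library modules gives more reliable numbers than a single measurement, because it repeats the call. A sketch with a hypothetical function count_to and a smaller loop than above, so that it finishes quickly:

```python
import timeit

def count_to(n):
    """Loop n times without doing any work."""
    for i in range(n):
        continue

# Run the call 5 times and report the total time in seconds.
total = timeit.timeit(lambda: count_to(10**5), number=5)
print(total)

# repeat() returns several measurements; the minimum is usually the most stable.
best = min(timeit.repeat(lambda: count_to(10**5), number=5, repeat=3))
print(best)
```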

Processes

Launching a process. Parallelization: shared resources, clusters, clouds


In [46]:
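import os
import subprocess

#print(os.system('/path/yourshellscript.sh args'))
# run() waits for the command to finish and returns a CompletedProcess
result = subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
# check=True raises CalledProcessError on a non-zero exit status
try:
    subprocess.run("exit 1", shell=True, check=True)
except subprocess.CalledProcessError as err:
    status = err.returncode  # 1

from subprocess import call
call(["ls", "-l"])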


0

In [ ]:
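from subprocess import Popen, PIPE

# '/path/yourshellscript.sh' is a placeholder; point it to a real script
args = ['/path/yourshellscript.sh', '-arg1', 'value1', '-arg2', 'value2']
# pass the list without shell=True; with shell=True only the first item would be the command
p = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
# communicate() reads both pipes until the process ends; wait() can deadlock when the pipes fill up
child_stdout, child_stderr = p.communicate()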

In [ ]:
# def child():
#    print('A new child ', os.getpid())
#    os._exit(0)

# def parent():
#    while True:
#       newpid = os.fork()
#       if newpid == 0:
#          child()
#       else:
#          pids = (os.getpid(), newpid)
#          print("parent: %d, child: %d" % pids)
#       if input() == 'q': break

# parent()

How to do the equivalent of shell piping in Python? This is the basic step of an automated pipeline.

cat test.txt | grep something

Task:

  • Test this!
  • Comment out p1.stdout.close(). What changes, and why is this call needed?
  • What are signals? Read about SIGPIPE.

In [ ]:
from subprocess import Popen, PIPE

p1 = Popen(["cat", "test.txt"], stdout=PIPE)
p2 = Popen(["grep", "something"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # lets p1 receive SIGPIPE if p2 exits first
output = p2.communicate()[0]

Questions:

  • What are Python's native datatypes? Have a look at the Python online documentation for each datatype.
  • How many data types does Python have?
  • Python is a "dynamic" language. What does that mean?
  • Python is an "interpreted" language. What does that mean?
  • Which data structures are mutable and which are immutable? When does this matter?
  • What is a "hash" and how does it influence set and dictionary operations?
  • What are the most important Python libraries for you? Read through Anaconda's collection of libraries and check out some of them.

Task. Explain why this happens:


In [17]:
def run(l=[]):
    l.append(len(l))
    return l

print(run())
print(run())
print(run())


[0]
[0, 1]
[0, 1, 2]

Task.

dic = {'a':[1,2,3], 'b':[4,5,6,7]}

Using list comprehension, return:

[1, 2, 3, 4, 5, 6, 7] ['a', 'a', 'a', 'b', 'b', 'b', 'b']