A brief tutorial of basic Python

From Wikipedia: "Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale."

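In this tutorial, students will learn some basic features of the Python programming language that will be useful for working with corpora of text data. The examples use Python 2 syntax (e.g., print x rather than the Python 3 form print(x), and raw_input() rather than input()).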

1. Introduction to Strings

Among the different native Python types, we will focus on strings, since they are the core type we will use to represent text. Essentially, a string is just a sequence of characters.
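As a small illustration of this idea, the snippet below treats a string as a sequence: it indexes single characters, converts the string into a list of characters and loops over them. The variable name word is just an example; print is called with a single parenthesised argument, which behaves the same way in Python 2 and Python 3.

```python
word = "Hola"
print(len(word))     # 4: number of characters
print(word[0])       # H: first character
print(word[-1])      # a: negative indices count from the end
chars = list(word)   # one list element per character
print(chars)         # ['H', 'o', 'l', 'a']
for c in word:       # a string can be iterated character by character
    print(c)
```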


In [ ]:
str1 = '"Hola" is how we say "hello" in Spanish.'
str2 = "Strings can also be defined with double quotes; try to be systematic."

It is easy to check the type of a variable with the type() function:


In [ ]:
print str1
print type(str1)
print type(3)
print type(3.)

The following commands implement some common operations with strings in Python. Have a look at them, and try to deduce what the result of each operation will be. Then, execute the commands and check the actual results.


In [ ]:
print str1[0:5]

In [ ]:
print str1+str2

In [ ]:
print str1.lower()

In [ ]:
print str1.upper()

In [ ]:
print len(str1)

In [ ]:
print str1.replace('h','H')

In [ ]:
str3 = 'This is a question'
str3 = str3.replace('i','o')
str3 = str3.lower()
print str3[0:3]

It is interesting to notice the difference in the use of 'lower' and 'len'. Python is an object-oriented language, and str1 is an instance of the Python class str. Thus, str1.lower() invokes the method lower() of the class str to which the object str1 belongs, while len(str1) and type(str1) are built-in functions that do not belong to the class str. In any case, we will not pay (much) attention to these issues during the session.
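A short sketch of the two ideas above, using only the built-in str type: string methods such as upper() return a new string instead of modifying the original (strings are immutable, which is why the earlier cell reassigns str3), and len() is a built-in function that obtains the length through the object's special method __len__.

```python
s = "Hello"
upper_s = s.upper()           # method of class str: returns a NEW string
print(s)                      # Hello (the original is unchanged)
print(upper_s)                # HELLO
print(isinstance(s, str))     # True: s is an instance of class str
# len() is a built-in function; it gets the length from the object's
# special method __len__, so both expressions give the same result
print(len(s) == s.__len__())  # True
```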

Finally, note that some special characters require special consideration. Apart from language-specific characters or special symbols (e.g., accented letters or the euro sign €), the following escape sequences are commonly used to denote a carriage return (\r) and the start of a new line, or line feed (\n):


In [ ]:
print 'This is just a carriage return symbol.\r This sentence will overwrite the previous text.'

In [ ]:
print 'If you wish to start a new line,\r\nthe line feed character should also be used.'

In [ ]:
print 'But note that most applications are tolerant\nto the use of \'line feed\' only.'
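When processing text files, a common step is to normalise line endings before splitting a text into lines. The sketch below, using a made-up sample string, shows two ways of doing it: replacing '\r\n' by '\n' and then splitting, or letting splitlines() handle the different endings.

```python
text = "first line\r\nsecond line\nthird line"
print(len("\r\n"))            # 2: each escape sequence is a single character
normalised = text.replace("\r\n", "\n")   # turn Windows-style endings into '\n'
print(normalised.split("\n")) # ['first line', 'second line', 'third line']
print(text.splitlines())      # same result: splitlines() knows '\n', '\r\n' and '\r'
```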

2. Working with Python lists

Python lists are containers that hold a number of other objects, in a given order. To create a list, just put different comma-separated values between square brackets


In [ ]:
list1 = ['student', 'teacher', 1997, 2000]
print list1
list2 = [1, 2, 3, 4, 5 ]
print list2
list3 = ["a", "b", "c", "d"]
print list3

To access a list element, write its index between square brackets; to obtain several elements at once, write a range of indices (a slice) such as start:end.

Run the code fragment below, and try to guess what the output of each command will be.

Note: Python indexing starts from 0, and the end index of a range (slice) is excluded!


In [ ]:
print list1[0]
print list2[2:4]
print list3[-1]    # negative indices count from the end: -1 is the last element
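A few more slicing cases on a made-up list, to check the rules above (start index included, end index excluded, negative indices counting from the end):

```python
letters = ["a", "b", "c", "d", "e"]
print(letters[1:3])    # ['b', 'c']: index 3 is excluded
print(letters[:2])     # ['a', 'b']: start defaults to 0
print(letters[3:])     # ['d', 'e']: end defaults to the length of the list
print(letters[-2:])    # ['d', 'e']: the last two elements
print(letters[::2])    # ['a', 'c', 'e']: every second element
print(letters[3:10])   # ['d', 'e']: slices never raise errors for out-of-range ends
```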

To add an element to a list you can use the method append(), and to remove one, the method remove(). Lists can also be concatenated with +, which, unlike append(), builds a new list and leaves the original lists unchanged.


In [ ]:
list1 = ['student', 'teacher', 1997, 2000]
list1.append(3)
print list1
list1.remove('teacher')
print list1
print list1 + list2
print list1 + ['end of list']
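The difference between append(), extend() and + is a frequent source of confusion. The sketch below, with made-up lists, shows that + builds a new list, append() adds a single element (even if that element is itself a list), and extend() adds each element in place.

```python
base = [1, 2]
combined = base + [3]     # '+' builds a NEW list; base is unchanged
print(base)               # [1, 2]
print(combined)           # [1, 2, 3]

nested = [1, 2]
nested.append([3, 4])     # append adds ONE element (here, a whole list)
print(nested)             # [1, 2, [3, 4]]

flat = [1, 2]
flat.extend([3, 4])       # extend adds each element, modifying flat in place
print(flat)               # [1, 2, 3, 4]
```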

Other useful functions are:

len(list): Gives the number of elements in a list.
max(list): Returns the largest item in the list.
min(list): Returns the smallest item in the list.

In [ ]:
list2 = [1, 2, 3, 4, 5 ]
print len(list2)
print max(list2)
print min(list2)

3. Flow control (with 'for' and 'if')

As in other programming languages, Python offers mechanisms to execute a piece of code several times (loops), or to execute a code fragment only when certain conditions are satisfied.

For conditional execution, you can use the if, elif and else statements.

Try to play with the following example:


In [ ]:
x = int(raw_input("Please enter an integer: "))
if x < 0:
    x = 0
    print 'Negative changed to zero'
elif x == 0:
    print 'Zero'
elif x == 1:
    print 'One'
else:
    print 'More than one'
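Since the cell above waits for keyboard input, here is a variant that can be run non-interactively: the same branches are wrapped in a function (describe is a hypothetical name) that is tried on several values. Note that only the first branch whose condition is true is executed.

```python
def describe(x):
    """Return the message the cell above would print for the integer x."""
    if x < 0:
        return 'Negative changed to zero'
    elif x == 0:
        return 'Zero'
    elif x == 1:
        return 'One'
    else:
        return 'More than one'

for value in [-3, 0, 1, 7]:
    print(describe(value))
```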

Indentation

The above fragment also allows us to discuss some important characteristics of the Python syntax:

  • Unlike other languages, Python does not use an 'end' keyword (or braces) to indicate where a code block finishes. Instead, Python relies on indentation.

  • Indentation in Python is mandatory. A block is composed of statements indented at the same level, and if it contains a nested block, that block is simply indented further to the right.

  • As a convention each indentation consists of 4 spaces (for each level of indentation).

  • The condition lines (if, elif, else) end with a colon ':', and are followed by the indented block that will be executed only when the indicated condition is satisfied.
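A minimal sketch of a nested block, with an arbitrary value for x: the inner if/else is indented one level further than the outer if, and the last print, back at the outer level, runs regardless of the conditions.

```python
x = 7
if x > 0:                    # level 1: runs only if x is positive
    if x % 2 == 0:           # level 2: nested inside the first block
        parity = 'even'
    else:
        parity = 'odd'
    print('positive and ' + parity)
print('this line runs whatever the value of x')   # back at level 0
```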

The for statement lets you iterate over the items of any sequence (e.g., a list or a string), in the order in which they appear in the sequence:


In [ ]:
words = ['cat', 'window', 'open-course']
for w in words:
    print w, len(w)
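Since a string is also a sequence, the same kind of loop works character by character. This sketch counts the vowels of one of the words above (an illustrative choice of task):

```python
vowels = 0
for ch in "open-course":     # each iteration yields one character
    if ch in "aeiou":        # 'in' also tests membership in a string
        vowels += 1
print(vowels)                # 5
```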

In combination with enumerate(), you can iterate over the elements of the sequence and also get the index of each one:


In [ ]:
words = ['cat', 'window', 'open-course']
for (i, w) in enumerate(words):
    print 'element ' + str(i) + ' is ' + w
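enumerate() saves you from maintaining a counter by hand. The sketch below checks that both approaches produce the same (index, element) pairs, and shows the optional second argument that sets the starting index.

```python
words = ['cat', 'window', 'open-course']

# Manual counter: what enumerate() does for you
i = 0
manual = []
for w in words:
    manual.append((i, w))
    i += 1
print(manual == list(enumerate(words)))   # True

# Start counting from 1 instead of 0
print(list(enumerate(words, 1)))          # [(1, 'cat'), (2, 'window'), (3, 'open-course')]
```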

4. Variables and assignments

In Python, the equal sign "=" in an assignment shouldn't be read as "is equal to", but rather as "is set to". Let's see an example:


In [ ]:
x = 42
y = x
y = 50
print x 
print y

The first two lines do not seem problematic: after y = x, both names refer to the same object, 42. But when y is set to 50, what will happen to the value of x? A C programmer might expect x to change to 50 as well, if y is thought of as a pointer to the location of x. But Python names are not C pointers: the assignment y = 50 simply makes y refer to a new object, 50, while x still refers to 42.
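The is operator tests whether two names refer to the same object, which lets us watch the rebinding described above:

```python
x = 42
y = x
print(y is x)   # True: both names refer to the same object
y = 50          # rebinds y to a different object; x is untouched
print(y is x)   # False
print(x)        # 42
```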

If you are not a C programmer, the observed results of these assignments probably match your expectations. However, assignment can lead to surprises when we copy mutable objects like lists and dictionaries.

Python creates real copies only if the programmer explicitly asks for them.

Let's see a couple of examples:


In [ ]:
colors1 = ["red", "green"]
colors2 = colors1
colors2 = ["rouge", "vert"]
print colors1

OK. This is what we expected: colors1 keeps its own values. Now let's change one element of colors2:


In [ ]:
colors1 = ["red", "green"]
colors2 = colors1
colors2[1] = "blue"
print colors1

Ouch! That wasn't expected.

The explanation is that there has been no new assignment to colors2: colors2 still refers to the same list object as colors1. Only one of its elements has been changed, and therefore the same element of colors1 has changed as well.
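Again, the is operator makes the aliasing visible. To get an independent copy of a flat list, one option (besides the slice operator discussed next) is the list() constructor:

```python
colors1 = ["red", "green"]
colors2 = colors1            # no copy: a second name for the same list
print(colors2 is colors1)    # True

colors3 = list(colors1)      # list() builds a new list with the same elements
print(colors3 is colors1)    # False
colors3[1] = "blue"
print(colors1)               # ['red', 'green'] (unchanged)
```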

A flat list can be completely copied with the slice operator [:], without any of the side effects described above. However, the slice only makes a shallow copy: with nested lists, the inner lists are still shared between the original and the copy. In that case you should use the function deepcopy() from the module copy.


In [ ]:
list1 = ['a','b','c','d']
list2 = list1[:]
list2[1] = 'x'
print list2
print list1
list3 = ['a','b',['ab','ba']]
list4 = list3[:]
list4[0] = 'c'
list4[2][1] = 'd'
print list4
print list3
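A sketch of the difference between the shallow copy made by the slice operator and the deep copy made by copy.deepcopy(). It uses a fresh nested list so that the result does not depend on the cell above.

```python
import copy

nested = ['a', 'b', ['ab', 'ba']]
shallow = nested[:]             # new outer list, but the inner list is shared
deep = copy.deepcopy(nested)    # new outer list AND a new inner list
nested[2][1] = 'd'              # modify the inner list through the original
print(shallow)   # ['a', 'b', ['ab', 'd']]: the change is visible in the shallow copy
print(deep)      # ['a', 'b', ['ab', 'ba']]: the deep copy is independent
```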

Conclusion: Be very careful when copying a list or a dictionary.

5. Functions, arguments and scopes

In Python, a function is defined with the keyword def. As usual, functions can take arguments and return results. Let's see an example:


In [ ]:
def my_sqrt(number):
    """Computes the square root of a number."""
    return number ** (0.5) #  In python ** is exponentiation (^ in other languages) 

x = my_sqrt(2)
print x

As we said, you must define a function using def, followed by the name of the function and, in parentheses ( ), the list of its arguments. The function will not return any value (strictly speaking, it returns None) unless you specify one with a return statement. The string in triple quotes right below the function header is a documentation string, or docstring, and, as expected, it is used to document your code. For example, it is printed if you call the help function:


In [ ]:
help(my_sqrt)
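To see the None return value mentioned above, consider a function without a return statement (greet is a hypothetical example). The docstring is also available directly through the __doc__ attribute, from which help() takes the text it displays.

```python
def greet(name):
    """Prints a greeting; note that there is no return statement."""
    print("Hello, " + name)

result = greet("Ana")     # prints: Hello, Ana
print(result is None)     # True: a function without return gives back None
print(greet.__doc__)      # the docstring
```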

Another interesting feature of Python is that you can give default values to the arguments of a function. For example, in the following code, when the second argument is omitted in the call, its value is 2.


In [ ]:
def nth_root(base, exp=2):
    """Computes the nth root of a number."""
    return base ** (1.0/exp) #  In python ** is exponentiation (^ in other languages) 

print nth_root(10000)
print nth_root(10000,4)
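Arguments can also be passed by name (keyword arguments), which is convenient when a function has several default values. The sketch below repeats nth_root so that it runs on its own:

```python
def nth_root(base, exp=2):
    """Computes the nth root of a number."""
    return base ** (1.0/exp)

print(nth_root(10000))              # exp takes its default value, 2
print(nth_root(10000, exp=4))       # second argument passed by name
print(nth_root(exp=4, base=10000))  # with names, the order does not matter
```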

One tricky feature in python is how it evaluates the arguments that you pass to a function call. The most common evaluation strategies when passing arguments to a function have been call-by-value and call-by-reference. Python uses a mixture of these two, which is known as "Call-by-Object", sometimes also called "Call by Object Reference" or "Call by Sharing". Let's see it with an example:


In [ ]:
def add_square_to_list(x, my_list, dummy_list):
    x = x ** 2
    my_list.append(x)
    dummy_list = ["I", "am", "not", "a" , "dummy", "list"]
    
x = 5
my_list =[4, 9, 16]
dummy_list = ["I", "am", "a" , "dummy", "list"]

add_square_to_list(x, my_list, dummy_list)
print x
print my_list
print dummy_list

If you pass immutable arguments like integers, strings or tuples to a function, the passing acts like call-by-value. The object reference is passed to the function parameters. They can't be changed within the function, because they can't be changed at all, i.e. they are immutable.

It's different if we pass mutable arguments. They are also passed by object reference, but they can be changed in place within the function. If we pass a list to a function, we have to consider two cases: if elements of the list are changed in place (e.g., with append()), the list will be changed in the caller's scope as well; if a new list is assigned to the parameter name, the old list will not be affected, i.e., the list in the caller's scope will remain untouched.

So, be careful when modifying lists inside functions, and keep their side effects in mind.
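One way to avoid such side effects is to have the function return a new list instead of modifying its argument. A minimal sketch, with a hypothetical function name:

```python
def with_square_added(x, numbers):
    """Return a NEW list: numbers followed by x squared. The caller's list is left alone."""
    return numbers + [x ** 2]    # '+' creates a new list

my_list = [4, 9, 16]
new_list = with_square_added(5, my_list)
print(my_list)     # [4, 9, 16]
print(new_list)    # [4, 9, 16, 25]
```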

6. File input and output operations

First of all, you need to open a file with the open() function.


In [ ]:
f = open('workfile', 'w')

The first argument is a string containing the filename. The second argument defines the mode in which the file will be used:

'r' : only to be read. If the file does not exist, it raises an error.
'w' : for writing only. If the file does not exist, it is created; an existing file with the same name is erased.
'a' : the file is opened for appending; any data written to the file is automatically added to the end (the file is created if it does not exist).
'r+': opens the file for both reading and writing; the file must already exist.

If the mode argument is not included, 'r' is assumed. Adding 'b' to the mode (e.g., 'rb' or 'wb') opens the file in binary mode.
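The following sketch exercises the 'w', 'a' and 'r' modes on a scratch file in the system's temporary directory (the file name modes_demo.txt is arbitrary), and removes the file at the end.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'modes_demo.txt')

f = open(path, 'w')     # 'w': create the file (or erase an existing one)
f.write('first\n')
f.close()

f = open(path, 'a')     # 'a': keep the existing content, write at the end
f.write('second\n')
f.close()

f = open(path, 'r')     # 'r': read only
content = f.read()
f.close()
print(content)          # first and second, on two lines

f = open(path, 'w')     # opening with 'w' again erases the previous content
f.close()
print(os.path.getsize(path))   # 0
os.remove(path)         # clean up the scratch file
```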

Use f.write(string) to write the contents of a string to the file. When you are done, do not forget to close the file:


In [ ]:
f.write('This is a test\n with 2 lines')
f.close()

To read the content of a file, use the method f.read():


In [ ]:
f2 = open('workfile', 'r')
text=f2.read()
f2.close()
print text
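A common alternative to calling close() explicitly is the with statement (available since Python 2.6), which closes the file automatically when the block ends, even if an error occurs inside it. A sketch reproducing the write and read steps above:

```python
# The file is closed automatically at the end of each 'with' block
with open('workfile', 'w') as f:
    f.write('This is a test\n with 2 lines')

with open('workfile', 'r') as f:
    text = f.read()

print(text)
print(f.closed)   # True: no explicit close() was needed
```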

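You can also read the file line by line by iterating over the file object. Note that each line read this way keeps its trailing newline character ('\n'), so print shows an extra blank line between lines: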


In [ ]:
f2 = open('workfile', 'r')
for line in f2:
    print line

f2.close()
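To print or store the lines without their trailing newline, strip it explicitly with rstrip('\n'). This sketch first recreates workfile so that it runs on its own:

```python
# Recreate the example file
with open('workfile', 'w') as f:
    f.write('This is a test\n with 2 lines')

lines = []
with open('workfile', 'r') as f2:
    for line in f2:
        lines.append(line.rstrip('\n'))   # drop the newline at the end of each line
print(lines)   # ['This is a test', ' with 2 lines']
```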

7. Modules import

Python lets you define modules, which are files containing Python code. A module can define functions, classes and variables.

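The standard library already includes many useful modules, which make our lives as programmers easier; some well-known ones are time, sys and os. Others, such as numpy, are third-party packages that must be installed separately, although many scientific Python distributions already include them.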
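Since a module is just a file of Python code, you can also write your own. The sketch below creates a tiny module in a temporary folder (the name greetings and its contents are purely illustrative), adds that folder to sys.path so that Python can find it, and imports it:

```python
import os
import sys
import tempfile

# Write a small module file: one variable and one function
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'greetings.py'), 'w') as f:
    f.write('LANGUAGE = "Spanish"\n')
    f.write('def hello():\n')
    f.write('    return "Hola"\n')

sys.path.insert(0, module_dir)   # let Python look for modules in that folder
import greetings

print(greetings.hello())         # Hola
print(greetings.LANGUAGE)        # Spanish
```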

There are several ways to import a library:

1) Import the whole module: import lib_name

Note: You have to call its functions with the module name as a prefix (e.g., time.time()).


In [ ]:
import time
import sys

print time.time()  # returns the current time, in seconds since the epoch
Nwait = 2

print "Waiting %d seconds..." % Nwait
sys.stdout.flush()  # Try executing with this line commented out...

time.sleep(Nwait) # suspends execution for the given number of seconds
print time.time() # returns the current time again!!!
print "Done!"

2) Define a short name to use the library: import lib_name as lib


In [ ]:
import time as t
print t.time()

3) Import only some elements of the library: from lib_name import element1, element2

Note: now you use the imported functions directly, without the module prefix.


In [ ]:
from time import time, sleep
print time()
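Be aware that from time import time binds the name time to the function, hiding the module of the same name. A short sketch of the effect:

```python
import time
print(callable(time.time))     # True: time is the module, time.time the function

from time import time          # now the name 'time' refers to the function itself
print(callable(time))          # True: time() can be called directly
# time.time would now fail: the function has no attribute called 'time'
print(hasattr(time, 'time'))   # False
```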

 8. Exercise

Program a function primes that returns a list containing the first N prime numbers. Then call that function with the value 1000 and save this list in a .txt file with one number per line.

The answer should be:


These are the first 1000 prime numbers:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 1063, 1069, 1087, 1091, 1093, 1097, 1103, 1109, 1117, 1123, 1129, 1151, 1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213, 1217, 1223, 1229, 1231, 1237, 1249, 1259, 1277, 1279, 1283, 1289, 1291, 1297, 1301, 1303, 1307, 1319, 1321, 1327, 1361, 1367, 1373, 1381, 1399, 1409, 1423, 1427, 1429, 1433, 1439, 1447, 1451, 1453, 1459, 1471, 1481, 1483, 1487, 1489, 1493, 1499, 1511, 1523, 1531, 1543, 1549, 1553, 1559, 1567, 1571, 1579, 1583, 1597, 1601, 1607, 1609, 1613, 1619, 1621, 1627, 1637, 1657, 1663, 1667, 1669, 1693, 1697, 1699, 1709, 1721, 1723, 1733, 1741, 1747, 1753, 1759, 1777, 1783, 1787, 1789, 1801, 1811, 1823, 1831, 1847, 1861, 1867, 1871, 1873, 1877, 1879, 1889, 1901, 1907, 1913, 1931, 1933, 1949, 1951, 1973, 1979, 1987, 1993, 1997, 1999, 2003, 2011, 2017, 2027, 2029, 2039, 2053, 2063, 2069, 2081, 2083, 2087, 2089, 2099, 2111, 2113, 2129, 2131, 2137, 2141, 2143, 2153, 2161, 2179, 2203, 2207, 2213, 2221, 2237, 2239, 2243, 2251, 2267, 2269, 2273, 2281, 2287, 2293, 2297, 2309, 2311, 2333, 2339, 2341, 2347, 2351, 2357, 2371, 2377, 2381, 2383, 2389, 2393, 2399, 2411, 2417, 2423, 2437, 2441, 2447, 2459, 2467, 
2473, 2477, 2503, 2521, 2531, 2539, 2543, 2549, 2551, 2557, 2579, 2591, 2593, 2609, 2617, 2621, 2633, 2647, 2657, 2659, 2663, 2671, 2677, 2683, 2687, 2689, 2693, 2699, 2707, 2711, 2713, 2719, 2729, 2731, 2741, 2749, 2753, 2767, 2777, 2789, 2791, 2797, 2801, 2803, 2819, 2833, 2837, 2843, 2851, 2857, 2861, 2879, 2887, 2897, 2903, 2909, 2917, 2927, 2939, 2953, 2957, 2963, 2969, 2971, 2999, 3001, 3011, 3019, 3023, 3037, 3041, 3049, 3061, 3067, 3079, 3083, 3089, 3109, 3119, 3121, 3137, 3163, 3167, 3169, 3181, 3187, 3191, 3203, 3209, 3217, 3221, 3229, 3251, 3253, 3257, 3259, 3271, 3299, 3301, 3307, 3313, 3319, 3323, 3329, 3331, 3343, 3347, 3359, 3361, 3371, 3373, 3389, 3391, 3407, 3413, 3433, 3449, 3457, 3461, 3463, 3467, 3469, 3491, 3499, 3511, 3517, 3527, 3529, 3533, 3539, 3541, 3547, 3557, 3559, 3571, 3581, 3583, 3593, 3607, 3613, 3617, 3623, 3631, 3637, 3643, 3659, 3671, 3673, 3677, 3691, 3697, 3701, 3709, 3719, 3727, 3733, 3739, 3761, 3767, 3769, 3779, 3793, 3797, 3803, 3821, 3823, 3833, 3847, 3851, 3853, 3863, 3877, 3881, 3889, 3907, 3911, 3917, 3919, 3923, 3929, 3931, 3943, 3947, 3967, 3989, 4001, 4003, 4007, 4013, 4019, 4021, 4027, 4049, 4051, 4057, 4073, 4079, 4091, 4093, 4099, 4111, 4127, 4129, 4133, 4139, 4153, 4157, 4159, 4177, 4201, 4211, 4217, 4219, 4229, 4231, 4241, 4243, 4253, 4259, 4261, 4271, 4273, 4283, 4289, 4297, 4327, 4337, 4339, 4349, 4357, 4363, 4373, 4391, 4397, 4409, 4421, 4423, 4441, 4447, 4451, 4457, 4463, 4481, 4483, 4493, 4507, 4513, 4517, 4519, 4523, 4547, 4549, 4561, 4567, 4583, 4591, 4597, 4603, 4621, 4637, 4639, 4643, 4649, 4651, 4657, 4663, 4673, 4679, 4691, 4703, 4721, 4723, 4729, 4733, 4751, 4759, 4783, 4787, 4789, 4793, 4799, 4801, 4813, 4817, 4831, 4861, 4871, 4877, 4889, 4903, 4909, 4919, 4931, 4933, 4937, 4943, 4951, 4957, 4967, 4969, 4973, 4987, 4993, 4999, 5003, 5009, 5011, 5021, 5023, 5039, 5051, 5059, 5077, 5081, 5087, 5099, 5101, 5107, 5113, 5119, 5147, 5153, 5167, 5171, 5179, 5189, 5197, 5209, 5227, 5231, 5233, 5237, 5261, 
5273, 5279, 5281, 5297, 5303, 5309, 5323, 5333, 5347, 5351, 5381, 5387, 5393, 5399, 5407, 5413, 5417, 5419, 5431, 5437, 5441, 5443, 5449, 5471, 5477, 5479, 5483, 5501, 5503, 5507, 5519, 5521, 5527, 5531, 5557, 5563, 5569, 5573, 5581, 5591, 5623, 5639, 5641, 5647, 5651, 5653, 5657, 5659, 5669, 5683, 5689, 5693, 5701, 5711, 5717, 5737, 5741, 5743, 5749, 5779, 5783, 5791, 5801, 5807, 5813, 5821, 5827, 5839, 5843, 5849, 5851, 5857, 5861, 5867, 5869, 5879, 5881, 5897, 5903, 5923, 5927, 5939, 5953, 5981, 5987, 6007, 6011, 6029, 6037, 6043, 6047, 6053, 6067, 6073, 6079, 6089, 6091, 6101, 6113, 6121, 6131, 6133, 6143, 6151, 6163, 6173, 6197, 6199, 6203, 6211, 6217, 6221, 6229, 6247, 6257, 6263, 6269, 6271, 6277, 6287, 6299, 6301, 6311, 6317, 6323, 6329, 6337, 6343, 6353, 6359, 6361, 6367, 6373, 6379, 6389, 6397, 6421, 6427, 6449, 6451, 6469, 6473, 6481, 6491, 6521, 6529, 6547, 6551, 6553, 6563, 6569, 6571, 6577, 6581, 6599, 6607, 6619, 6637, 6653, 6659, 6661, 6673, 6679, 6689, 6691, 6701, 6703, 6709, 6719, 6733, 6737, 6761, 6763, 6779, 6781, 6791, 6793, 6803, 6823, 6827, 6829, 6833, 6841, 6857, 6863, 6869, 6871, 6883, 6899, 6907, 6911, 6917, 6947, 6949, 6959, 6961, 6967, 6971, 6977, 6983, 6991, 6997, 7001, 7013, 7019, 7027, 7039, 7043, 7057, 7069, 7079, 7103, 7109, 7121, 7127, 7129, 7151, 7159, 7177, 7187, 7193, 7207, 7211, 7213, 7219, 7229, 7237, 7243, 7247, 7253, 7283, 7297, 7307, 7309, 7321, 7331, 7333, 7349, 7351, 7369, 7393, 7411, 7417, 7433, 7451, 7457, 7459, 7477, 7481, 7487, 7489, 7499, 7507, 7517, 7523, 7529, 7537, 7541, 7547, 7549, 7559, 7561, 7573, 7577, 7583, 7589, 7591, 7603, 7607, 7621, 7639, 7643, 7649, 7669, 7673, 7681, 7687, 7691, 7699, 7703, 7717, 7723, 7727, 7741, 7753, 7757, 7759, 7789, 7793, 7817, 7823, 7829, 7841, 7853, 7867, 7873, 7877, 7879, 7883, 7901, 7907, 7919]

In [ ]:
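def is_prime(x):
    # <TO COMPLETE>: set isprime to True if x is a prime number, False otherwise
    return isprime

def primes(N):
    # <TO COMPLETE>: build list_of_primes with the first N prime numbers
    return list_of_primes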

list_primes = primes(1000)

print "These are the first %d prime numbers:\n" % len(list_primes)

print list_primes
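For the last part of the exercise (saving the list to a .txt file, one number per line), a possible helper is sketched below. The names save_numbers and primes_demo.txt are hypothetical, and the short sample list only illustrates the file format; it is not the exercise's answer.

```python
def save_numbers(numbers, filename):
    """Write the numbers to a text file, one per line (hypothetical helper)."""
    with open(filename, 'w') as f:
        for n in numbers:
            f.write(str(n) + '\n')

save_numbers([2, 3, 5, 7], 'primes_demo.txt')   # small sample input

# Read the file back to check its content
with open('primes_demo.txt') as f:
    saved = f.read().split()
print(saved)   # ['2', '3', '5', '7']
```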