From Wikipedia: "Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale."
Through this tutorial, students will learn some basic characteristics of the Python programming language that will be useful for working with corpora of text data.
Among the native Python types, we will focus on strings, since they are the core type that we will use to represent text. Essentially, a string is just a sequence of characters.
In [1]:
str1 = '"Hola" is how we say "hello" in Spanish.'
str2 = "Strings can also be defined with double quotes; try to be systematic."
It is easy to check the type of a variable with the type() function:
In [2]:
print str1
print type(str1)
print type(3)
print type(3.)
The following commands implement some common string operations in Python. Have a look at them, and try to deduce what the result of each operation will be. Then, execute the commands and check what the actual results are.
In [3]:
print str1[0:5]
In [4]:
print str1+str2
In [5]:
print str1.lower()
In [6]:
print str1.upper()
In [7]:
print len(str1)
In [8]:
print str1.replace('h','H')
In [2]:
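# Named str3 so that the built-in str() is not shadowed.
str3 = 'This is a question'
# Strings are immutable: replace() and lower() return new strings,
# so these two results are discarded and str3 is left unchanged.
str3.replace('i', 'o')
str3.lower()
print str3[0:4]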
It is interesting to notice the difference in how 'lower' and 'len' are used. Python is an object-oriented language, and str1 is an instance of the Python class 'str'. Thus, str1.lower() invokes the method lower() of the class str to which the object str1 belongs, while len(str1) and type(str1) call built-in functions that do not belong to the class str. In any case, we will not pay (much) attention to these issues during the session.
Finally, we remark that some characters require special consideration. Apart from language-specific characters or special symbols (e.g., €), the following characters are commonly used to denote a carriage return and the start of a new line:
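The distinction between methods and built-in functions can be illustrated with a minimal sketch. The string `greeting` is a hypothetical example value, and the code is written so that it runs under both Python 2 and Python 3:

```python
# Hypothetical example string, chosen only for illustration.
greeting = 'Hola Mundo'

# lower() is a method: it is called on the string object itself.
lowered = greeting.lower()

# len() is a built-in function: the string is passed as an argument.
n_chars = len(greeting)

# Strings are immutable: a method returns a new string
# and leaves the original one untouched.
replaced = greeting.replace('o', '0')

print(lowered)    # hola mundo
print(n_chars)    # 10
print(replaced)   # H0la Mund0
print(greeting)   # Hola Mundo
```

To keep the result of a string method, it has to be assigned to a variable, as done with `lowered` and `replaced` here.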
In [9]:
print 'This is just a carriage return symbol.\r Second line will start on top of the first line.'
In [10]:
print 'If you wish to start a new line,\r\nthe line feed character should also be used.'
In [11]:
print 'But note that most applications are tolerant\nto the use of \'line feed\' only.'
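Since these characters are invisible when printed, it can help to inspect them explicitly. The following sketch, which assumes a hypothetical string `text` mixing both line-ending styles, uses repr() to show the escape sequences and splitlines() to split on either style:

```python
# Hypothetical text mixing '\r\n' and '\n' line endings.
text = 'first line\r\nsecond line\nthird line'

# repr() shows the escape sequences instead of interpreting them.
print(repr(text))

# splitlines() splits on '\r\n' and on '\n', and drops the line endings.
lines = text.splitlines()
print(lines)   # ['first line', 'second line', 'third line']
```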
Python lists are containers that hold a number of other objects, in a given order. To create a list, just put comma-separated values between square brackets:
In [1]:
list1 = ['student', 'teacher', 1997, 2000]
print list1
list2 = [1, 2, 3, 4, 5 ]
print list2
list3 = ["a", "b", "c", "d"]
print list3
To access a list element, write its index between brackets after the list name. A range of indices (a slice) returns the list of elements at those positions.
Run the code fragment below, and try to guess what the output of each command will be.
Note: Python indexing starts at 0!
In [7]:
print list1[0]
print list2[2:4]
print list3[-1]
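The indexing rules used in the cell above can be summarized with a short sketch on a hypothetical list `letters`. Note that the end index of a slice is excluded, and that negative indices count from the end of the list:

```python
# Hypothetical list, used only to illustrate indexing.
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]        # 'a': indexing starts at 0
middle = letters[1:3]     # ['b', 'c']: index 3 is excluded
last = letters[-1]        # 'e': -1 is the last element
last_two = letters[-2:]   # ['d', 'e']: from the second-to-last to the end

print(first)
print(middle)
print(last)
print(last_two)
```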
To add elements to a list you can use the method append(), and to remove them the method remove():
In [13]:
list1 = ['student', 'teacher', 1997, 2000]
list1.append(3)
print list1
list1.remove('teacher')
print list1
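Two details of these methods are worth a quick sketch: append() always adds at the end of the list, and remove() deletes only the first occurrence of the given value. The list `tokens` below is a hypothetical example:

```python
# Hypothetical list of tokens with a repeated word.
tokens = ['the', 'cat', 'and', 'the', 'dog']

tokens.append('sleep')   # added at the end of the list
tokens.remove('the')     # only the first 'the' is removed

print(tokens)   # ['cat', 'and', 'the', 'dog', 'sleep']

# remove() raises a ValueError if the value is not in the list,
# so membership can be checked first with the 'in' operator.
has_bird = 'bird' in tokens
print(has_bird)   # False
```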
Other useful functions are:
len(list): Returns the number of elements in the list.
max(list): Returns the item of the list with the maximum value.
min(list): Returns the item of the list with the minimum value.
In [7]:
list2 = [1, 2, 3, 4, 5 ]
print len(list2)
print max(list2)
print min(list2)
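These functions also work on lists of strings, which is the common case when handling text. As a small sketch on a hypothetical list `words`: lowercase strings are compared alphabetically, and the optional `key` argument of max() changes the comparison criterion.

```python
# Hypothetical list of lowercase words.
words = ['cat', 'window', 'apple']

n_words = len(words)            # 3
last_word = max(words)          # 'window': last in alphabetical order
first_word = min(words)         # 'apple': first in alphabetical order
longest = max(words, key=len)   # 'window': compared by length instead

print(n_words)
print(last_word)
print(first_word)
print(longest)
```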
As in other programming languages, Python offers mechanisms to loop through a piece of code several times, or to execute a code fragment only when certain conditions are satisfied.
For conditional execution, you can use the 'if', 'elif' and 'else' statements.
Try to play with the following example:
In [9]:
x = int(raw_input("Please enter an integer: "))
if x < 0:
    x = 0
    print 'Negative changed to zero'
elif x == 0:
    print 'Zero'
elif x == 1:
    print 'Single'
else:
    print 'More'
The above fragment also allows us to discuss some important characteristics of Python syntax:
Unlike other languages, Python does not use an 'end' keyword to indicate that a given code fragment finishes. Instead, Python relies on indentation.
Indentation in Python is mandatory and must be consistent within a block. The usual convention is 4 spaces per indentation level.
Condition lines end with ':' and are followed by the indented block that will be executed only when the indicated condition is satisfied.
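The indentation rules listed above can be seen in a short sketch with a nested condition. The variable names `word` and `label` are hypothetical:

```python
# Hypothetical word to classify.
word = 'corpus'

if len(word) > 3:
    # This block runs only when the condition is true.
    label = 'long'
    if word.endswith('s'):
        # A nested block is indented one more level (4 more spaces).
        label = 'long, ends with s'
else:
    label = 'short'

# Back at the outer level: this line runs in every case.
print(label)   # long, ends with s
```

Returning to a previous indentation level is what closes a block, so no closing keyword is needed.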
The 'for' statement lets you iterate over the items of any sequence (such as a list or a string), in the order in which they appear in the sequence:
In [24]:
words = ['cat', 'window', 'open-course']
for w in words:
    print w, len(w)
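Since a string is also a sequence, a 'for' loop over a string visits its characters one by one. As a minimal sketch, the following counts the vowels in a hypothetical word:

```python
# Hypothetical word whose vowels are counted.
word = 'window'

n_vowels = 0
for ch in word:
    # 'in' tests whether the character appears in the string 'aeiou'.
    if ch in 'aeiou':
        n_vowels += 1

print(n_vowels)   # 2
```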
In combination with enumerate(), you can iterate over the elements of the sequence and keep a counter over them:
In [26]:
words = ['cat', 'window', 'open-course']
for (i, w) in enumerate(words):
    print 'element ' + str(i) + ' is ' + w
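Loops and conditions are often combined. The sketch below, which reuses the same `words` list and introduces a hypothetical list `long_words`, keeps the position and the value of every word with more than 3 characters:

```python
words = ['cat', 'window', 'open-course']

# Collect (index, word) pairs for the words longer than 3 characters.
long_words = []
for i, w in enumerate(words):
    if len(w) > 3:
        long_words.append((i, w))

print(long_words)   # [(1, 'window'), (2, 'open-course')]
```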
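To work with a file, you first need to open it with the open() function. If the file does not exist, it is created when opened in a writing mode ('w' or 'a'); opening a missing file for reading raises an error.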
In [38]:
f = open('workfile', 'w')
The first argument is a string containing the filename. The second argument defines the mode in which the file will be used:
'r' : only to be read,
'w' : for only writing (an existing file with the same name would be erased),
'a' : the file is opened for appending; any data written to the file is automatically appended to the end.
'r+': opens the file for both reading and writing.
If the mode argument is not included, 'r' will be assumed.
Use f.write(string) to write the contents of a string to the file. When you are done, do not forget to close the file:
In [39]:
f.write('This is a test\n with 2 lines')
f.close()
To read the content of a file, use the method f.read():
In [42]:
f2 = open('workfile', 'r')
text=f2.read()
f2.close()
print text
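The difference between the modes 'w', 'a' and the default 'r' can be checked with a small sketch. The file name `modes_demo.txt` is hypothetical, and the file is created inside a temporary directory (via the standard tempfile module) so that no existing file is touched:

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

f = open(path, 'w')    # 'w' creates the file (or erases an existing one)
f.write('first\n')
f.close()

f = open(path, 'a')    # 'a' keeps the content and writes at the end
f.write('second\n')
f.close()

f = open(path)         # no mode given, so 'r' is assumed
content = f.read()
f.close()

print(repr(content))   # 'first\nsecond\n'
```

Opening the file again with 'w' instead of 'a' in the second step would have erased the first line.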
You can also read the file line by line by iterating over the file object:
In [44]:
f2 = open('workfile', 'r')
for line in f2:
    print line
f2.close()
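As an alternative sketch, not part of the original cells, the 'with' statement closes the file automatically at the end of its block, and rstrip('\n') removes the line ending that each line read from a file keeps. The file name `lines_demo.txt` and the list `stripped_lines` are hypothetical:

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')

# The file is closed automatically when the 'with' block ends.
with open(path, 'w') as f:
    f.write('This is a test\n with 2 lines')

stripped_lines = []
with open(path, 'r') as f:
    for line in f:
        # Each line keeps its trailing '\n'; rstrip('\n') removes it.
        stripped_lines.append(line.rstrip('\n'))

print(stripped_lines)   # ['This is a test', ' with 2 lines']
```

Without the rstrip() call, printing each line would show an extra blank line, because print adds its own line ending.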
Python lets you define modules, which are files consisting of Python code. A module can define functions, classes and variables.
Most Python distributions already include the most popular modules, with predefined libraries that make our lives as programmers easier. Some well-known libraries are: time, sys, os, numpy, ...
There are several ways to import a library:
1) Import all the contents of the library: import lib_name
Note: you have to call its functions prefixed with the library name
In [4]:
import time
print time.time() # returns the current time in seconds since the epoch
time.sleep(2) # suspends execution for the given number of seconds
print time.time() # about 2 seconds more than the first value
2) Define a short name to use the library: import lib_name as lib
In [6]:
import time as t
print t.time()
3) Import only some elements of the library
Note: now you call the imported functions directly, without the library prefix
In [2]:
from time import time, sleep
print time()
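The three import styles give access to the same objects under different names. The following sketch checks this with the standard os module; the alias `operating_system` is a hypothetical name chosen for illustration:

```python
import os                      # style 1: use as os.getcwd()
import os as operating_system  # style 2: use as operating_system.getcwd()
from os import getcwd          # style 3: use as getcwd()

# Both names refer to the same module object.
same_module = os is operating_system

# The directly imported function is the module's function.
same_function = getcwd is os.getcwd

print(same_module)     # True
print(same_function)   # True
```

One caveat of the third style: after `from time import time`, the name `time` refers to the function and no longer to the module, so a later call such as `time.sleep(2)` in the same session would fail.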