Textual Analysis in Python (for DHers, etc).

Author: A. Sean Pue

A lesson for AL340 Digital Humanities Seminar (Spring 2015)

Michigan State University

Welcome!

This is an IPython Notebook. It contains numerous cells that can be of different types (Markdown, Code, Headings).

Your turn

Select the cell above this one by clicking it with your mouse.

You can see that it contains text in a format called Markdown. To execute the cell, press shift+enter. To complete this tutorial (which is meant for a classroom), execute the cells and follow the instructions.

Your turn

The cell below this one is producing an error. To fix it, change the cell type to Markdown (from Code) and then execute it by pressing shift+enter.


In [433]:
I am really *Markdown* but I am set as Code. Fix me!


  File "<ipython-input-433-8b56114245de>", line 1
    I am really *Markdown* but I am set as Code. Fix me!
       ^
SyntaxError: invalid syntax

Python in Fifteen Minutes

Why?

  • Python is easy to learn
  • Python is easy to read
  • Python has lots of great tools to do things quickly

How?

This will be a very quick introduction to Python.

We will focus on:

  • Setting variables
  • Writing comments
  • Python and spaces
  • Outputting
  • Functions and Types
  • Basic math
  • Basic text manipulation
  • Some Data Structures (Lists, Sets, and Dictionaries)
  • Getting help
  • Writing Functions
  • Testing and Checking

Setting Variables

A variable is basically a labeled box that you put things in. To set a variable in Python, you use the equal sign. Let's try.


In [502]:
X = 1
Y=2

Writing Comments

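You can make notes to yourself and others in Python using #. Everything from the # to the end of the line is a comment, so a comment can take up a whole line or follow code on the same line.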


In [503]:
# Here is an example
X = 1
X=1  # This line is considered less readable than the above.

Python and Spaces

One of the features of Python that makes it so readable is that it cares very deeply about the space at the beginning of lines (indentation). Later, indentation will be used to group lines together, as in loops and functions. For now, make sure your lines do not start with a stray space; otherwise, you will get an error.

Your Turn

An IndentationError occurs below due to an extra space at the beginning. Fix it.


In [504]:
X = 1 #This is okay
# An IndentationError here due to the space
 X = 2


  File "<ipython-input-504-af2283a44348>", line 3
    X = 2
    ^
IndentationError: unexpected indent

Outputting

There are different ways to output your variables and so forth. The IPython Notebook will display the value of a variable or expression if it is the last line executed in a particular cell.


In [505]:
# Here, nothing is displayed because there is just a variable set.
X=1

In [506]:
# Here X is the last declaration in the cell, so X is displayed.
X=1
X


Out[506]:
1

In [507]:
# Here we have a bare value that is not assigned to a variable.
1


Out[507]:
1

You can also use the command print followed by a variable or declaration.


In [508]:
X=1
print 1
print X


1
1

You can print multiple variables or declarations together by using a comma.


In [509]:
print 1,1,X,X


1 1 1 1

There are also ways to output to files and so forth. You can also produce HTML within the notebook.

Your Turn

Fill in the following 3 cells with the code you need to do the commands listed in the comments.


In [510]:
# 1. In this cell, write the code to output the number 3. Do not use print

#YOURCODEHERE

In [511]:
# 2. In this cell, do the same using print.

#YOURCODEHERE

In [512]:
# 3. Do the following

# print the number 4 on its own line.

#YOURCODEHERE

# next, set X to 5

#YOURCODEHERE

# print X followed by 3

Basic Math

It's easy to do math in Python. The basic operators are +,-,/, and * for addition, subtraction, division, and multiplication.


In [513]:
1+1


Out[513]:
2

In [514]:
4/2


Out[514]:
2

In [515]:
5*2


Out[515]:
10

You can also use parentheses, e.g. ( ), to specify the order of operations.


In [516]:
(4-1)*3


Out[516]:
9
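
To see why the parentheses matter, here is a small sketch that uses the same numbers with and without them. The variable names are made up for illustration.

```python
# Without parentheses, multiplication happens before subtraction.
without_parens = 4 - 1 * 3    # 4 - 3, which is 1

# With parentheses, the subtraction happens first.
with_parens = (4 - 1) * 3     # 3 * 3, which is 9
```

If you put each variable on the last line of its own cell, the notebook will display 1 and 9.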

You can use += to add on to a number, too.


In [517]:
X=1
X+=1
X


Out[517]:
2

Your turn

Write code to answer the following: $777777 \times 666666 - 2$


In [518]:
# The answer you should get is 518517481480
  
#YOURCODEHERE

Now, subtract 666666 from 777777 and multiply the result by two.


In [519]:
# The answer you should get is 222222.
# HINT: To do this on one line, you will need to use parentheses.

#YOURCODEHERE

Functions

A function is code that when called with a particular input gives a particular output. In math, this takes the form:

f(X) = 2 + X
f(2) = 4

Python uses a similar format.

Here let's use the built-in function abs() which gives the absolute value of a number (e.g. |-1|=1)


In [520]:
abs(-1)


Out[520]:
1

In [521]:
abs(2-1)


Out[521]:
1

You can specify multiple inputs. Let's try another built-in function called min, which gives the minimum value of the numbers passed to it.


In [522]:
min(4,2,1)


Out[522]:
1

In [523]:
X=0
min(4,3,2,X)


Out[523]:
0
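
Function calls can also be nested, so that the output of one becomes the input of another. The sketch below assumes only built-in functions: abs and min from above, plus max, which is not covered in this lesson and gives the largest value. The variable names are illustrative.

```python
lowest = min(-4, 2, 1)          # -4
highest = max(-4, 2, 1)         # 2

# The inner call runs first, and its result is passed to abs.
distance = abs(min(-4, 2, 1))   # 4
```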

Note that you can also add to an existing number (or texts, as we will see) by using +=.


In [524]:
X = 1
X += 1
X


Out[524]:
2

Number Types

You can find out the type of a variable or declaration using the function type. Let's try it.


In [525]:
type(1)


Out[525]:
int

There are a number of basic numeric types, including int and float.

int is a whole number, positive or negative. To set it, you can do the following: x=1 or int(1).

float (for floating-point number) is a number with a decimal point, so it can hold fractional values. To set one, you can include a decimal point in your number, e.g. x=1.0, or use the function float(1).


In [526]:
type(1)


Out[526]:
int

In [527]:
type(1.0)


Out[527]:
float

In [528]:
type(2*3)


Out[528]:
int

In [529]:
type(2.0*3)


Out[529]:
float

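The distinction between int and float can be a bit of a gotcha. In the version of Python used here (Python 2), dividing one int by another gives an int answer, with the fractional part dropped. To get a float answer, at least one of the numbers must be a float (for example, by adding a decimal point to it).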
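
Here is a minimal sketch of the difference, using different numbers from the exercise below. It uses //, the floor-division operator, because that behaves the same way in Python 2 and Python 3 (in Python 3, / between two ints already returns a float). The variable names are made up.

```python
whole = 7 // 2             # floor division drops the fraction: 3
mixed = 7.0 / 2            # one float operand gives a float: 3.5
converted = float(7) / 2   # float() converts the int first: 3.5
```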

Your turn

Adjust the following code to return .5:


In [530]:
# Fix below to return .5 (not 0). 
X = 1/2
print X


0

Testing for Equivalence

Python has two special words, True and False, to tell the veracity of something, and a special type bool to hold that value.


In [531]:
x = True
type(x)


Out[531]:
bool

You can test whether or not something equals something else using the operator ==, and to see if they are not equal using !=.


In [532]:
x = 1
x == 1


Out[532]:
True

In [533]:
x = 2
x != 1


Out[533]:
True

In [534]:
type(True)


Out[534]:
bool
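
Note that == compares values, not how they are written. The short sketch below shows a few cases that are easy to guess wrong; the variable names are illustrative only.

```python
same_number = (1 == 1.0)       # True: an int and a float with the same value
text_vs_number = ('1' == 1)    # False: a string does not equal a number
different = ('a' != 'b')       # True: the two strings differ
```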

Testing as You Go

You can check to make sure your expectations are correct using the command assert, which will throw an error if your assertion is not correct.


In [535]:
x = 1
assert x == 1 # no problem here
assert x == 1.0 # no problem here, either
assert type(x)==float # it's an int not a float!


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-535-6afa4e9eee24> in <module>()
      2 assert x == 1 # no problem here
      3 assert x == 1.0 # no problem here, either
----> 4 assert type(x)==float # it's an int not a float!

AssertionError: 
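
assert also accepts an optional message after a comma, which is shown if the check fails. Here is a small sketch in which every assertion passes; word_count is a made-up variable.

```python
word_count = 3
assert word_count == 3, 'expected three words'
assert type(word_count) == int, 'word_count should be an int'

# Both assertions pass, so execution reaches this line.
all_checks_passed = True
```

If you change word_count to 4, the first assertion stops the cell with AssertionError: expected three words.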

Strings

Python is great at manipulating strings of text, and there is a special str type. To designate a string, type text in single or double quotation marks.


In [536]:
s = 'Hello'
t = "there"
type(s)


Out[536]:
str

Strings can be combined using +:


In [537]:
s + ' ' + t+ ' what\'s your name?' # you can use \' to put a ' inside ''s


Out[537]:
"Hello there what's your name?"

To get the length of a string, use the function len.


In [538]:
len('Hello')


Out[538]:
5

Strings also have numerous functions of their own (often called methods). To use them, type a string or string variable followed by a period and then the function.


In [541]:
s = 'Hello'
print s.capitalize() # capitalizes the first letter
print s.upper()      # capitalizes all letters
print s.lower()     # lowercases all letters


Hello
HELLO
hello

In [542]:
a = 'X'.lower() # makes the string 'X' lowercase
b = 'x'.upper() # makes the string 'x' uppercase
print (a + b).upper() # joins a and b and turns them uppercase
print (a + b).lower().upper() # joins a and b, lowercases them, then turns them uppercase


XX
XX
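
Since this lesson is heading toward textual analysis, here is a brief sketch of three more string functions: replace, split, and count. The sample sentence and variable names are invented for illustration. Note that split returns a list, a type covered further down.

```python
line = 'the cat and the hat'

no_the = line.replace('the', 'a')   # 'a cat and a hat'
words = line.split()                # ['the', 'cat', 'and', 'the', 'hat']
the_count = line.count('the')       # 2
```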

You can turn a number into a string, and a string into a number as follows.


In [543]:
s='1'
i=int(s)
f=float(s)
print s,i,f


1 1 1.0
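
The cell above turns a string into numbers. To go the other way, the built-in function str turns a number into a string, which you need to do before joining it to text with +. A small sketch with made-up variable names:

```python
page = 42
label = 'Page ' + str(page)   # 'Page 42'
digits = len(str(page))       # 2, the number of characters in '42'
```

Leaving out str here, as in 'Page ' + page, would raise a TypeError, because Python will not join a string and a number.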

Other Data Structures: Lists, Sets, and Dictionaries

Python has a number of other data structures besides the ones we have learned (int, float, str, bool).

Python is "dynamically typed," which means the type of your variable can change, as in the following:


In [544]:
x = 1
print x,type(x)
x = 1.0
print x,type(x)
x = '1'
print x,type(x)
x = False
print x,type(x)


1 <type 'int'>
1.0 <type 'float'>
1 <type 'str'>
False <type 'bool'>

Lists

A list is an ordered collection of items. The format is: [item1,item2,item3]. The items can be of different types. The first element of a list is at [0]. As with str (strings), you can get a length using len.


In [545]:
my_list = ['A',1,2,3,'B'] 
print my_list
print 'the first item is',my_list[0]
print 'length of my_list is',len(my_list)


['A', 1, 2, 3, 'B']
the first item is A
length of my_list is 5

You can also select from the end of a list using a negative number, e.g. l[-1], and you can select a range of items using a colon, e.g. l[0:2]


In [547]:
l = ['0',1,2,3,'5']
print 'the first two items are',l[0:2] # l[start_at:end_before]
print 'The last item is',l[-1] # use negative numbers to read from end
print 'The first to next-to-last items are',l[0:-1]


the first two items are ['0', 1]
The last item is 5
The first to next-to-last items are ['0', 1, 2, 3]
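
You can also leave out the start or the end of a range, in which case Python goes from the beginning or to the end of the list. A short sketch with an illustrative list of numbers:

```python
numbers = [10, 20, 30, 40, 50]

first_two = numbers[:2]     # [10, 20]
from_third = numbers[2:]    # [30, 40, 50]
last_two = numbers[-2:]     # [40, 50]
```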

Your turn

Using the following list, select ['B','C','D']


In [549]:
l = ['A','B','C','D','E','F']

# YOURCODEHERE

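To append to a list, use the .append command or +=. Note that += adds each element of whatever is on its right, so l+='G' adds exactly one item only because 'G' is a single character; l+=['G'] is the safer form.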


In [550]:
l=['A','B','C','D','E','F']
l+='G'
print l
l.append('H')
print l


['A', 'B', 'C', 'D', 'E', 'F', 'G']
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
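
The difference between the two shows up when you add more than one character. The sketch below uses a made-up list called letters.

```python
letters = ['A', 'B']

letters += ['C', 'D']   # adds each item of the list on the right
letters += 'EF'         # a string is added one character at a time
letters.append('GH')    # append always adds exactly one item

# letters is now ['A', 'B', 'C', 'D', 'E', 'F', 'GH']
```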

Sets

A set is like a list, but it does not have any duplicates and does not keep its items in any particular order. To define one, use the command set.


In [569]:
l = ['A','A','A','B']
print l,type(l)
s = set(l)
print s,type(s)


['A', 'A', 'A', 'B'] <type 'list'>
set(['A', 'B']) <type 'set'>
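
Sets are handy for finding the distinct words in a text. Here is a minimal sketch using an invented sentence; it also uses the word in, which tests whether something is a member of a collection.

```python
words = 'the cat and the hat'.split()   # a list of five words
unique_words = set(words)               # duplicates are dropped

total = len(words)                # 5
distinct = len(unique_words)      # 4, because 'the' appears twice
has_cat = 'cat' in unique_words   # True
```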

Dictionary

A dictionary (dict) is a structure that lets you look up a value using a key. To define one, you can use curly brackets, e.g. {'hi':'Hindi','en':'English'}, and then access or set individual items using square brackets.


In [570]:
langs = {}
print langs, type(langs)
langs['hi'] = 'Hindi'
print langs, len(langs)  # you can get the length, too

# Here is a way to set it all at once

langs = {'hi': 'Hindi', 'en': 'English', 'myfavnumber': 7}
print len(langs),langs['myfavnumber']


{} <type 'dict'>
{'hi': 'Hindi'} 1
3 7

To get the keys of a dictionary, use the function .keys, e.g. langs.keys()


In [571]:
print langs.keys()


['hi', 'en', 'myfavnumber']
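
A few other dictionary tools are worth knowing. The sketch below assumes a smaller version of langs and uses the built-in function sorted, the word in, and the dictionary function .get, none of which appear above. In Python 2, the order of the keys is not guaranteed, so sorted is a simple way to get them in a predictable order.

```python
langs = {'hi': 'Hindi', 'en': 'English'}

codes = sorted(langs.keys())            # ['en', 'hi']
has_urdu = 'ur' in langs                # False: there is no key 'ur'
fallback = langs.get('ur', 'unknown')   # 'unknown', instead of an error
```

Asking for langs['ur'] directly would raise a KeyError, which is why .get with a default value can be useful.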

Iterating (through lists, etc.)

Iterating means going one-by-one through something.

To go through every element in a list, use the for command as below.


In [572]:
colors = ['red','white','blue','green']
for x in colors:
    print x


red
white
blue
green
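
A for loop can also build something up as it goes. As a sketch of where this leads in textual analysis, the code below counts how often each word appears in a made-up list. It uses the dictionary function .get, which returns a default value (here 0) when the key is not yet present.

```python
words = ['red', 'blue', 'red', 'green', 'red']
counts = {}

for w in words:
    # look up the count so far (0 if the word is new) and add one
    counts[w] = counts.get(w, 0) + 1

# counts is now {'red': 3, 'blue': 1, 'green': 1}
```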

Your Turn

Write code below to add 'purple' to the list of colors and then print out 'I like ' followed by each color, e.g. 'I like blue'.


In [573]:
colors = ['red','white','blue','green']

# YOURCODEHERE

Writing Functions

Whenever you have a task that you need to repeat, it is usually worthwhile to make a function. As mentioned above, a function is code that takes an input and returns an output.

In Python, you define a function using the following pattern:

def my_function(): # put your input inside the ()s
    # your code here
    return # your output here

Here is an example:

def add_one(x):
    x = x + 1
    return x

You can have multiple inputs, and the output can be whatever form you want, too. Spacing is important, as you need a standard indentation (usually 4 or 2 spaces) after the definition. That makes it easier to read.

Below is an example.


In [574]:
def add_two(x):
    return x+2

def add_three(x):
    o = x+3
    return o

def add_five(x):
    assert x # stops with an error if x is 0 (or empty)
    return add_two(add_three(x))

y = 0
y = add_two(y)
print y
y = add_three(y)
print y
y = add_five(y)
print y


2
5
10
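
The functions above each take one input. Here is a short sketch of functions with two inputs, separated by a comma inside the parentheses. The function names add and join_with_space are made up for this example.

```python
def add(x, y):
    return x + y

def join_with_space(first, second):
    # combine two strings with a space between them
    return first + ' ' + second

total = add(2, 3)                               # 5
greeting = join_with_space('Hello', 'there')    # 'Hello there'
```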

Your turn

Write a function named quote_me to add an exclamation mark to a string.


In [577]:
# So quote_me('Hello') should output: Hello!
def quote_me(s):
    # YOURCODEHERE
    return #YOURCODEHERETOO

Importing Libraries

To get extended features from Python, you need to import libraries, using the command import. Some libraries, such as sys, come with Python; others need to be installed first. Below we will import the library sys.


In [580]:
import sys
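
Once a library is imported, you use what it provides by typing the library name, a period, and then the item you want. The sketch below assumes two libraries that come with Python: sys, and math, which is not otherwise used in this lesson. The variable names are illustrative.

```python
import sys
import math

major_version = sys.version_info[0]   # 2 or 3, depending on your Python
root = math.sqrt(16)                  # 4.0
```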

In [ ]: