Python for Humanists (Part I)

This workshop is licensed under a Creative Commons Attribution 4.0 International License. Download Code & Data here: git repo

What is Python?

  • General programming language
  • Open and free environment
    • Lots of community support

Why use Python?

General programming language...good at most tasks

  • It's free
  • Common Uses
    • Web Scraping (Text & Data Mining)
    • Web Applications
    • Repetitive tasks & task automation
    • Transforming & Manipulating data
      • Row by Row

Python Setup

Installing Python

  • Download & install manually
    • Usually through command line (shell)
  • Part of IDE or package

Creating a Jupyter Notebook

  • Open the Anaconda navigator
  • Click 'launch' under Jupyter Notebook
  • Opens in your default web browser
  • Navigate to the location on your machine where you'd like the notebook to be saved
  • Click 'New'
    • Select 'Python 3' from the dropdown list

Use the Jupyter Notebook for editing and running Python.

  • While it's common to write Python scripts using a text editor, we are going to use the Jupyter Notebook for the remainder of this workshop.
  • This has several advantages:
    • You can easily type, edit, and copy and paste blocks of code.
    • Tab completion allows you to easily access the names of things you are using and learn more about them.
    • It allows you to annotate your code with links, different sized text, bullets, etc. to make it more accessible to you and your collaborators.
    • It allows you to display figures next to the code that produces them to tell a complete story of the analysis.
  • Each notebook contains one or more cells that contain code, text, or images.

Running Python code in Jupyter

  • Code vs Text
    • Jupyter mixes code and text in different types of blocks, called cells.
  • Markdown can be used to style Text cells
  • Executable code is written in Code cells
  • CTRL + Enter will run a cell; Shift + Enter will run a cell and highlight the next cell.
    • Running a Text/Markdown cell will render the markdown
    • Running a Code cell will execute the Python code

In [ ]:

Variables & Data Types

Use variables to store values.

  • Variables are names for values.
  • In Python the = symbol assigns the value on the right to the name on the left.
  • The variable is created when a value is assigned to it.
  • Here, Python assigns an age to a variable age and a name in quotes to a variable first_name.

In [ ]:
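
A minimal sketch of the assignment described above. The value 42 is the age this lesson refers to later; the name 'Ahmed' is a placeholder assumed for illustration.

```python
# Assign a number to the variable age and a quoted string to first_name.
age = 42
first_name = 'Ahmed'  # placeholder name, assumed for this example
```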

  • Variable names
    • can only contain letters, digits, and underscore _ (typically used to separate words in long variable names)
    • cannot start with a digit
  • Variable names that start with underscores like __alistairs_real_age have a special meaning so we won't do that until we understand the convention.

Use print to display values.

  • Python has a built-in function called print that prints things as text.
  • Call the function (i.e., tell Python to run it) by using its name.
  • Provide values to the function (i.e., the things to print) in parentheses.
  • To add a string to the printout, wrap the string in single or double quotes.
  • The values passed to the function are called 'arguments'

In [ ]:

  • print automatically puts a single space between items to separate them.
  • And wraps around to a new line at the end.
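
As a rough illustration of the points above, the sketch below passes four arguments to print. The variable values are assumed placeholders.

```python
age = 42
first_name = 'Ahmed'  # placeholder name, assumed for this example

# print separates its arguments with a single space and ends with a newline.
print(first_name, 'is', age, 'years old')
# prints: Ahmed is 42 years old
```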

Variables must be created before they are used.

  • If a variable doesn't exist yet, or if the name has been mis-spelled, Python reports an error.
    • Unlike some languages, which "guess" a default value.

In [ ]:
print(last_name)
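
Running the cell above stops with a NameError. One way to look at the error without halting the notebook is to catch it, as in this sketch. try/except is not covered in this part of the workshop and is used here only so the example runs to completion.

```python
try:
    print(last_name)  # last_name has never been assigned
except NameError as error:
    caught = str(error)
    print('NameError:', caught)
```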

Variables can be used in calculations.

  • We can use variables in calculations just as if they were values.
    • Remember, we assigned 42 to age a few lines ago.

In [ ]:
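
A small sketch of a calculation with a variable. The name age_in_three_years is chosen here for illustration.

```python
age = 42
age_in_three_years = age + 3  # the variable stands in for its value
print('Age in three years:', age_in_three_years)
# prints: Age in three years: 45
```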

Exercise 1

  1. What are the values of the variables x, y, and swap after each line of this code runs?
    x = 1.0    #line 1
    y = 3.0    #line 2
    swap = x   #line 3
    x = y      #line 4
    y = swap   #line 5
    

In [ ]:
# Exercise 1
# line 1    x = ?     y = ?     swap = ?
# line 2    x = ?     y = ?     swap = ?
# line 3    x = ?     y = ?     swap = ?
# line 4    x = ?     y = ?     swap = ?
# line 5    x = ?     y = ?     swap = ?

Every value has a type.

  • Every value in a program has a specific type.
  • Integer (int): represents positive or negative whole numbers like 3 or -512.
  • Floating point number (float): represents real numbers like 3.14159 or -2.5.
  • Character string (usually called "string", str): text.
    • Written in either single quotes or double quotes (as long as they match).
    • The quote marks aren't printed when the string is displayed.

Use the built-in function type to find the type of a value.

  • Use the built-in function type to find out what type a value has.
  • Works on variables as well.
    • But remember: the value has the type; the variable is just a label.

In [ ]:
print()

In [ ]:
height = 'average'
print()
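
One possible way to fill in the cells above. The values 52 and 3.14 are assumed examples.

```python
print(type(52))       # <class 'int'>
print(type(3.14))     # <class 'float'>

height = 'average'
print(type(height))   # <class 'str'>: the type belongs to the value
```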

Types control what operations (or methods) can be performed on a given value.

  • A value's type determines what the program can do to it.

In [ ]:
print()

In [ ]:
print()
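
A sketch of how a type limits the available operations: subtraction works on numbers but not on strings. The try/except wrapper is only there so the example runs to completion.

```python
print(5 - 3)   # subtraction is defined for numbers: prints 2

try:
    print('hello' - 'h')   # subtraction is not defined for strings
except TypeError as error:
    subtraction_error = str(error)
    print('TypeError:', subtraction_error)
```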

You can use the "+" and "*" operators on strings.

  • "Adding" character strings concatenates them.

In [ ]:

  • Multiplying a character string by an integer N creates a new string that consists of that character string repeated N times.
    • Since multiplication is repeated addition.

In [ ]:
separator = '=' * 10
print(separator)
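
A short sketch of both operators on strings. The names are assumed placeholders.

```python
first_name = 'Ahmed'                     # placeholder name
full_name = first_name + ' ' + 'Walsh'   # + joins strings end to end
print(full_name)                         # Ahmed Walsh

separator = '=' * 10                     # * repeats the string
print(separator)                         # ==========
```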

Strings have a length (but numbers don't).

  • The built-in function len counts the number of characters in a string.

In [ ]:
print()
  • But numbers don't have a length (not even zero).

In [ ]:
print()
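
A sketch of len on a string and on a number. The string is an assumed example, and try/except is used only so the failing call does not stop the cell.

```python
full_name = 'Ahmed Walsh'      # placeholder string
name_length = len(full_name)   # counts every character, including the space
print(name_length)             # 11

try:
    len(52)                    # integers have no length
except TypeError as error:
    length_error = str(error)
    print('TypeError:', length_error)
```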

Must convert numbers to strings or vice versa when operating on them.

  • Cannot add numbers and strings.

In [ ]:
print()
  • Not allowed because it's ambiguous: should 1 + '2' be 3 or '12'?
  • Some types can be converted to other types by using the type name as a function.

In [ ]:
print()
print()
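
The sketch below shows the failing addition and the two conversions that remove the ambiguity. The variable names are chosen here for illustration.

```python
try:
    print(1 + '2')             # ambiguous, so Python refuses
except TypeError as error:
    print('TypeError:', error)

number_sum = 1 + int('2')      # convert the string to a number
text_join = str(1) + '2'       # or convert the number to a string
print(number_sum)              # 3
print(text_join)               # 12
```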

Can mix integers and floats freely in operations.

  • Integers and floating-point numbers can be mixed in arithmetic.
    • Python 3 automatically converts integers to floats as needed. (Integer division in Python 2 will return an integer, the floor of the division.)

In [ ]:
print()
print()
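
A brief sketch of mixed arithmetic, with variable names assumed for illustration.

```python
half = 1 / 2.0            # the integer 1 is treated as 1.0
three_squared = 3.0 ** 2  # the result of mixing int and float is a float
print('half is', half)                     # half is 0.5
print('three squared is', three_squared)   # three squared is 9.0
```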

Variables only change value when something is assigned to them.

  • If we make one cell in a spreadsheet depend on another, and update the latter, the former updates automatically.
  • This does not happen in programming languages.

In [ ]:
print()
  • The computer reads the value of first when doing the multiplication, creates a new value, and assigns it to second.
  • After that, second does not remember where it came from.
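
A minimal sketch of this behaviour, using the names first and second from the bullets above. The numbers are assumed examples.

```python
first = 1
second = 5 * first   # second is computed once, from the current value of first
first = 2            # changing first later does not touch second
print('first is', first, 'and second is', second)
# prints: first is 2 and second is 5
```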

Exercise 2

  • What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem.
    • For example, in # 1, when would counting days with a floating point variable make more sense than using an integer?

1) Number of days since the start of the year.

2) Time elapsed since the start of the year.

3) Call number of a book.

4) Standard book loan period.

5) Number of reference queries in a year.

6) Average library classes taught per semester.


In [ ]:
# Exercise 2

Built-In Functions & Libraries

Use comments to add documentation to programs.


In [ ]:
# This sentence isn't executed by Python.
adjustment = 0.5   # Neither is this - anything after '#' is ignored.
# print(adjustment)

A function may take zero or more arguments.

  • We have seen some functions already; now let's take a closer look.
  • An argument is a value passed into a function.
  • len takes exactly one.
  • int, str, and float create a new value from an existing one.
  • print takes zero or more.
  • print with no arguments prints a blank line.
    • Must always use parentheses, even if they're empty, so that Python knows a function is being called.

In [ ]:
print()
print()
print()

Commonly-used built-in functions include max, min, and round.

  • Use max to find the largest value of two or more values.
  • Use min to find the smallest.
  • Both work on character strings as well as numbers.
    • "Larger" and "smaller" follow character order: digits (0-9) come before uppercase letters (A-Z), which come before lowercase letters (a-z).

In [ ]:
print()
print()
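
A sketch of max and min on numbers and on single-character strings. The values are assumed examples.

```python
largest = max(1, 2, 3)
smallest = min('a', 'A', '0')   # digits sort before letters
print(largest)                  # 3
print(smallest)                 # 0
```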

Functions may only work for certain (combinations of) arguments.

  • max and min must be given at least one argument.
    • "Largest of the empty set" is a meaningless question.
  • And they must be given things that can meaningfully be compared.

In [ ]:
print()
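
For instance, a number and a string cannot be compared, so max rejects the pair. The try/except wrapper is only there so the example runs to completion.

```python
try:
    max(1, 'a')   # no meaningful ordering between an int and a str
except TypeError as error:
    comparison_error = str(error)
    print('TypeError:', comparison_error)
```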

Functions may have default values for some arguments.

  • round will round off a floating-point number.
  • By default, rounds to zero decimal places.

In [ ]:
print()
  • We can specify the number of decimal places we want.

In [ ]:
print()
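
A sketch of round with and without its optional second argument. The number 3.712 is an assumed example.

```python
rounded_default = round(3.712)        # no second argument: zero decimal places
rounded_one_place = round(3.712, 1)   # second argument: number of decimal places
print(rounded_default)                # 4
print(rounded_one_place)              # 3.7
```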

Use the built-in function help to get help for a function.

  • Every built-in function has online documentation.

In [ ]:
print()

Python objects also have built-in methods

  • Methods are similar to functions except they are called on an object.
  • Methods are dependent on the object type.
    • str objects in Python have many built-in methods including startswith(), title(), and replace().

In [ ]:
text = 'Hello there, nice to meet you!'
print(text)
print()
print()
print()
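
One possible way to try the three methods named above on the text string. The arguments passed to startswith and replace are assumed examples.

```python
text = 'Hello there, nice to meet you!'

starts_with_hello = text.startswith('Hello')   # True or False
title_text = text.title()                      # capitalizes each word
replaced_text = text.replace('meet', 'see')    # returns a new string

print(starts_with_hello)   # True
print(title_text)          # Hello There, Nice To Meet You!
print(replaced_text)       # Hello there, nice to see you!
print(text)                # the original string is unchanged
```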

Most of the power of a programming language is in its libraries.

  • A library is a collection of files (called modules) that contains functions for use by other programs.
    • May also contain data values (e.g., numerical constants) and other things.
    • Library's contents are supposed to be related, but there's no way to enforce that.
  • The Python standard library is an extensive suite of modules that comes with Python itself.
  • Many additional libraries are available from PyPI (the Python Package Index).
  • We will see later how to write new libraries.

A program must import a library module before using it.

  • Use import to load a library module into a program's memory.
  • Then refer to things from the module as module_name.thing_name.
    • Python uses . to mean "part of".
  • Using datetime, one of the modules in the standard library:

In [ ]:

  • Have to refer to each item with the module's name.
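
A small sketch using datetime from the standard library. The date chosen is an assumed example.

```python
import datetime

# date is a thing inside the datetime module, so it needs the module prefix.
new_year = datetime.date(2020, 1, 1)
print(new_year)           # 2020-01-01
print(new_year.year)      # 2020

# Modules can also hold data values, such as this constant.
print(datetime.MAXYEAR)   # 9999
```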

Use help to learn about the contents of a library module.

  • Works just like help for a function.

In [ ]:
help(datetime)

Import specific items from a library module to shorten programs.

  • Use from ... import ... to load only specific items from a library module.
  • Then refer to them directly without library name as prefix.

In [ ]:
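
A sketch of importing two specific items from datetime. The dates are assumed examples.

```python
from datetime import date, timedelta

# date and timedelta can now be used without the datetime. prefix.
start = date(2020, 1, 1)
one_week_later = start + timedelta(days=7)
print(one_week_later)   # 2020-01-08
```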

Create an alias for a library module when importing it to shorten programs.

  • Use import ... as ... to give a library a short alias while importing it.
  • Then refer to items in the library using that shortened name.

In [ ]:

  • Commonly used for libraries that are frequently used or have long names.
    • E.g., matplotlib's plotting module, matplotlib.pyplot, is often aliased as plt.
  • But can make programs harder to understand, since readers must learn your program's aliases.
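
A sketch of an alias, using datetime again. The alias dt is a common choice but is an assumption here, not something this workshop prescribes.

```python
import datetime as dt

# The alias replaces the module name as the prefix.
leap_day = dt.date(2020, 2, 29)
print(leap_day)   # 2020-02-29
```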

Lists

A list stores many values in a single structure.

  • Doing calculations with a hundred variables called name_001, name_002, etc., would be at least as slow as doing them by hand.
  • Use a list to store many values together.
    • Contained within square brackets [...].
    • Values separated by commas ,.
  • Use len to find out how many values are in a list.

In [ ]:
#index      0      1       2       3       4   
names = ['Cathy','Doug','Monica','Jake','Peter']
print(type(names))
print()
print()
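
One way the cell above might be completed, using the same names list.

```python
names = ['Cathy', 'Doug', 'Monica', 'Jake', 'Peter']
name_count = len(names)   # number of values in the list
print(names)
print(name_count)         # 5
```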

Use an item's index to fetch it from a list.

  • An index in Python starts counting from 0.

In [ ]:
print()
print()
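
A sketch of indexing into the names list. The variable names first_item and fifth_item are chosen here for illustration.

```python
names = ['Cathy', 'Doug', 'Monica', 'Jake', 'Peter']
first_item = names[0]   # counting starts at 0
fifth_item = names[4]   # so the fifth item has index 4
print(first_item)       # Cathy
print(fifth_item)       # Peter
```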

Lists' values can be replaced by assigning to them.

  • Use an index expression on the left of assignment to replace a value.

In [ ]:
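
A minimal sketch of replacing a list value. The replacement 'Catherine' is an assumed example.

```python
names = ['Cathy', 'Doug', 'Monica', 'Jake', 'Peter']
names[0] = 'Catherine'   # index expression on the left of the assignment
print(names)
# prints: ['Catherine', 'Doug', 'Monica', 'Jake', 'Peter']
```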

Appending items to a list lengthens it.

  • Use list_name.append to add items to the end of a list.

In [ ]:
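
A sketch of append on the names list. The added name 'Jan' is an assumed example.

```python
names = ['Cathy', 'Doug', 'Monica', 'Jake', 'Peter']
names.append('Jan')   # adds one item to the end of the list
print(names)
print(len(names))     # 6
```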

  • append is a method of lists.
    • Like a function, but tied to a particular object.
  • Use object_name.method_name to call methods.
    • Deliberately resembles the way we refer to things in a library.
  • We will meet other methods of lists as we go along.
    • Use help(list) for a preview.
  • extend is similar to append, but it allows you to combine two lists. For example:

In [ ]:
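primes = [2, 3, 5, 7]                # starting list (assumed values; not defined earlier in this notebook)
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
primes.extend(teen_primes)           # extend adds each item, so the list stays flat
primes.append(middle_aged_primes)    # append adds the whole list as a single nested item
print(primes)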

Note that while extend maintains the "flat" structure of the list, appending a list to a list makes the result two-dimensional.

Use del to remove items from a list entirely.

  • del list_name[index] removes an item from a list and shortens the list.
  • Not a function or a method, but a statement in the language.

In [ ]:
print(primes)
del primes[4]
print(primes)

The empty list contains no values.

  • Use [] on its own to represent a list that doesn't contain any values.
    • "The zero of lists."
  • Helpful as a starting point for collecting values

Lists may contain values of different types.

  • A single list may contain numbers, strings, and anything else.

In [ ]:
goals = []
print(goals)
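
A sketch that starts from the empty list and collects values of different types. The values themselves are assumed examples.

```python
goals = []                               # start with an empty list
goals.append(1)                          # an integer
goals.append('Create lists in Python.')  # a string
goals.append(2.5)                        # a float
print(goals)        # [1, 'Create lists in Python.', 2.5]
print(len(goals))   # 3
```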

Lists can be sliced

  • We can slice a list to obtain a sub-section of the list
    • Use the index numbers separated by a colon : to designate which slice of the list to take

In [ ]:
values = [1,3,4,7,9,13]
print()
print()
print()
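
A few possible slices of the values list. A slice includes the start index and stops just before the end index.

```python
values = [1, 3, 4, 7, 9, 13]
first_three = values[0:3]     # indexes 0, 1, and 2
from_index_two = values[2:]   # index 2 through the end
last_two = values[-2:]        # negative indexes count from the end
print(first_three)            # [1, 3, 4]
print(from_index_two)         # [4, 7, 9, 13]
print(last_two)               # [9, 13]
```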

Indexing beyond the end of the collection is an error.

  • Python reports an IndexError if we attempt to access a value that doesn't exist.
    • This is a kind of runtime error.
    • Cannot be detected as the code is parsed because the index might be calculated based on data.

In [ ]:
print()
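
A sketch of the error, with try/except used only so the example runs to completion. The index 99 is an assumed example.

```python
values = [1, 3, 4, 7, 9, 13]
try:
    print(values[99])   # the list has only six items
except IndexError as error:
    index_error = str(error)
    print('IndexError:', index_error)
```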

Exercise 3

  • Fill in the blanks in the code below to get the results in the 'Desired Output' section. Use the empty cell below to test your choices.
values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)

Desired Output:

first time: [1, 3, 5]
second time: [3, 5]

In [ ]:
# Exercise 3

Dictionaries

A dictionary stores many values in a nested structure.

  • Dictionaries store data in key, value pairs
    • Contained within curly brackets {...}.
    • Key and value separated by a colon "key":"value".
    • Each key-value pair is followed by a comma ,.
  • Nested structures allow for more complicated data relationships
  • Very common if you're working with web data
    • Dictionaries match JSON file structure


In [ ]:
students = {"firstName":"John","lastName":"Smith"}
print()
print()
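
One way the cell above might be completed.

```python
students = {"firstName": "John", "lastName": "Smith"}
print(students)         # {'firstName': 'John', 'lastName': 'Smith'}
print(type(students))   # <class 'dict'>
```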

Adding content to and updating a dictionary

  • You can add content to an existing dictionary
    • Add the desired key name in square brackets ["key"]
    • Set that equal to the desired value for that key ["key"] = "value"

In [ ]:
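
A sketch of adding a key and updating an existing one. The "classes" key and the new values are assumptions made for illustration.

```python
students = {"firstName": "John", "lastName": "Smith"}

students["classes"] = ["History", "Art"]   # new key: adds a key-value pair
students["firstName"] = "Jon"              # existing key: replaces the value
print(students)
```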

Access values by using the keys

  • We use the key names inside square brackets [' '] to access the value.
    • Nested data must be navigated from the top level key.
  • Lists inside of dictionaries are treated the same as a stand-alone list.

In [ ]:
print()
print()
print()
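
A sketch of reading values by key, including an item from a list stored inside the dictionary. The "classes" key is an assumption made for illustration.

```python
students = {"firstName": "John", "lastName": "Smith",
            "classes": ["History", "Art"]}

first = students["firstName"]           # look up a value by its key
class_list = students["classes"]        # this value is an ordinary list
first_class = students["classes"][0]    # so it can be indexed like one
print(first)         # John
print(class_list)    # ['History', 'Art']
print(first_class)   # History
```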

Dictionaries can contain many nested elements

  • Dictionaries can have multiple structured data elements.
  • It's useful to know the structure of your dictionary to access the values of nested elements.

In [ ]:
courses = {"courses": [
    {
        "Title": "Intro to Economics",
        "Instructor": {
            "firstName": "Robert",
            "lastName": "Schiller",
        },
        "Number": "ECON 101",
        "Size": 65,
        "isFull": True,
    },
    {
        "Title": "Intro to French",
        "Instructor": {
            "firstName": "Marie",
            "lastName": "Gribouille",
        },
        "Number": "FREN 101",
        "Size": 15,
        "isFull": False,
    },
]}

print(courses)
  • Python starts from the top level element
    • In this example, the top level key is "courses".
    • "courses" is a list with two elements

In [ ]:
print()
print()
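
A sketch of inspecting the top level. To stay self-contained it uses a trimmed copy of the courses dictionary with fewer fields than the full version above.

```python
# Trimmed copy of the courses dictionary (some fields omitted for brevity).
courses = {"courses": [
    {"Title": "Intro to Economics", "Number": "ECON 101",
     "Instructor": {"firstName": "Robert", "lastName": "Schiller"}},
    {"Title": "Intro to French", "Number": "FREN 101",
     "Instructor": {"firstName": "Marie", "lastName": "Gribouille"}},
]}

course_list = courses["courses"]   # the value under the top level key
print(type(course_list))           # <class 'list'>
print(len(course_list))            # 2
```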

Going deeper

  • To access the values nested inside a dictionary, we must navigate to their level
  • We must step through each level of our nested data
    • 'courses' is the top level or root element
    • courses['courses'] is a list; its items are accessed using the index
  • .keys() & .values() list the dictionary's keys and values, respectively, at the current dictionary level

In [ ]:
print()
print()
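
A sketch of stepping down through the levels, shown on the first course only. It uses a trimmed copy of the courses dictionary so that it runs on its own.

```python
# Trimmed copy of the courses dictionary (some fields omitted for brevity).
courses = {"courses": [
    {"Title": "Intro to Economics", "Number": "ECON 101",
     "Instructor": {"firstName": "Robert", "lastName": "Schiller"}},
    {"Title": "Intro to French", "Number": "FREN 101",
     "Instructor": {"firstName": "Marie", "lastName": "Gribouille"}},
]}

first_course = courses["courses"][0]   # top level key, then list index
print(first_course.keys())             # keys available at this level
print(first_course["Title"])           # Intro to Economics

# One more level down: the Instructor value is itself a dictionary.
instructor_first_name = first_course["Instructor"]["firstName"]
print(instructor_first_name)           # Robert
```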

Exercise 4

  1. Use the cell below to print the last name of the 'Intro to French' instructor from our courses dictionary.

In [ ]:
# Exercise 4
print()