Tech Overview: Working with Python and Jupyter

Let's get the basics down first: navigating Jupyter, running Python code, and making sure everything is installed correctly.

Outline

0. Installation Check

First let's make sure Python is installed. If you receive no errors in this section, then your computer is ready to run the materials for this course! If you do receive an error, read the message closely since it offers clues to resolving the issue.

Jupyter notebooks consist of cells. The cell we're reading now is formatted as a "Markdown" cell, and is used for text.

To run Python code we need cells that are formatted as "Code" cells. A code cell has an "In [ ]" to the left of the cell. Running code in a Jupyter notebook is relatively easy. Click on the cell you wish to run (a segment of code with a gray background) to highlight it. Then either click the "Play" button in the toolbar above the code window or press CTRL+RETURN on your keyboard.

Python Version

A quick check to make sure that we are running Python 3. If the number "2" is printed below, you're running Python 2. Because we're using a shared server, everyone should see "3". If you install Python on your own machine, you can run this cell to see which version you have.


In [ ]:
import sys
sys.version_info.major
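
If you would rather have the check fail loudly than read a number, something like the following could work. This is only a minimal sketch and not part of the course materials; the wording of the messages is our own.

```python
import sys

# sys.version_info compares like a tuple, so this is true for any Python 2.x
if sys.version_info < (3,):
    raise RuntimeError(
        "This course expects Python 3, but found Python %d.%d"
        % sys.version_info[:2]
    )

print("Running Python %d.%d" % sys.version_info[:2])
```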

Required Packages

If you installed Python through the Anaconda platform, then the packages below should already be installed on your hard drive and Python should be able to find them.


In [ ]:
import os
import string
import numpy
import matplotlib
import pandas
import sklearn
import scipy
import nltk

print("Success!")
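
The import cell above stops at the first package it cannot find. As a hedged alternative, the sketch below lists every missing package in one pass using `importlib.util.find_spec` from the standard library. The `required` list simply mirrors the imports above; the variable names are our own.

```python
import importlib.util

# Package names taken from the import cell above
required = ["os", "string", "numpy", "matplotlib", "pandas", "sklearn", "scipy", "nltk"]

# find_spec returns None when a top-level module cannot be located
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Success!")
```

Unlike a plain `import`, this only looks the packages up, so it will not catch a package that is present but broken.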

Visualization

The code in the cell below is not Python but a direct instruction to the Jupyter Notebook (a so-called "magic" command). Any visualizations that we produce will then appear within the notebook itself, as opposed to in a separate window.


In [ ]:
%pylab inline

NLP Models

In order to fully use the NLTK package for Natural Language Processing, we need to download a few language resources (models and word lists) that give Python extra instructions. For example, the 'punkt' model below tells Python how to break strings of text into individual words or sentences. Running this cell will require a stable internet connection and perhaps a little patience. If it completes successfully, then it will print the word "True" at the bottom.


In [ ]:
nltk_data = ["punkt", "words", "stopwords", "averaged_perceptron_tagger", "maxent_ne_chunker", "wordnet"]
nltk.download(nltk_data)

As a quick opening toy example to see Python in action, let's find the words ending in "ing" (a rough stand-in for present participles) in Jane Austen's Pride and Prejudice. There is a plain text file containing this book in this folder. Part of the reason people use Python to work on human-language texts (natural language processing) is that it makes tasks like this relatively simple.


In [ ]:
# every line that starts with a hash is a comment
# the computer ignores these lines; they are meant for a human reader
# here is some starter code to make sure everything is set up (don't worry about understanding everything here)
# the "with" statement closes the file once we are done reading it
with open('../Data/Austen_PrideAndPrejudice.txt', encoding='utf-8') as book:
    for line in book:
        for word in line.split():
            if word.endswith('ing'):
                print(word)
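
The loop above matches any token that ends in "ing", so it misses words followed by punctuation ("walking.") and picks up non-participles ("nothing"). The sketch below is one possible refinement, not the course's method: it strips punctuation, lowercases, and counts the matches. The `sample` sentence is a made-up stand-in for the novel.

```python
import string
from collections import Counter

# Hypothetical sample text standing in for the novel
sample = "She was walking and talking; he kept walking. Nothing was ringing."

counts = Counter()
for word in sample.split():
    # Strip surrounding punctuation and lowercase so "walking." matches "walking"
    cleaned = word.strip(string.punctuation).lower()
    if cleaned.endswith("ing"):
        counts[cleaned] += 1

# Print each matching word with its count, most frequent first
for word, n in counts.most_common():
    print(word, n)
```

Note that "nothing" still shows up in the counts. Telling real participles apart from look-alikes would need part-of-speech information, which is where a package like NLTK comes in.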

Code Space

This is a free space to write your own code. Once you have written your masterpiece, go ahead and run it! To create new cells, click the "Plus" button in the toolbar.


In [ ]:

1. Primer in Markdown

If you double-click on a Markdown cell you will see the syntax behind the cell, and you can modify the cell.

Here are some basic formatting tags you might use in Markdown. This is borrowed from here, which has more formatting tips. Google is also your friend if you have a particular question about formatting in Markdown.

1. Headers

H1

H2

H3

H4

H5

H6

Alternatively, for H1 and H2, an underline-ish style:

Alt-H1

Alt-H2
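
Since the rendered cell hides the raw syntax, here is a small sketch that builds the Markdown source for the headers above as Python strings. It assumes the standard Markdown header conventions ("#" prefixes, and "=" or "-" underlines for the alternative style); the variable names are our own.

```python
# Hash-style headers: one "#" per level, then a space and the header text
atx_headers = ["#" * level + " H" + str(level) for level in range(1, 7)]
print("\n".join(atx_headers))

# The "underline-ish" style exists only for levels 1 and 2
alt_headers = "Alt-H1\n======\n\nAlt-H2\n------"
print(alt_headers)
```

Pasting the printed text into a Markdown cell should render the headers shown above.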

2. Emphasis

Emphasis, aka italics, with asterisks or underscores.

Strong emphasis, aka bold, with asterisks or underscores.

Combined emphasis with asterisks and underscores.

Strikethrough uses two tildes. Scratch this.
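
The raw source behind these four lines is not visible once the cell is rendered. As a sketch, assuming the usual Markdown emphasis markers (strikethrough is an extension that Jupyter happens to support), the lines could be written like this:

```python
# Raw Markdown source for each emphasis style, one string per line
emphasis = [
    "Emphasis, aka italics, with *asterisks* or _underscores_.",
    "Strong emphasis, aka bold, with **asterisks** or __underscores__.",
    "Combined emphasis with **asterisks and _underscores_**.",
    "Strikethrough uses two tildes. ~~Scratch this.~~",
]

# A blank line between the strings makes each one its own paragraph
print("\n\n".join(emphasis))
```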

3. Lists

  1. First ordered list item
  2. Another item
    • Unordered sub-list.
  3. Actual numbers don't matter, just that it's a number
    1. Ordered sub-list
  4. And another item.

    You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (at least one, but we'll use three here to also align the raw Markdown).

    To have a line break without a paragraph, you will need to use two trailing spaces.
    Note that this line is separate, but within the same paragraph.

  • Unordered list can use asterisks
  • Or minuses
  • Or pluses

5. Blockquotes

Blockquotes are very handy if you are including longer quotes. This line is part of the same quote.

Quote break.

This is a very long line that will still be quoted properly when it wraps. Oh boy let's keep writing to make sure this is long enough to actually wrap for everyone. Oh, you can put Markdown into a blockquote.
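
In raw Markdown, a blockquote is marked by starting each quoted line with ">". The helper below is a hypothetical convenience function (the name `blockquote` is our own, not part of any library) that turns plain text into blockquote source.

```python
def blockquote(text):
    """Prefix every line with '> ' so Markdown renders the text as a quote."""
    return "\n".join("> " + line for line in text.splitlines())


quote = blockquote(
    "Blockquotes are very handy if you are including longer quotes.\n"
    "This line is part of the same quote."
)
print(quote)
print()
print("Quote break.")
```

An unquoted line, separated by blank lines, ends one blockquote so that the next ">" line starts a new one.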

6. Images

Let's end with a nice, relaxing picture, with text when you hover over it. You must give it a path name, either a file on your computer (in the images folder!), or linked online. Here's an online link:


In [ ]: