Some Python and Jupyter Basics


Greg Schivley


With material taken from the Whirlwind Tour of Python

This session covers some basics of Python and Jupyter notebooks that I have found useful or wish I had learned sooner.

I assume you know some programming, so I won't specifically cover loops, logic statements, or functions.

Stop me with questions as we go.

Using conda to manage Python

Python is an open source language with lots of awesome libraries and packages. But you have to install and manage them yourself.

The Anaconda distribution comes with most of what you need for scientific computing, but you might find others or need to update packages.

pip is the default Python package manager, and is used to install packages from PyPI. Some packages are only available on PyPI, so you'll need to use pip to get them.

Unless you have to use pip, I recommend sticking with conda to manage packages.

Neither tool can manage packages installed by the other, so it's easy to end up with duplicate versions.

Using conda in terminal

conda install ...
conda update ...
conda remove ...

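Unfortunately, pip uses different syntax for some of these commands (for example, pip install --upgrade ... instead of update, and pip uninstall ... instead of remove).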

It's also worth reading this blog that talks about the history and differences between conda and pip.

Jupyter notebooks

Quick Start Guide

  • Computer code
  • Rich text elements
  • Inline figures

Notebooks run in your browser

Strengths/weaknesses of notebooks

Pros

  • Great way to iteratively explore data or do analysis
  • Lets you document your work in a readable way as you go
  • Notebooks can be easily shared as HTML through nbviewer.jupyter.org and will render as HTML on GitHub
  • Entire textbooks have been written in the notebook format
  • Can accommodate non-Python kernels (R, Julia, Haskell, etc.) or run other languages in a Python notebook.

Cons

  • Notebooks can get really long
  • Easy to be lazy and copy code rather than creating functions
  • Might lose some functionality from your favorite IDE
  • Have to keep your browser open

How to use Jupyter notebooks

This is just to get you started.

Read the Jupyter documentation (Quick Start or Main) for more information.

Open a command prompt or terminal window and navigate to the parent folder where you are working

Type jupyter notebook to launch the notebook server

Helpful keyboard shortcuts

  • esc exits out of edit mode (green box) and puts the cell in command mode (blue box)
    • Most keyboard commands are run when in command mode
  • shift-enter runs a cell and moves to the next cell
  • m changes a cell to markdown, 1-6 makes it markdown as a section header
  • dd deletes a cell
  • c copies a cell, v pastes it below the currently selected cell

Python Variables Are Pointers

Assigning variables in Python is as easy as putting a variable name to the left of the equals (=) sign.


In [1]:
x = 4
print x, type(x)


4 <type 'int'>

In [3]:
x = 'hello'
print x, type(x)


hello <type 'str'>

In [4]:
x = 1         # x is an integer
x = 'hello'   # now x is a string
x = [1, 2, 3] # now x is a list
print x
print type(x)
print len(x)


[1, 2, 3]
<type 'list'>
3

Python variables are pointers

Two variables can point to the same object. Be careful, as this can lead to unanticipated consequences!


In [5]:
x = [1, 2, 3]
y = x
print y


[1, 2, 3]

In [6]:
x.append(4)
print y


[1, 2, 3, 4]
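If you want an independent list instead of a second name for the same one, one option is to copy it. This is a minimal sketch; the names original, alias, and shallow_copy are mine, and I use print() with a single argument so the cell runs on Python 2 and Python 3 alike.

```python
original = [1, 2, 3]
alias = original               # a second name for the same list object
shallow_copy = list(original)  # a new list with the same elements (original[:] also works)

original.append(4)

print(alias)                     # the alias sees the appended value
print(shallow_copy)              # the copy does not
print(alias is original)         # True: both names point to one object
print(shallow_copy is original)  # False: the copy is a separate object
```

Note that this is a shallow copy: if the list contains other lists, those inner lists are still shared.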

Python variables are pointers

Fortunately, simple objects such as numbers and strings are immutable: you can't change the value of "10" in memory, but you can point a variable to a different value.


In [7]:
x = 10
y = x
# add 5 to x's value, and assign it to x
x += 5  
print "x =", x
print "y =", y


x = 15
y = 10
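The same += operator behaves differently on a mutable object, which is an easy way to get caught out. A small sketch to compare the cases (variable names are illustrative only):

```python
x = 10
y = x
x += 5          # ints are immutable, so x is rebound to a new object; y is untouched

a = [1, 2]
b = a
a += [3]        # lists are mutable, so += extends the existing list in place

c = [1, 2]
d = c
c = c + [3]     # + builds a new list and rebinds c; d still points to the old list

print("x = {}, y = {}".format(x, y))
print("a = {}, b = {}".format(a, b))
print("c = {}, d = {}".format(c, d))
```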

Off-topic note

Python 3 does away with int/float division issues. You can add this behavior to Python 2.


In [10]:
# 3/2
3/2.


Out[10]:
1.5

In [13]:
from __future__ import division
3/2


Out[13]:
1.5
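If you would rather not depend on which Python version is running, here is a sketch of operations that I expect to give the same answer in both Python 2 and Python 3 (the variable names are just for illustration):

```python
quotient = 7 // 2                # floor division always returns the whole-number part
true_ratio = 7 / 2.0             # a float operand always gives true division
as_float = float(7) / 2          # converting one operand to float works too
whole, remainder = divmod(7, 2)  # quotient and remainder in one call
negative_floor = -7 // 2         # floor division rounds toward negative infinity

print(quotient)
print(true_ratio)
print(negative_floor)
```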

Types of objects (data structures)

Common basic Python objects include:

  • Simple objects like ints, floats, strings, etc
  • Lists
  • Sets
  • Dictionaries
  • Tuples

Type   Example                  Description
list   [1, 2, 3]                Ordered collection
tuple  (1, 2, 3)                Immutable ordered collection
dict   {'a':1, 'b':2, 'c':3}    Unordered (key,value) mapping
set    {1, 2, 3}                Unordered collection of unique values

Lists

Some basic list operations


In [14]:
L = [2, 3, 5, 7] # Define a list
print L
print len(L)


[2, 3, 5, 7]
4

Addition concatenates lists.

Lists can be a combination of different object types, including other lists


In [18]:
L = L + [13, 17, 19, ['a', 'b']]
print L


[2, 3, 5, 7, 13, 17, 19, ['a', 'b']]

List indexing and slicing

Python provides access to elements through indexing (single elements), and slicing (multiple elements).

Both are indicated by a square-bracket syntax.


In [19]:
L = [2, 3, 5, 7, 11]

In [20]:
L[0] # Python is 0 indexed


Out[20]:
2

In [24]:
L[:-2]


Out[24]:
[2, 3, 5]

In [25]:
L[0:3] # index 3 is not included: the slice is the half-open interval [0, 3)


Out[25]:
[2, 3, 5]
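A few more slicing patterns that come up often, sketched on the same list (the variable names on the left are mine):

```python
L = [2, 3, 5, 7, 11]

last = L[-1]             # negative indexes count from the end
last_two = L[-2:]        # from the second-to-last element to the end
middle = L[1:4]          # indexes 1, 2, and 3
every_other = L[::2]     # a third number sets the step size
reversed_copy = L[::-1]  # a step of -1 walks the list backwards

print(last_two)
print(every_other)
print(reversed_copy)
```

Slices return new lists, so none of these change L itself.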

A few quick examples of easy logic statements


In [26]:
print L


[2, 3, 5, 7, 11]

In [27]:
2 in L


Out[27]:
True

In [28]:
4 in L


Out[28]:
False

In [29]:
(2 in L) and (4 not in L)


Out[29]:
True

List comprehensions

Lists are great basic objects, and can be extremely fast if used correctly

Slower way to add values to a list


In [36]:
# In Python 2, range generates a list from 0 to n-1
# In Python 3, range returns a lazy range object instead of a list
range(5)


Out[36]:
[0, 1, 2, 3, 4]

In [37]:
%%timeit
l = [] # Initialize an empty list
for value in range(500):
    l.append(value**2)


The slowest run took 19.06 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 150 µs per loop

Better way to generate a list

Don't do this for very complex calculations; the code gets hard to read.


In [38]:
%%timeit
[value**2 for value in range(500)]


10000 loops, best of 3: 106 µs per loop
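%%timeit is a notebook magic, so it won't work in a plain script. As a rough equivalent, the standard-library timeit module can run the same comparison. This is a sketch: the function names are mine, and the timings will differ from the ones above depending on your machine and Python version.

```python
import timeit


def squares_loop(n):
    """Build the list of squares by appending inside a for loop."""
    result = []
    for value in range(n):
        result.append(value ** 2)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [value ** 2 for value in range(n)]


# Check that both approaches agree before comparing their speed
same_result = squares_loop(500) == squares_comprehension(500)

# Total time in seconds for 1000 calls of each function
loop_seconds = timeit.timeit(lambda: squares_loop(500), number=1000)
comp_seconds = timeit.timeit(lambda: squares_comprehension(500), number=1000)

print("same result: {}".format(same_result))
print("loop:          {:.4f} s for 1000 runs".format(loop_seconds))
print("comprehension: {:.4f} s for 1000 runs".format(comp_seconds))
```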

Using NumPy operations on an entire array is even faster


In [40]:
import numpy as np
np.array(range(5))


Out[40]:
array([0, 1, 2, 3, 4])

In [41]:
%%timeit
np.array(range(500))**2


10000 loops, best of 3: 28.3 µs per loop

Tuples

Tuples are in many ways similar to lists, but they are defined with parentheses rather than square brackets.

Or they can be defined without any brackets.


In [42]:
t = (1, 2, 3)

In [43]:
t = 1, 2, 3
print t
print t[0]


(1, 2, 3)
1

Tuples are immutable

Once defined, the object can't be changed.


In [44]:
t[1] = 4


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-44-141c76cb54a2> in <module>()
----> 1 t[1] = 4

TypeError: 'tuple' object does not support item assignment

Tuples are often used to pass groups of values

Pass a group of objects/values into or out of a function


In [45]:
def f(x):
    a = x**2
    b = x**3
    return a, b

f(2)


Out[45]:
(4, 8)

In [46]:
a, b = f(2)
print 'a =', a
print 'b =', b


a = 4
b = 8

I defined a function above. A function definition starts with def, ends its first line with a colon, and uses return to pass values back.
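Coming back to tuples as a way to move groups of values around, here is one more sketch. The min_max function and the variable names are made up for illustration.

```python
def min_max(values):
    """Return the smallest and largest value as a (min, max) tuple."""
    return min(values), max(values)


# Unpack the returned tuple into two names
low, high = min_max([3, 1, 4, 1, 5])

# Tuple packing and unpacking also lets you swap values without a temporary variable
first, second = 'left', 'right'
first, second = second, first

print("low = {}, high = {}".format(low, high))
print("first = {}, second = {}".format(first, second))
```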

Dictionaries

Dictionaries are flexible mappings of keys to values. They are created via a comma-separated list of key:value pairs within curly braces.


In [47]:
numbers = {'one':1, 'two':2, 'three':3}
numbers


Out[47]:
{'one': 1, 'three': 3, 'two': 2}

In [48]:
numbers['two'] # Use a key to return the matching value


Out[48]:
2

In [49]:
numbers.keys() # Keys don't stay ordered


Out[49]:
['three', 'two', 'one']
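A few other dictionary operations I use all the time, as a short sketch (variable names other than numbers are mine). Sorting is one way to get a predictable order when the dictionary itself doesn't guarantee one.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

numbers['four'] = 4              # assigning to a new key adds an entry
has_two = 'two' in numbers       # membership tests look at the keys
missing = numbers.get('ten', 0)  # get returns a default instead of raising KeyError

sorted_keys = sorted(numbers)    # keys in alphabetical order
pairs_by_value = sorted(numbers.items(), key=lambda item: item[1])

print(sorted_keys)
print(pairs_by_value)
```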

Sets

Sets are collections of unique values. You can create a set of unique values from a list-like object and can use set operations (union, intersection, etc)


In [54]:
l = [1, 2, 1, 4, 5, 2]
set(l)


Out[54]:
{1, 2, 4, 5}
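The set operations mentioned above can be written with operators. A quick sketch with two made-up sets:

```python
a = {1, 2, 4, 5}
b = {2, 3, 5, 7}

union = a | b         # values in either set
intersection = a & b  # values in both sets
difference = a - b    # values in a but not in b
symmetric = a ^ b     # values in exactly one of the two sets

print(sorted(union))
print(sorted(intersection))
```

The same operations are also available as methods, such as a.union(b) and a.intersection(b).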

I said list-like objects, which points to an important feature of Python: types don't matter so long as they behave correctly (duck typing)


In [56]:
l_array = np.array(l) # make a numpy array from the list
l_array
# l_array.reshape((2,3))


Out[56]:
array([1, 2, 1, 4, 5, 2])

In [52]:
set(l_array)


Out[52]:
{1, 2, 4, 5}
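To make the duck typing point concrete, here is a sketch of a function that never checks the type of its argument. count_unique is a hypothetical helper, not something from a library.

```python
def count_unique(values):
    """Count the distinct items in anything that can be looped over."""
    return len(set(values))


from_list = count_unique([1, 2, 1, 4, 5, 2])
from_tuple = count_unique((1, 2, 1, 4, 5, 2))
from_string = count_unique('hello')   # strings are looped over one character at a time
from_range = count_unique(range(3))

print(from_list)
print(from_string)
```

The function works for any of these inputs because each one can be iterated over, which is all that set() needs.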

Iterators

Iterators are objects with multiple values that can be looped through. Sometimes you can also do indirect iteration, which will provide a single value at a time.

It's helpful to know about the enumerate and zip functions when using iterators.
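One way to see the "single value at a time" behavior is with the built-in iter() and next() functions. This is just a sketch with a made-up list; a for loop does the same thing for you behind the scenes.

```python
animals = ['dog', 'cat', 'pig']

iterator = iter(animals)          # ask the list for an iterator
first = next(iterator)            # each call to next() hands back one value
second = next(iterator)
remaining = list(iterator)        # whatever has not been consumed yet
exhausted = next(iterator, None)  # the default is returned once nothing is left

print(first)
print(second)
print(remaining)
```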

Enumerate

Return both the index and value for each item in the iterator


In [57]:
for idx, value in enumerate(['dog', 'cat', 'pig', 'sheep']):
    print idx, value


0 dog
1 cat
2 pig
3 sheep

Zip

Combine two or more iterables into a list of tuples


In [61]:
L = [2, 4, 6, 8, 10]
R = [3, 6, 9, 12, 15]
for lval, rval in zip(L, R):
    print (lval, rval)
# zip(L, R)[0]
# Doing the same thing with indexing
# for i in range(len(L)):
#     print (L[i], R[i])


(2, 3)
(4, 6)
(6, 9)
(8, 12)
(10, 15)
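zip is also handy for building dictionaries and for splitting pairs back apart. A sketch with made-up data; I wrap zip in list() so the result is the same on Python 2 and Python 3.

```python
names = ['a', 'b', 'c']
values = [1, 2, 3]

pairs = list(zip(names, values))   # a list of (name, value) tuples
lookup = dict(zip(names, values))  # pair keys with values to build a dict

# zip(*pairs) reverses the operation and gives back one tuple per original sequence
unzipped_names, unzipped_values = zip(*pairs)

# zip stops as soon as the shortest input runs out
uneven = list(zip([1, 2, 3], 'xy'))

print(pairs)
print(uneven)
```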

Common scientific Python libraries

So far we've mostly covered objects in base Python. This is a brief list of some common libraries with incomplete descriptions.

Arrays and data

  • Pandas: A must-have for working with data. Provides DataFrames, which allow for column and index labels. Handles NaNs like a boss. Built on NumPy arrays. Similar to R dataframes.

Machine learning and more statistics

  • Scikit-learn: Everything you need for small to medium size machine learning problems.
  • Statsmodels: A dedicated stats library that lets you use R-style formulas with Pandas dataframes.

Plotting

Check out this dramatic tour through plotting with several different libraries

  • Matplotlib: The most common plotting library in Python. Created to mimic MATLAB and just as ugly. But it is flexible and lets you customize every object on a figure.
  • Seaborn: Makes a defined set of difficult visualizations easy. Creates matplotlib figures, so you can customize how they look.
  • ggplot: I've never used it, so I can't comment on how it compares to the version in R