Oregon Curriculum Network
Discovering Math with Python

Chapter 1: WELCOME TO PYTHON

Have a seat, relax. Enjoy learning new things.

Let's quote from the Python website about what Python actually is:

What is Python? Executive Summary

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.

Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance.

Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

Who uses Python?

Let's check some testimonials.

Is Python already installed on your computer? If so, where does it live?


In [1]:
# what do they mean by "high-level built in data structures"?  
# Let's see (notice how we use the pound sign to write comments)

# square brackets for a list (like an array)
zoo = ["monkey", "bear", "otter"]  

# curly braces with key:value pairs, for a dict (lookup table)
ages = {"monkey":3, "bear":2.5, "otter":1.5} 

# curly braces with elements for a set
tools = {"hammer", "screwdriver", "drill", "wrench"}

Think of data structures as tools. You'll use them to store and retrieve values, and to implement more complicated procedures, or algorithms. Data structures hold data.
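To make "store and retrieve" concrete, here is a small sketch reusing the three structures from In [1]; the extra item "penguin" is just an illustrative addition.

```python
zoo = ["monkey", "bear", "otter"]
ages = {"monkey": 3, "bear": 2.5, "otter": 1.5}
tools = {"hammer", "screwdriver", "drill", "wrench"}

zoo.append("penguin")          # lists keep their order and grow at the end
print(zoo[0])                  # 'monkey': retrieve by position
print(ages["bear"])            # 2.5: retrieve by key
print("drill" in tools)        # True: sets answer "is it in there?"
tools.add("drill")             # adding a duplicate to a set changes nothing
print(len(tools))              # 4
```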

As a preview of how we might get that data in the first place, let's skip ahead and start using some Python libraries.

Actually, let's start outside the Standard Library, with a third-party tool: pandas. Here's a link to a well-known book: Python for Data Analysis by Wes McKinney, who started pandas. You can also watch him on YouTube.

A full Python installation ships with a Standard Library, a collection of modules ("namespaces") that you can take for granted are just one import statement away.
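The same module can be brought into your namespace in a few different ways. A quick sketch with the math module (any Standard Library module would do):

```python
import math                 # the whole module; use names as math.pi, math.sqrt
import math as m            # the same module under a shorter alias
from math import sqrt, pi   # pull individual names into the current namespace

print(math.sqrt(16), m.sqrt(16), sqrt(16))   # 4.0 4.0 4.0
print(m is math)                             # True: one module object, two names
print(pi == math.pi)                         # True
```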


In [2]:
import math
hypotenuse = math.hypot(3, 4)  # try your own numbers here!
hypotenuse


Out[2]:
5.0
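math.hypot(x, y) returns the length of the hypotenuse, sqrt(x**2 + y**2), per the Pythagorean theorem (since Python 3.8 it also accepts more than two coordinates). A sketch checking that by hand; my_hypot is a hypothetical helper written only for comparison:

```python
import math

def my_hypot(x, y):
    """Pythagorean theorem by hand: c = sqrt(a**2 + b**2). Hypothetical helper."""
    return math.sqrt(x * x + y * y)

print(my_hypot(3, 4))                                     # 5.0
print(math.hypot(5, 12))                                  # 13.0
print(math.isclose(my_hypot(1, 1), math.hypot(1, 1)))     # True
```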

In [3]:
import binascii
in_hex = binascii.hexlify(b"i like eating python as much as i like coding in python")
in_hex  # show ascii bytes in terms of the underlying hex codes


Out[3]:
b'69206c696b6520656174696e6720707974686f6e206173206d7563682061732069206c696b6520636f64696e6720696e20707974686f6e'
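Each byte becomes two hex digits (for example, "i" is byte 0x69, decimal 105). A short sketch on a shorter string, showing the round trip back with binascii.unhexlify, plus the built-in bytes.hex() method, which gives the same digits as a str:

```python
import binascii

raw = b"i like python"
as_hex = binascii.hexlify(raw)            # b'69206c696b6520707974686f6e'
print(as_hex)
print(raw.hex())                          # same digits, as a str, via a bytes method
print(binascii.unhexlify(as_hex) == raw)  # True: the round trip recovers the bytes
print(ord("i"), 0x69)                     # 105 105: one byte, two hex digits
```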

However, a lot of what makes Python so great is the modules and packages we can get from the Python Package Index (PyPI), nicknamed the "Cheese Shop".

What you don't find in the Standard Library, you can in most cases add to your Python environment using pip3 install.

If you've used Linux, then you might want to think of pip3 install as the apt-get of the Python universe.
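Before reaching for pip3 install, you can ask Python whether a package is already importable. A minimal sketch using importlib.util.find_spec from the Standard Library; is_installed is a hypothetical helper name, and "found" only means Python can locate the module, not that it is any particular version:

```python
import importlib.util

def is_installed(name):
    """Hypothetical helper: True if Python can locate a top-level module, without importing it."""
    return importlib.util.find_spec(name) is not None

for pkg in ["math", "pandas", "numpy"]:
    status = "found" if is_installed(pkg) else "missing, try: pip3 install " + pkg
    print(pkg, "->", status)
```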

That's not the end of the story, though. Python distributions such as Anaconda provide their own tools for installing and upgrading packages (conda, in Anaconda's case). You might also find yourself using Git.

Let's skip ahead to where you've already installed pandas, adding this powerful free tool to your Python ecosystem.


In [4]:
import pandas as pd
url = "https://raw.githubusercontent.com/dariusk/corpora/master/data/animals/dinosaurs.json"
df = pd.read_json(url)

In [5]:
df.head()  # just show the first five rows of a much taller table


Out[5]:
            description         dinosaurs
0  A list of dinosaurs.      Kangnasaurus
1  A list of dinosaurs.    Lophostropheus
2  A list of dinosaurs.  Spinophorosaurus
3  A list of dinosaurs.    Epachthosaurus
4  A list of dinosaurs.     Coelurosauria

What just happened there? pandas is all about DataFrame objects, which McKinney hopes will become the basis of a more general object that works across programming languages, such as R, Python, and those running on the Java Virtual Machine (JVM).
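Judging by the output above, the JSON file seems to pair one description string with a long list of names (an assumption; the raw file isn't shown here). A minimal offline sketch builds a DataFrame of the same shape from a hand-made dict, using three names from the output. pandas repeats the single string down every row, which would explain the repetitive first column:

```python
import pandas as pd

# Hypothetical stand-in for the downloaded JSON: one scalar value, one list
raw = {
    "description": "A list of dinosaurs.",
    "dinosaurs": ["Kangnasaurus", "Lophostropheus", "Spinophorosaurus"],
}
df_small = pd.DataFrame(raw)   # the scalar string is broadcast to every row
print(df_small)
print(df_small.shape)          # (3, 2): three rows, two columns
```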

We just created a DataFrame object by reading in data over the web, from a public stash of dinosaur names out there in the cloud, on GitHub. The first column isn't really adding any value, though. We know it's a list of dinosaurs; no need to say that over and over. A first step after harvesting raw data is usually cleaning and/or massaging it into the shape we need. Let's drop that "description" column...


In [6]:
df.drop('description', axis=1, inplace=True)  # you won't be able to run this twice, why?
df.head()


Out[6]:
dinosaurs
0 Kangnasaurus
1 Lophostropheus
2 Spinophorosaurus
3 Epachthosaurus
4 Coelurosauria
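The drop above uses inplace=True, which changes df itself. A sketch of the alternative on a small hypothetical two-row DataFrame: without inplace, drop returns a new DataFrame and leaves the original alone, and errors="ignore" tells pandas to skip labels that aren't present instead of raising an error:

```python
import pandas as pd

df_small = pd.DataFrame({
    "description": ["A list of dinosaurs."] * 2,
    "dinosaurs": ["Kangnasaurus", "Lophostropheus"],
})

# No inplace: drop returns a new DataFrame; df_small keeps both columns
trimmed = df_small.drop(columns="description")
# errors="ignore" skips missing labels, so this line is safe to rerun
trimmed = trimmed.drop(columns="description", errors="ignore")

print(list(trimmed.columns))    # ['dinosaurs']
print(list(df_small.columns))   # ['description', 'dinosaurs']
```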

Now that our DataFrame is this simple, can we convert it to a native Python list, the square-bracket data structure we started out with? Sure we can.


In [7]:
dinos = df["dinosaurs"].tolist()  # yep, it's that easy

In [8]:
dinos[10:20]  # this is called "slicing", getting items 10 to 19


Out[8]:
['Qiaowanlong',
 'Rhynchosaur',
 'Ningyuansaurus',
 'Palaeolimnornis',
 'Anabisetia',
 'Talarurus',
 'Sphenodontia',
 'Tianyulong',
 'Aepisaurus',
 'Neuquenraptor']
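Slices are "half-open": the start index is included and the stop index is not, which is why dinos[10:20] yields exactly ten items. A sketch on a short list of letters:

```python
letters = list("abcdefghij")   # ['a', 'b', ..., 'j']

print(letters[2:5])     # ['c', 'd', 'e']: start included, stop excluded
print(letters[:3])      # ['a', 'b', 'c']: omitted start means 0
print(letters[7:])      # ['h', 'i', 'j']: omitted stop means "to the end"
print(letters[::3])     # ['a', 'd', 'g', 'j']: a third number is the step
print(letters[8:100])   # ['i', 'j']: running past the end is not an error
```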

In [9]:
len(dinos)  # how many dinosaurs are we talking about actually?


Out[9]:
1449

Wow.

Yes, you could start using this list to harvest pictures, for example, maybe starting with the Dinosaur Database.


In [10]:
dinos.sort()  # these aren't alphabetized, so let's sort the list in place

In [11]:
dinos[-10:]  # now let's look at the last ten items


Out[11]:
['Zhejiangosaurus',
 'Zhongyuansaurus',
 'Zhuchengceratops',
 'Zhuchengosaurus',
 'Zhuchengtyrannus',
 'Zigongosaurus',
 'Zizhongosaurus',
 'Zuniceratops',
 'Zuolong',
 'Zupaysaurus']
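list.sort() rearranges the list itself and returns None, while the built-in sorted() returns a new list. By default strings sort by character code, so every capitalized name comes before any lowercase one; key=str.lower ignores case. A sketch with a made-up mixed-case list:

```python
names = ["apatosaurus", "Zuolong", "Talarurus"]   # note the lowercase first name

in_order = sorted(names)                 # a NEW sorted list; names is untouched
print(in_order)                          # ['Talarurus', 'Zuolong', 'apatosaurus']
print(sorted(names, key=str.lower))      # ['apatosaurus', 'Talarurus', 'Zuolong']

result = names.sort()                    # sorts names itself, in place...
print(result)                            # None: so never write names = names.sort()
print(names)                             # ['Talarurus', 'Zuolong', 'apatosaurus']
```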

Slice notation is important because it works not only with ordinary Python lists but also with pandas DataFrames and NumPy arrays. NumPy arrays are like Python lists on steroids: they have enhanced capabilities and can have multiple dimensions.
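With a 2D NumPy array you can slice rows and columns at once, separating the two slices with a comma. A sketch on a 5x5 array holding 1 through 25:

```python
import numpy as np

grid = np.arange(1, 26).reshape(5, 5)   # the numbers 1..25 in a 5x5 array

print(grid[0])          # first row: 1, 2, 3, 4, 5
print(grid[:, 0])       # first column: 1, 6, 11, 16, 21
print(grid[1:3, 1:3])   # rows 1-2, columns 1-2: [[7, 8], [12, 13]]
print(grid[-1, -2:])    # last row, last two columns: 24, 25
```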

Let's go back to an ordinary list and test this feature some more.


In [12]:
zoo[1:]  # all but 0th element (addressing begins with 0)


Out[12]:
['bear', 'otter']

In [13]:
zoo[-1]  # last item in the list


Out[13]:
'otter'
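Negative indices count back from the end, so zoo[-1] is shorthand for zoo[len(zoo) - 1]. Negative positions and steps work inside slices too:

```python
zoo = ["monkey", "bear", "otter"]

print(zoo[-1] == zoo[len(zoo) - 1])   # True: -1 means "last"
print(zoo[-2:])                       # ['bear', 'otter']: the last two items
print(zoo[::-1])                      # ['otter', 'bear', 'monkey']: step -1 walks backwards
```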

Let's end this section with a quick look at a NumPy array. Python lists may be "heterogeneous", meaning their elements can be of many different types. A NumPy array needs all its elements to be of the same type, whatever that type may be (floats, ints, and complex numbers are all typical).
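To see the difference, here is a sketch: NumPy picks a single dtype that can hold every element, converting ("upcasting") as needed. The exact integer dtype (int64 vs int32) can vary by platform, so the code only checks that it is some integer type:

```python
import numpy as np

mixed = [1, 2.5, "three"]                    # a Python list happily mixes types
print([type(x).__name__ for x in mixed])     # ['int', 'float', 'str']

a = np.array([1, 2, 3])      # all ints: some integer dtype
b = np.array([1, 2.5, 3])    # one float makes every element a float
c = np.array([1, "two"])     # one string makes every element a string
print(a.dtype, b.dtype, c.dtype)
print(np.issubdtype(a.dtype, np.integer))    # True
```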


In [14]:
import numpy as np  # notice how we rename the module as we import it
test_data = np.random.randint(1, 100, size=(5, 5))  # random integers from 1 to 99 (100 is excluded), in a 5x5 matrix

In [15]:
test_data


Out[15]:
array([[41, 40, 75, 45, 18],
       [80, 51, 14, 44, 62],
       [88, 33, 33, 65, 15],
       [61, 60, 78,  9, 64],
       [36, 65, 34, 37,  7]])

In [16]:
test_data ** 2  # we can raise all these numbers to a 2nd power in one line!


Out[16]:
array([[1681, 1600, 5625, 2025,  324],
       [6400, 2601,  196, 1936, 3844],
       [7744, 1089, 1089, 4225,  225],
       [3721, 3600, 6084,   81, 4096],
       [1296, 4225, 1156, 1369,   49]])
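For comparison, here is a sketch of the same squaring done with an explicit loop (a nested list comprehension) next to the one-line NumPy version. It uses np.random.default_rng, NumPy's newer random API, with a fixed seed; that is a substitution for the np.random.randint call above, chosen so the numbers are reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=0)          # seeded so reruns give the same numbers
data = rng.integers(1, 100, size=(5, 5))     # ints from 1 to 99, like randint above

# Loop version: visit every row, then every element, by hand
squared_loop = [[x ** 2 for x in row] for row in data.tolist()]

# Vectorized version: one expression; NumPy does the looping in compiled code
squared_vec = data ** 2

print(np.array_equal(squared_loop, squared_vec))   # True
```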

Using NumPy arrays can save a lot of looping code. For an example of that, you might want to eyeball this Notebook on making fractals (the Mandelbrot Set in particular).
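As a taste of that style, here is a minimal escape-time sketch of the Mandelbrot Set, not the linked Notebook's code: every point c in a grid is iterated with z -> z**2 + c, and a boolean mask updates all still-bounded points at once instead of looping over pixels. The grid bounds, grid size, and 30-iteration cap are arbitrary choices:

```python
import numpy as np

def escape_counts(c, max_iter=30):
    """Vectorized escape-time counts for the iteration z -> z**2 + c, starting at z = 0.

    Each entry counts the iterations that began with |z| <= 2. Points that never
    escape get max_iter (they are probably, not certainly, in the set).
    """
    c = np.asarray(c, dtype=complex)
    z = np.zeros(c.shape, dtype=complex)
    counts = np.zeros(c.shape, dtype=int)
    for _ in range(max_iter):
        bounded = np.abs(z) <= 2.0                  # mask of points still in play
        z[bounded] = z[bounded] ** 2 + c[bounded]   # update every live point at once
        counts[bounded] += 1
    return counts

# A coarse grid over roughly [-2, 1] x [-1.2, 1.2] in the complex plane
xs = np.linspace(-2.0, 1.0, 64)
ys = np.linspace(-1.2, 1.2, 24)
grid = xs[np.newaxis, :] + 1j * ys[:, np.newaxis]   # broadcasting builds a 24x64 array

counts = escape_counts(grid)
for row in counts:
    print("".join("#" if n == 30 else "." for n in row))
```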

Versions of Python

Python is still advancing, and any given Notebook or tutorial will tend to use a Python that was recent for its time. When I first wrote these Notebooks, Python 3.6 was the default kernel.
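To find out which Python your own kernel is running, ask the sys module. A short sketch; the 3.6 threshold just echoes the version mentioned above:

```python
import sys

print(sys.version)                        # full version string of the running interpreter
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))
if sys.version_info < (3, 6):
    print("These notebooks were written for Python 3.6 or newer")
```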

Will there be a Python 4? Let's listen to Guido:

What are the current expectations for Python 4.0?

My current expectation is that Python 4.0 will merely be "the release that comes after Python 3.9". That's it. No profound changes to the language, no major backwards compatibility breaks - going from Python 3.9 to 4.0 should be as uneventful as going from Python 3.3 to 3.4 (or from 2.6 to 2.7). I even expect the stable Application Binary Interface (as first defined in PEP 384) to be preserved across the boundary.

Here's my picture of Guido (middle) at PyCon 2017, held in Portland, my city of residence at that time. -- Kirby

Continue to Chapter 2: Functions At Work