This lesson introduces Python as an environment for reproducible scientific data analysis and programming. The materials are based on the Software Carpentry Programming with Python lesson.
At the end of this lesson, you will be able to:
This lesson will introduce Python as a general purpose programming language. Python is a great programming language to use for a wide variety of applications, including:
As with the Software Carpentry lesson, this lesson is licensed for open use under the CC BY 4.0 license.
Python is a general purpose programming language that allows for the rapid development of scientific workflows. Python's main advantages are:
The only language that computers really understand is machine language, or binary: ones and zeros. Anything we tell computers to do has to be translated to binary for computers to execute.
Python is what we call an interpreted language. This means that computers can translate Python to machine code as they are reading it. This distinguishes Python from languages like C, C++, or Java, which have to be compiled to machine code before they are run. The details aren't important to us; what is important is that we can use Python in two ways:
For this lesson, we'll be using the Python interpreter that is embedded in Jupyter Notebook. Jupyter Notebook is a fancy, browser-based environment for literate programming, the combination of Python scripts with rich text for telling a story about the task you set out to do with Python. This is a powerful way to collect the code, the analysis, the context, and the results in a single place.
The Python interpreter we'll interact with in Jupyter Notebook is the same interpreter we could use from the command line. To launch Jupyter Notebook:
Type jupyter notebook at the command line; then press ENTER.

Let's try out the Python interpreter.
In [1]:
print('Hello, world!')
Alternatively, we could save that one line of Python code to a text file with a *.py
extension and then execute that file. We'll see that towards the end of this lesson.
In interactive mode, the Python interpreter does three things for us, in order:
This is called a read, evaluate, print loop (REPL). Let's try it out.
In [2]:
5 * 11
Out[2]:
We can use Python as a fancy calculator, like any programming language.
When we perform calculations with Python, or run any Python statement that produces output, if we don't explicitly save that output somewhere, then we can't access it again. Python prints the output to the screen, but it doesn't keep a record of the output.
In order to save the output of an arbitrary Python statement, we have to assign that output to a variable. We do this using the equal sign operator:
In [3]:
weight_kg = 5 * 11
Notice there is no output associated with running this command. That's because the output we saw earlier has instead been saved to the variable named weight_kg.

If we want to retrieve this output, we can ask Python for the value associated with the variable named weight_kg.
In [4]:
weight_kg
Out[4]:
As we saw earlier, we can also use the function print()
to explicitly print the value to the screen.
In [5]:
print('Weight in pounds:', 2.2 * weight_kg)
A function like print()
can take multiple arguments, or inputs to the function. In the example above, we've provided two arguments; two different things to print to the screen in a sequence.
We can also change a variable's assigned value.
In [6]:
weight_kg = 57.5
print('Weight in pounds:', 2.2 * weight_kg)
If we imagine the variable as a sticky note with a name written on it, assignment is like putting the sticky note on a particular value. See this illustration.
This means that assigning a value to one variable does not change the values of other variables. For example:
In [7]:
weight_lb = 2.2 * weight_kg
weight_lb
Out[7]:
In [8]:
weight_kg = 100.0
weight_lb
Out[8]:
Since weight_lb doesn't remember where its value came from, it isn't automatically updated when weight_kg changes. This is different from how, say, spreadsheets work.
What are some tasks you're hoping to complete with Python? Alternatively, what kinds of things have you done in other programming languages?
When you're thinking of starting a new computer-aided analysis or building a new software tool, there's always the possibility that someone else has created just the piece of software you need to get your job done faster. Because Python is a popular, general-purpose, and open-source programming language with a long history, there's a wealth of completed software tools out there written in Python for you to use. Each of these software libraries extends the basic functionality of Python to let you do new and better things.
The Python Package Index (PyPI) is the place to start when you're looking for a piece of Python software to use. We'll talk about that later.
For now, we'll load a Python package that is already available on our systems. NumPy is a numerical computing library that allows us to both represent sophisticated data structures and perform calculations on them.
In [9]:
import numpy
Now that we've imported numpy
, we have access to new tools and functions that we didn't have before. For instance, we can use numpy
to read in tabular data for us to work with.
In [10]:
numpy.loadtxt('barrow.temperature.csv', delimiter = ',')
Out[10]:
The expression numpy.loadtxt()
is a function call that asks Python to run the function loadtxt()
which belongs to the numpy
library. Here, the word numpy
is the namespace to which a function belongs. This dotted notation is used everywhere in Python to refer to the parts of things as thing.component
.
Because the loadtxt()
function and others belong to the numpy
library, to access them we will always have to type numpy.
in front of the function name. This can get tedious, especially in interactive mode, so Python allows us to come up with a new namespace as an alias.
In [11]:
import numpy as np
The np
alias for the numpy
library is a very common alias; so common, in fact, that you can get help for NumPy functions by looking up np
and the function name in a search engine.
With this alias, the loadtxt()
function is now called as:
In [12]:
np.loadtxt('barrow.temperature.csv', delimiter = ',')
Out[12]:
np.loadtxt()
has two arguments: the name of the file we want to read, and the delimiter that separates values on a line. These both need to be character strings (or strings for short), so we put them in quotes.
Finally, note that we haven't stored the Barrow temperature data because we haven't assigned it to a variable. Let's fix that.
In [13]:
barrow = np.loadtxt('barrow.temperature.csv', delimiter = ',')
The data we're using for this lesson are monthly averages of surface air temperatures from 1948 to 2016 for five different locations. They are derived from the NOAA NCEP CPC Monthly Global Surface Air Temperature Data Set, which has a 0.5 degree spatial resolution.
What is the unit for air temperature used in this dataset?

Recall that when we assign a value to a variable, we don't see any output on the screen. To see our Barrow temperature data, we can use the print() function again.
In [14]:
print(barrow)
The data are formatted such that each row is one year, from 1948 through 2016, and each column is one month, from January through December.
Now that our data are stored in memory, we can start asking substantial questions about it. First, let's ask how Python represents the value stored in the barrow
variable.
In [15]:
type(barrow)
Out[15]:
This output indicates that barrow
currently refers to an N-dimensional array created by the NumPy library.
A NumPy array contains one or more elements of the same data type. The type()
function only tells us that we have a NumPy array. We can find out the type of data contained in the array by asking for the data type of the array.
In [16]:
barrow.dtype
Out[16]:
This tells us that the NumPy array's elements are 64-bit floating point, or decimal numbers.
In the last example, we accessed an attribute of the barrow
array called dtype
. Because dtype
is not a function, we don't call it using a pair of parentheses. We'll talk more about this later but, for now, it's sufficient to distinguish between these examples:
np.loadtxt() - a function that takes arguments, which go inside the parentheses
barrow.dtype - an attribute of the barrow array; the dtype of an array doesn't depend on anything, so dtype is not a function and it does not take arguments

How many rows and columns are there in the barrow array?
In [17]:
barrow.shape
Out[17]:
We see there are 69 rows and 12 columns.
The shape
attribute, like the dtype
, is a piece of information that was generated and stored when we first created the barrow
array. This extra information, shape
and dtype
, describe barrow
in the same way an adjective describes a noun. We use the same dotted notation here as we did with the loadtxt()
function because they have the same part-and-whole relationship.
To access the elements of the barrow
array, we use square-brackets as follows.
In [18]:
barrow[0, 0]
Out[18]:
The 0, 0
element is the element in the first row and the first column. Python starts counting from zero, not from one, just like other languages in the C family (including C++, Java, and Perl).
With this bracket notation, remember that rows are counted first, then columns. For instance, this is the value in the first row and second column of the array:
In [19]:
barrow[0, 1]
Out[19]:
Challenge: What do each of the following code samples do?
barrow[0]
barrow[0,]
barrow[-1]
barrow[-3:-1]
We can make a larger selection with slicing. For instance, here is the first year of monthly average temperatures, all 12 of them, for Barrow:
In [20]:
barrow[0, 0:12]
Out[20]:
The notation 0:12
can be read, "Start at index 0 and go up to, but not including, index 12." The up-to-but-not-including is important; we have 12 values in the array but, since we started counting at zero, there isn't a value at index 12.
In [21]:
barrow[0, 12]
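The same up-to-but-not-including rule applies to any Python sequence, not just NumPy arrays. Here is a quick sketch with a plain list of made-up values:

```python
# A small list to illustrate half-open slices
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# 0:12 starts at index 0 and stops *before* index 12,
# so it yields all twelve values even though there is no index 12
print(months[0:12])
print(len(months[0:12]))   # 12

# 3:6 yields the elements at indices 3, 4, and 5 -- not index 6
print(months[3:6])         # [4, 5, 6]
```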
Slices don't have to start at zero and they also don't have to include the upper or lower bound, if we want to simply take all the ending or beginning values, respectively.
Here's the last six monthly averages of the first three years, written two different ways:
In [22]:
barrow[0:3, 6:12]
Out[22]:
In [23]:
barrow[:3, 6:]
Out[23]:
If we don't include a number at all, then the : symbol indicates "take everything."
In [25]:
barrow[0, :]
Out[25]:
Challenge: What's the mean monthly temperature in August of 2016? Converted to degrees Fahrenheit?
Degrees F can be calculated from degrees K by the formula:

$$ T_F = \left(\left(T_K - 273.15\right) \times \frac{9}{5}\right) + 32 $$
Arrays know how to perform common mathematical operations on their values. This allows us to treat them like pure numbers, as we did earlier.
Convert the first year of Barrow air temperatures from degrees Kelvin to degrees Celsius.
In [26]:
barrow[0,:] - 273.15
Out[26]:
We can also perform calculations that have only matrices as operands.
Calculate the monthly average of the first two years of air temperatures in Barrow. (Consider this the average of the monthly averages.) Then convert to Celsius.
In [27]:
two_year_sum = barrow[0,:] + barrow[1,:]
(two_year_sum / 2) - 273.15
Out[27]:
Quite often, we want to do more than add, subtract, multiply, and divide values of data. NumPy knows how to do more complex operations on arrays, including statistical summaries of the data.
What is the overall mean temperature in any month in Barrow between 1948 and 2016 in degrees C?
In [28]:
barrow.mean() - 273.15
Out[28]:
Here, note that the mean()
function is an attribute of the barrow
array. When the attribute of a Python object is a function, we usually call it a method. Methods are functions that belong to Python objects. Here, the barrow
array owns a method called mean()
. When we call the mean()
method of barrow
, the array knows how to calculate its overall mean.
Note, also, that barrow.mean()
is an example of a function that doesn't have to take any input. In this example, no input is needed because the overall mean is not dependent on any external information.
How cold was the coldest February in Barrow, by monthly mean temperatures, in degrees C?
In [29]:
barrow[:,1].min() - 273.15
Out[29]:
Challenge: What's the minimum, maximum, and mean monthly temperature for August in Barrow, in degrees C?
How did we find out what methods the array has for us to use?
While in Jupyter Notebook, we can use the handy shortcut:
In [30]:
barrow?
In general, Python provides two helper functions, help()
and dir()
.
help(barrow)
In [32]:
dir(barrow)
Out[32]:
Before, we saw how the built-in statistical summary methods like mean()
could be used to calculate an overall statistical summary for the barrow
array or a subset of the array. More often, we want to look at partial statistics, such as the mean temperature in each year or the maximum temperature in each month.
One way to do this is to take what we saw earlier and apply it to each row or column.
What is the mean temperature in 1948? In 1949? And so on...
In [33]:
barrow[0,:].mean()
Out[33]:
In [34]:
barrow[1,:].mean()
Out[34]:
But this gets tedious very quickly. Instead, we can calculate a statistical summary over a particular axis of the array: along its rows or along its columns.
In [35]:
barrow.mean()
Out[35]:
In [36]:
barrow.mean(axis = 1)
Out[36]:
Recall that in our square bracket notation, we number the rows first, then the columns. Also, recall that Python starts counting at zero. Therefore, the "1" axis refers to the columns. To calculate a mean across the column axis, that is, the mean temperature in each year, we use the axis = 1
argument in the function call.
As a quick check, we can confirm that there are 69 values in the output, one for each of the 69 years between 1948 and 2016, inclusive.
In [37]:
barrow.mean(axis = 1).shape
Out[37]:
What, then, does the following function call give us?
In [38]:
barrow.mean(axis = 0)
Out[38]:
Remembering the difference between axis = 0 and axis = 1 is tricky, even for experienced Python programmers. Here is a helpful reminder: axis 0 runs down the rows, so collapsing it with axis = 0 leaves one value per column; axis 1 runs across the columns, so collapsing it with axis = 1 leaves one value per row.
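One way to keep the rule straight is to try it on a tiny array where the answer is obvious. This sketch uses a made-up 2-row, 3-column array, not our temperature data:

```python
import numpy as np

small = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# axis=0 collapses the rows: one mean per column (2.5, 3.5, 4.5)
print(small.mean(axis=0))

# axis=1 collapses the columns: one mean per row (2.0, 5.0)
print(small.mean(axis=1))
```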
Visualizations, such as line plots, bar charts, and other graphics, can often help us to better interpret data, particularly when we have a large set of numbers. Here, we'll look at some basic data visualization tools available in Python.
While there is no "official" plotting library in Python, the most often used is a library called matplotlib
. First, we'll import the pyplot
module from matplotlib
and use two of its functions to create and display a heatmap of our data.
In [39]:
import matplotlib.pyplot as pyplot
%matplotlib inline
We also used a Jupyter Notebook "magic" command to make matplotlib
figures display inside our notebook, instead of a separate window. Commands with percent signs in front of them, as in the above example, are particular to Jupyter Notebook and won't work in a basic Python interpreter or at the command line.
In [40]:
image = pyplot.imshow(barrow, aspect = 1/3)
pyplot.show()
Let's look at the average temperature over the years.
In [41]:
avg_temp = barrow.mean(axis = 1)
avg_temp_plot = pyplot.plot(avg_temp)
pyplot.show()
Let's review what we've just learned.
Import a library with import library_name;
Assign a value to a variable with variable = value;
Get the number of rows and columns in an array with my_array.shape;
Get the data type of an array's elements with my_array.dtype;
Use array[x, y] to select the element in row X, column Y from an array;
Use a:b to specify a slice that includes indices from A up to, but not including, B;
Plot data with matplotlib.

We saw how to do these things for just one CSV file we loaded into Python. But what if we're interested in more places than just Barrow, Alaska? We could execute the same code for each place we're interested in, each CSV file...
In this next lesson, we'll learn how to instruct Python to do the same thing multiple times without mistake.
Before we discuss how Python can execute repetitive tasks, we need to discuss sequences. Sequences are a very powerful and flexible class of data structures in Python.
The simplest example of a sequence is a character string.
In [42]:
'Hello, world!'
Out[42]:
Character strings can be created with either single or double quotes; it doesn't matter which. If you want to include quotes inside a string you have some options...
In [43]:
print("My mother's cousin")
In [44]:
print('My mother\'s cousin said, "Behave."')
Every character string is a sequence of letters.
In [45]:
word = 'bird'
We can access each letter in this sequence of letters using the index notation we saw earlier.
In [46]:
print(word[0])
print(word[1])
print(word[2])
print(word[3])
This is a bad approach, though, for multiple reasons.
Here's a better approach.
In [47]:
for letter in word:
    print(letter)
This is much shorter! And we can quickly see how it scales much better.
In [48]:
word = 'cockatoo'
for letter in word:
    print(letter)
Some things to note about this example:
The keyword for is what gives this technique its name; it's a for loop.
The loop visits every element of the word sequence, in order.
In each iteration of the for loop, the variable letter takes on a new value. (What is the value of letter in the first iteration? The second?)

Most importantly, the second line is indented by four spaces. In Python, code blocks are structured by indentation. When a group of Python lines belongs to a for loop, as in this example, those lines are indented one level more than the line with the for statement. It doesn't matter whether we use tabs or spaces, or how many we use, as long as we're consistent throughout the Python program.
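A minimal sketch of how indentation defines the loop body: the indented line runs once per element, while the unindented line runs only once, after the loop finishes.

```python
for number in [1, 2, 3]:
    print('Inside the loop:', number)   # indented: runs once per element
print('After the loop')                 # not indented: runs once, at the end
```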
You'll note that Jupyter Notebook indented the second line automatically for you and that it uses four spaces. Jupyter Notebook follows the Python Enhancement Proposal 8, or PEP-8, which is a style guide for Python. It is therefore strongly recommended that you use four (4) spaces for indentation in your own Python code, wherever you're writing it.
Another type of sequence in Python that is more general and more useful than a character string is the list. Lists are formed by putting comma-separated values inside square brackets.
In [49]:
cities = ['Barrow', 'Veracruz', 'Chennai']
for city in cities:
    print('Visit beautiful', city, '!')
Lists can be indexed and sliced just as we saw with character strings (another sequence) and NumPy arrays.
In [50]:
cities[0]
Out[50]:
In [51]:
cities[0:1]
Out[51]:
In [52]:
cities[0:2]
Out[52]:
In [53]:
cities[-1]
Out[53]:
What really distinguishes lists is that they can hold different types of data in a single instance, including other lists!
In [54]:
things = [100, 'oranges', [1,2,3]]
for thing in things:
    print(thing)
Just as we saw with NumPy arrays, lists also have a number of useful built-in methods.
How many times is the value "Barrow"
found in the cities
list? Exactly once.
In [55]:
cities.count('Barrow')
Out[55]:
Some of these methods change the list in place, meaning that they modify the underlying list without returning a new list. Since there is no output from these methods, to confirm that something changed with cities, we have to examine its value again.
In [56]:
cities.reverse()
cities
Out[56]:
Also, try running the above code block multiple times over.
A tuple is a lot like a list in that it can take multiple types of data. However, once it is created, a tuple cannot be changed.
In [57]:
cities
Out[57]:
In [58]:
cities_tuple = tuple(cities)
cities_tuple
Out[58]:
In [59]:
cities[2] = 'Accra'
cities
Out[59]:
In [60]:
cities_tuple[2] = 'Accra'
And, therefore, tuples also lack the list methods that modify a sequence, such as append().
In [61]:
cities.append('Paris')
cities
Out[61]:
In [62]:
cities_tuple.append('Paris')
Tuples are advantageous when you have sequence data that you want to make sure doesn't get changed. Partly because they can't be changed, tuples are also slightly faster in computations where the sequence is very large.
Lists and tuples also allow us to quickly exchange values between variables and the elements of a sequence.
In [63]:
option1, option2 = ('Chocolate', 'Strawberry')
print('Option 1 is:', option1)
print('Option 2 is:', option2)
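The same unpacking idea gives Python a concise way to exchange, or swap, two values in a single statement; the right-hand side is packed into a tuple first, then unpacked into the names on the left. Repeating the assignment above:

```python
option1, option2 = ('Chocolate', 'Strawberry')

# Swap the two values in a single statement, no temporary variable needed
option1, option2 = option2, option1
print(option1)   # Strawberry
print(option2)   # Chocolate
```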
Performing calculations with lists is different from with NumPy arrays.
In [64]:
numbers = [1, 1, 2, 3, 5, 8]
numbers * 2
Out[64]:
In [65]:
numbers + 1
In [66]:
numbers + [13]
Out[66]:
So, with lists, the addition operator + means to concatenate, while the multiplication operator * means to repeat. This is the same for strings:
In [67]:
'butter' + 'fly'
Out[67]:
In [68]:
'Cats! ' * 5
Out[68]:
We can execute any arbitrary Python code inside a loop. That includes changing the value of an assigned variable within a loop.
In [69]:
length = 0
for letter in 'Bangalore':
    length = length + 1
length
Out[69]:
Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well.
In [70]:
letter
Out[70]:
Also, in the last exercise, we used a loop to count the number of letters in the string 'Bangalore'
. Python has a built-in function to count the number of elements in any sequence.
In [71]:
len('Bangalore')
Out[71]:
In [72]:
cities
Out[72]:
In [73]:
len(cities)
Out[73]:
Write a for
loop that iterates through the letters of your favorite city, putting each letter inside a list. The result should be a list with an element for each letter.
Hint: You can create an empty list like this:
letters = []
Hint: You can confirm you have the right result by comparing it to:
list("my favorite city")
Which of the sequences we've learned about are immutable (i.e., they can't be changed)?
And what does this mean for working with each data type?
"birds".upper()
[1, 2, 3].append(4)
(1, 2, 3)
Earlier, we saw that lists in Python have a built-in method to reverse their elements.
In [74]:
cities.reverse()
cities
Out[74]:
There's also a built-in global function, reversed()
, that returns a reversed version of a sequence.
In [75]:
reversed(cities)
Out[75]:
But, whoa, what is this? Why didn't we get a reversed version of the cities
list?
In [76]:
print(reversed(cities))
What we got from reversed()
is called an iterator. A Python iterator is any object that iteratively produces a value as part of a loop.
Remember that everything in Python is an object. Objects that are also iterators have a special built-in behavior where they "give up" a value when they're used inside a for
loop, while
loop, or similar iterative procedure. Put another way, iterators only produce new values on demand. By returning an iterator instead of a reversed copy of the list, the reversed()
function is saving memory and computing time.
In [77]:
for each in reversed(cities):
    print(each)
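To see the on-demand behavior directly, we can ask an iterator for its values one at a time with the built-in next() function. This sketch uses a small throwaway list rather than our cities list:

```python
letters = ['a', 'b', 'c']
backwards = reversed(letters)   # an iterator; no values have been produced yet

print(next(backwards))   # 'c' -- the first value, produced on demand
print(next(backwards))   # 'b'
print(next(backwards))   # 'a'
# A fourth call to next() would raise StopIteration: the iterator is exhausted
```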
We have almost all of the tools we need to process all of our data files. The last thing we need is a tool for navigating our file system and finding the files that we're interested in.
In [78]:
import glob
The glob library contains a function, also called glob, that finds files and directories whose names match a pattern. These patterns are called "globs," hence the name of this Python module and its chief function.
glob patterns have two wildcards:

The * character matches zero or more characters;
The ? character matches any one character.

We can use this to get the names of all the CSV files in the current directory:
In [79]:
glob.glob('*.csv')
Out[79]:
As you can see, glob returns the names of the matching files as a list. This means that we can loop over that list, doing something with each filename in turn.
In our case, we want to generate a set of plots for each file in our temperature dataset.
First, let's confirm that we can loop over filenames and load in each file.
In [80]:
filenames = glob.glob('*.csv')
for fname in filenames:
    print('Working on', fname, '...')
    data = np.loadtxt(fname, delimiter = ',')
Let's use the variables created in the last iteration of the loop to test that we can create the matplotlib
figure that we want. Recall that after a for
loop has finished, the looping variable retains the value of the last element accessed in the loop.
In [81]:
fname
Out[81]:
In [82]:
fig = pyplot.figure(figsize = (10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.set_ylabel('Average (deg K)')
axes1.plot(data.mean(axis = 1))
axes2.set_ylabel('Minimum (deg K)')
axes2.plot(data.min(axis = 1))
axes3.set_ylabel('Maximum (deg K)')
axes3.plot(data.max(axis = 1))
fig.tight_layout()
pyplot.show()
Now we need to combine the code we've written so far so that we create these images for each file.
In [83]:
import glob
import numpy as np
import matplotlib.pyplot as pyplot
%matplotlib inline
# Get a list of the filenames we're interested in
filenames = glob.glob('*.csv')
for fname in filenames:
    print('Working on', fname, '...')
    # Load the data
    data = np.loadtxt(fname, delimiter = ',')
    # Create a 1 x 3 figure
    fig = pyplot.figure(figsize = (10.0, 3.0))
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    axes1.set_ylabel('Average (deg K)')
    axes1.plot(data.mean(axis = 1))
    axes2.set_ylabel('Minimum (deg K)')
    axes2.plot(data.min(axis = 1))
    axes3.set_ylabel('Maximum (deg K)')
    axes3.plot(data.max(axis = 1))
    fig.tight_layout()
    pyplot.show()
Note that I've introduced comments into my code using the hash character, #. Anything that comes after this symbol on the same line will be ignored by the Python interpreter. This is a really useful way of reminding yourself, as well as communicating to others, what certain parts of your code are doing.
For each location (each file), plot the difference between that location's mean temperature and the mean across all locations.
Hint: One way to calculate the mean across five (5) files is by adding the 5 arrays together, then dividing by 5. You can add arrays together in a loop like this:
# Start with an array full of zeros that is 69-elements long
running_total = np.zeros((69))
for fname in filenames:
    data = np.loadtxt(fname, delimiter = ',')
    running_total = running_total + data.mean(axis = 1)
Hint: How do you difference two arrays? Remember how the plus, +
, and minus, -
, operators work on arrays?
In [84]:
filenames = glob.glob('*.csv')
running_total = np.zeros((69))
for fname in filenames:
    data = np.loadtxt(fname, delimiter = ',')
    running_total = running_total + data.mean(axis = 1)

overall_mean = running_total / len(filenames)

for fname in filenames:
    data = np.loadtxt(fname, delimiter = ',')
    fig = pyplot.figure(figsize = (10.0, 3.0))
    axis1 = fig.add_subplot(1, 1, 1)
    axis1.plot((data.mean(axis = 1) - overall_mean))
    axis1.set_title(fname)
    pyplot.show()
As humans, we often face choices in our work and daily lives. Given some information about the world, we make a decision to do one thing or another.
It should be obvious that our computer programs need to do this as well. So far, we've written Python code that does the exact same thing with whatever input it receives. This is a great benefit, of course, for our productivity; we can reliably perform analyses the same way on multiple datasets.
Now, we need to learn how Python can detect certain conditions and act accordingly.
Conditional evaluation in Python is performed using what are called conditional statements.
In [85]:
a_number = 42
if a_number > 100:
    print('Greater')
else:
    print('Not greater')
print('Done')
This code can be represented by the following workflow.
The if
statement in Python is a conditional statement; it contains a conditional expression, which evaluates to True
or False
.
In [86]:
a_number > 100
Out[86]:
Thus, when the conditional expression following the if statement evaluates to False, the code skips to the else statement. Note that it is the indented code blocks below the if and else statements that are conditionally executed.
True
and False
are special values in Python and the first letter is always capitalized.
Conditional statements don't have to include an else statement. If there is no else statement, Python simply does nothing when the test is False.
In [87]:
if a_number > 100:
    print(a_number, 'is greater than 100')
Note that there is no output associated with this code.
How can you make this code print "Greater" by changing only one line?
a_number = 42
if a_number > 100:
    print('Greater')
else:
    print('Not greater')
print('Done')
There are two (2) one-line changes you could make. Can you find them both?
As we've seen, conditional expressions evaluate to either True
or False
. These two, special values are called logical or Boolean values. Let's get more familiar with Booleans.
In [88]:
True and True
Out[88]:
In [89]:
True or False
Out[89]:
In [90]:
True or not True
Out[90]:
In [91]:
not True
Out[91]:
Here, and and or are two logical (Boolean) operators, which combine Boolean values. The "greater than" sign we saw earlier is a comparison operator, and there are several more.
What do each of the following evaluate to, True
or False
?
1 < 2
1 <= 1
3 == 3
2 != 3
We can combine multiple comparison operators to create complex tests in our code.
In [92]:
if (1 > 0) and (-1 > 0):
    print('Both parts are true')
else:
    print('At least one part is false')
What happens if we change the and
to an or
?
In [93]:
if (1 > 0) or (-1 > 0):
    print('At least one part is true')
Finally, we can chain conditional expressions together using elif
, which stands for "else if" and follows an initial if
statement.
In [94]:
number = -3
if number > 0:
    print('Positive')
elif number == 0:
    print('Is Zero')
else:
    print('Negative')
Whereas each if
statement has only one else
, it can have as many elif
statements as you need.
How can we put conditional evaluation to work in analyzing our temperature data? Let's say we're interested in temperature anomalies; that is, the year-to-year deviation in temperature from a long-term mean.
Let's also say we want to fit a straight line to the anomalies. We can use another Python library, statsmodels
, to fit an ordinary-least squares (OLS) regression to our temperature anomaly.
In [95]:
import statsmodels.api as sm
data = np.loadtxt('wvu.temperature.csv', delimiter = ',')
# Subtract the location's long-term mean
y_data = data.mean(axis = 1) - data.mean()
# Create an array of numbers 1948, 1949, ..., 2016
x_data = np.arange(1948, 2017)
# Add a constant (the intercept term)
x_data = sm.add_constant(x_data)
# Fit the temperature anomalies to a simple time series
results = sm.OLS(y_data, x_data).fit()
results.summary()
Out[95]:
In [96]:
b0, b1 = results.params
# Calculate a line of best fit
fit_line = b0 + (b1 * x_data[:,1])
fig = pyplot.figure(figsize = (10.0, 3.0))
axis1 = fig.add_subplot(1, 1, 1)
axis1.plot(x_data[:,1], y_data, 'k')
axis1.plot(x_data[:,1], fit_line, 'r')
axis1.set_xlim(1948, 2016)
axis1.set_title('wvu.temperature.csv')
pyplot.show()
What if we wanted to detect the direction of this fit line automatically? It could be tedious to have a human being tally up how many are trending upwards versus downwards... Let's have the computer do it.
Write a for
loop, with an if
statement inside, that calculates a line of best fit for each dataset's temperature anomalies and prints out a message as to whether that trend line is positive or negative.
Hint: What we want to know about each trend line is whether, for:

results = sm.OLS(y_data, x_data).fit()
b0, b1 = results.params

the slope of the line, b1, is positive or negative.
In [97]:
filenames = glob.glob('*.csv')
# I can do these things outside of the loop
# because the X data are the same for each dataset
x_data = np.arange(1948, 2017)
x_data = sm.add_constant(x_data) # Add the intercept term
for fname in filenames:
    data = np.loadtxt(fname, delimiter = ',')
    # Subtract the location's long-term mean
    y_data = data.mean(axis = 1) - data.mean()
    # Fit the temperature anomalies to a simple time series
    results = sm.OLS(y_data, x_data).fit()
    b0, b1 = results.params
    if b1 > 0:
        print(fname, '-- Long-term trend is positive')
    else:
        print(fname, '-- Long-term trend is negative')
At this point, what have we learned?
Use glob.glob() to create a list of files whose names match a given pattern;
Use if statements to test a condition;
Use elif and else statements to test alternative conditions;
Use the operators ==, >=, <=, and, and or in conditional expressions;
X and Y is only true if both X and Y are true;
X or Y is true if either X, Y, or both are true;
Nest an if statement inside a for loop.

At this point, we've done a lot of interesting things with our temperature dataset. But our code is getting kind of long. The interesting parts that we've figured out together could be useful to us in the future or to other people. Is there a way for us to package this code for later re-use?
In Python, we can do just that by creating our own custom functions. We'll start by defining a function that converts temperatures from Kelvin to Celsius.
In [98]:
def kelvin_to_celsius(temp_k):
    return temp_k - 273.15
How do we know this function works? We can devise tests where we know what output should correspond with a given input.
In [99]:
if kelvin_to_celsius(273.15) == 0:
    print('Passed')

if kelvin_to_celsius(373.15) == 100:
    print('Passed')
Let's break down this function definition.
- The def keyword indicates to Python that we are defining a function; the name of the function we want to define comes next.
- As with for loops and if statements, after the colon, :, we indent 4 spaces and then write the body of the function. This is what the function actually does when it is called. Inside the body, any arguments provided are available as variables with the name of the argument as we wrote it in the function definition.

What happens if we remove the keyword return from this function? Make this change and call the function again.
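Try it yourself first. A minimal sketch of what you should observe, using a hypothetical renamed copy of the function so the original is preserved:

```python
def kelvin_to_celsius_no_return(temp_k):
    temp_k - 273.15  # the value is computed, but never returned

# A function without a return statement implicitly returns None
result = kelvin_to_celsius_no_return(273.15)
print(result)  # prints: None
```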
Now that we've created a function that converts temperatures in degrees Kelvin to degrees Celsius, let's see if we can write a function that converts from degrees Celsius to degrees Fahrenheit.
$$ T_F = \left(T_C \times \frac{9}{5}\right) + 32 $$
In [100]:
def celsius_to_fahr(temp_c):
    return (temp_c * (9/5)) + 32
In [101]:
if celsius_to_fahr(0) == 32:
    print('Passed')

if celsius_to_fahr(-40) == -40:
    print('Passed')
Now, what if we want to convert temperatures in degrees Kelvin to degrees Fahrenheit?
In [102]:
def kelvin_to_fahr(temp_k):
    temp_c = kelvin_to_celsius(temp_k)
    return celsius_to_fahr(temp_c)
In [103]:
kelvin_to_fahr(300)
Out[103]:
Another way to do this is to chain multiple function calls.
In [104]:
celsius_to_fahr(kelvin_to_celsius(300))
Out[104]:
In [105]:
print(fname)
kelvin_to_fahr(data.mean(axis = 0))
Out[105]:
Now that we understand functions in Python we can begin to clean up the codebase we've established so far.
For starters, let's create a function that calculates temperature anomalies.
In [106]:
def temperature_anomaly(temp_array):
    return temp_array.mean(axis = 1) - temp_array.mean()

temperature_anomaly(data)
Out[106]:
This works great! Now, suppose I suspect the long-term mean is a little biased; maybe I want a different measure of central tendency to use in calculating anomalies. Say, maybe I want to subtract the median temperature value instead of the mean.
In Python, I can just add another argument to provide this flexibility.
In [107]:
def temperature_anomaly(temp_array, long_term_avg):
    return temp_array.mean(axis = 1) - long_term_avg

temperature_anomaly(data, np.median(data))
Out[107]:
This function is more flexible, but I also have to type more. Suppose that I want a function that allows me to use the median value but will default to the mean value, because the mean value is what most people who will use this function want.
In Python, functions can take default arguments; these are arguments that could be provided when the function is called, but aren't required.
In [108]:
def temperature_anomaly(temp_array, long_term_avg = None):
    if long_term_avg:
        return temp_array.mean(axis = 1) - long_term_avg
    else:
        return temp_array.mean(axis = 1) - temp_array.mean()
pyplot.plot(temperature_anomaly(data))
pyplot.show()
In [109]:
pyplot.plot(temperature_anomaly(data, np.median(data)))
pyplot.show()
None is a special value in Python. It means just that: nothing. It is similar to "null" in other programming languages. Like True and False, the first letter is always capitalized.

Because None has a "False-y" value, we can treat it like the Python False.
In [110]:
not None
Out[110]:
However, we usually want to distinguish values that are False-y from values that are None. For example, a long_term_avg of 0 is False-y, so the test above would silently ignore it. Let's make two changes to our function:

- Explicitly test whether long_term_avg is not None;
- Remove the else statement, which is unnecessary.
In [111]:
def temperature_anomaly(temp_array, long_term_avg = None):
    if long_term_avg is not None:
        return temp_array.mean(axis = 1) - long_term_avg
    return temp_array.mean(axis = 1) - temp_array.mean()
pyplot.plot(temperature_anomaly(data))
pyplot.show()
In a Python function, there are two kinds of arguments: positional arguments and keyword arguments.
Positional arguments are simpler and more common. In this example, Python knows that the value 1
we've provided is the first (and only) argument, so it assigns this value to the first argument in the corresponding function definition.
In [112]:
data.mean(1)
Out[112]:
In [113]:
data.mean?
We can call this same function more explicitly by providing the axis
, 1, as a keyword argument.
In [114]:
data.mean(axis = 1)
Out[114]:
Keyword arguments are provided as key-value pairs.
Positional arguments have to be provided in the right order, otherwise Python doesn't know which value goes with which argument. When there's only one argument, it doesn't matter, but with multiple arguments, they have to come in order.
In [115]:
np.zeros?
In [116]:
np.zeros(10, int)
Out[116]:
In [117]:
np.zeros(int, 10)
Keyword arguments, on the other hand, can be provided in any order.
In [118]:
np.zeros(dtype = int, shape = 10)
Out[118]:
As we saw earlier, comments help clarify to others and our future selves what our intentions are and how our code works. This kind of documentation is critical for creating software that other people want to use and that we will be able to make sense of later.
Functions, in particular, should be documented well using both in-line comments and also docstrings.
In [119]:
def temperature_anomaly(temp_array, long_term_avg = None):
    '''
    Calculates the inter-annual temperature anomalies. Subtracts the
    long-term mean by default but another long-term average can
    be provided as the `long_term_avg` argument.
    '''
    if long_term_avg is not None:
        return temp_array.mean(axis = 1) - long_term_avg
    # If no long-term average is provided, use the overall mean
    return temp_array.mean(axis = 1) - temp_array.mean()
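One payoff of writing a docstring is that Python attaches it to the function itself, so help() (or the ? suffix in Jupyter) can display it later. A minimal sketch, using a hypothetical fahr_to_celsius function:

```python
def fahr_to_celsius(temp_f):
    '''Converts temperatures from degrees Fahrenheit to degrees Celsius.'''
    return (temp_f - 32) * (5/9)

# The docstring is stored on the function and displayed by help()
help(fahr_to_celsius)
```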
Create one (or both, for an extra challenge) of the following functions...

- A function called fences that takes an input character string and surrounds it on both sides with another string, e.g., "pasture" becomes "|pasture|" or "@pasture@" if either "|" or "@" are provided.
- A function called rescale that takes an array and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0.

Hint: Strings can be concatenated with the plus operator.
'cat' + 's'
Hint: If $x_0$ and $x_1$ are the lowest and highest values in an array, respectively, then the replacement value for any element $x$, scaled to between 0.0 and 1.0, should be:
$$ \frac{x - x_0}{x_1 - x_0} $$
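Try the exercise yourself first. For reference, the formula above translates almost directly into NumPy; here is one possible sketch of the rescale function:

```python
import numpy as np

def rescale(input_array):
    '''Rescales an array so its values lie in the range 0.0 to 1.0.'''
    low = input_array.min()   # x_0, the lowest value
    high = input_array.max()  # x_1, the highest value
    return (input_array - low) / (high - low)

print(rescale(np.array([1.0, 2.0, 3.0])).tolist())  # prints: [0.0, 0.5, 1.0]
```

Note that NumPy's element-wise arithmetic lets us apply the formula to the whole array at once, without writing a loop.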