Python tutorial/refresher (ICON2017)

This Python tutorial was created for students of the "Neuroimaging: Pattern Analysis" course at the University of Amsterdam. These students already had some experience with Python and could optionally go through this notebook to refresh their Python knowledge/skills before the start of the course.

Participants of the "MVPA in Python" workshop of ICON2017 can go through this notebook before the actual workshop to gain/refresh their Python skills. (Some knowledge of Python is recommended, though not strictly necessary, for this workshop!)

Estimated time to complete: ~2 hours (~5 hours if you do all the ToDos)
Credits: This tutorial is partly based on work by Tomas Knapen and Daan van Es (VU University).
License: this material is MIT licensed, so you can reuse it if you want!

What this tutorial is about

This tutorial will recap some of the most important aspects of Python (section 2: General Python). Then, 'numpy' (Python's scientific computing package) is discussed and explained and followed by a section on 'Matplotlib' (Python's plotting package). Finally, this tutorial ends with a section on how to work with Nifti-files - the file format used by most (MRI) neuroimaging software packages.

Importantly, while scrolling through this tutorial, you'll encounter some "ToDos", short (non-graded) assignments for you to practice with the material. Please do them as they will substantially improve your understanding! Also, you will see "ToThinks", short questions that you can use to test your understanding of the previous material.

Contents:

This tutorial will treat the following concepts in order:

  1. Jupyter notebooks
  2. General Python
  3. Numpy
  4. Matplotlib / plotting
  5. Interacting with Nifti-files

1. Jupyter Notebooks

Congratulations, you've managed to fire up your first Jupyter notebook!

But wait, what exactly is an Jupyter notebook?

Basically, using Jupyter notebooks is like using the web-browser as a kind of editor from which you can run python code, similar to the MATLAB interactive editor or RStudio. The cool thing about these notebooks is that they allow you to mix code "cells" (see below) and text "cells" (such as this one). The (printed) output from code blocks are displayed right below the code blocks themselves.

Jupyter notebooks have two modes: edit mode and command mode.

  • Command mode is indicated by a grey cell border with a blue left margin (as is the case now!): When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently (some shortcuts in command mode will be discussed later!). Enter command mode by pressing Esc or using the mouse to click outside a cell’s editor area;

  • Edit mode is indicated by a green cell border and a prompt showing in the editor area: When a cell is in edit mode, you can type into the cell, like a normal text editor. Enter edit mode by pressing Enter or using the mouse to double-click on a cell’s editor area.

When you're reading and scrolling through the tutorials, you'll be in the command mode mostly. But once you have to program (or write) stuff yourself, you have to switch to edit mode. But we'll get to that. First, we'll explain something about the two types of cells: code cells and text cells.

1.1. Code cells

Code cells are the place to write your Python code, similar to MATLAB 'code sections' (which are usually deliniated by %%). Importantly, unlike the interactive editors in RStudio and MATLAB, a code cell in Jupyter notebooks can only be run all at once. This means you cannot run it line-by-line, but you have to run the entire cell!

Let's look at an example (click the code cell below and press the "play" button in the notebooks toolbar or press ctr+Enter).


In [ ]:
print("I'm printing Python code")
print(3 + 3)

As you can see above, you can only run the entire cell (i.e. both print statements). Sometimes, of course, you'd like to organise code across multiple cells. To do this, you can simply add new blocks (cells) by selecting "Insert --> Insert Cell Below" on the toolbar (or use the shortcut ctr+B when you're in command mode; "B" refers to "below"). This will insert a new code cell below the cell you have currently highlighted (the currently highlighted cell has a blue box around it).

Try inserting a cell below and write some code (e.g. "print 10*10").

Another cool feature of Jupyter notebooks is that you can display figures in the same notebook! You simply define some plots in a code cell and it'll output the plot below it.

Check it out by executing (click the "play" button or ctr+Enter) the next cell.


In [ ]:
# We'll get to what the code means later in the tutorial
import matplotlib.pyplot as plt # The plotting package 'Matplotlib' is discussed in section 3!

# This command makes sure that the figure is plotted in the notebook instead of in a separate window!
%matplotlib inline 
plt.plot(range(10))
plt.show()

1.2. Text ('markdown') cells

Next to code cells, jupyter notebooks allow you to write text in so-called "markdown cells" (the cell this text is written in is, obviously, also a markdown cell). Markdown cells accept plain text and can be formatted by special markdown-syntax. A couple of examples:

# One hash creates a large heading
## Two hashes creates a slightly smaller heading (this goes up to four hashes)

Bold text can be created by enclosing text in **double asterisks** and italicized text can be created by enclosing text in *single asterisks*. You can even include URLs and insert images from the web; check this link for a cheatsheet with markdown syntax and options! All the special markdown-syntax and options will be converted to readable text after running the text cell (again, by pressing the "play" icon in the toolbar or by ctr+Enter).

To insert a text (markdown) cell, insert a new cell ("Insert --> Insert Cell Below" or ctr+B). Then, while highlighting the new cell, press "Cell --> Cell Type --> Markdown" on the toolbar on top of the notebook (or, while in command mode, press ctr+m; "m" refers to "markdown"). You should see the prompt, the In [ ] thingie, disappear. Voilà, now it's a text cell!

Try it below!

**ToDo**: Insert a new markdown cell and try to write the following:

"**OMG**, this is the most awesome Python tutorial *ever*."

1.3. Getting help

Throughout this course, you'll encounter situations in which you have to use functions that you'd like to get some more information on, e.g. which inputs it expects. To get help on any python function you'd like to use, you can simply write the function-name appended with a "?" and run the cell. This will open a window below with the "docstring" (explanation of the function). Take for example the built-in function len(). To get some more information, simply type len? in a code cell and run.

Create a code cell below, type len? and check the output!

If this method (i.e. appending with "?") does not give you enough information, try to google the name of the function together with 'python' and, if you know from which package the function comes, the name of that package. For instance, for len() you could google: 'python len()', or later when you'll use the numpy package, and you'd like to know how the numpy.arange function works you could google: 'python numpy arange'.

REMEMBER: Google is your friend! Please try to google things first, before you ask your classmates and definitely before you ask the tutors!

The online python community is huge, so someone definitely already wondered about something you might be struggling with.

1.4. Saving your work & closing down the notebook

You're running and actively editing the notebook in a browser, but remember that it's still just a file located on your account on the server. Therefore, for your work to persist, you need to save the notebook once in a while (and definitely before closing the notebook). To save, simply press the floppy-disk image in the top-left (or do CTR+s). If you want to close your notebook, simply close the browser.

However, when you close the browser, the notebook is still "running" in the background in your terminal where you started it!

To stop the notebook from running, please go to the terminal where you started it, and either close the terminal or press ctr + c in the terminal. Also, if you want to open another jupyter notebook (for example the assignment-notebook), simply open a new terminal and start a notebook using the jupyter notebook [ipynb-file] command; this will open a new tab in your browser with the new notebook!

Alright. Now that you've had your first contact with Jupyter notebooks - let's start the Python tutorial.

NOTE: if you've done Codeacademy and feel comfortable with Python's syntax, you quickly skip through the next section (until the section on Numpy) or skip it entirely. Also, some parts will overlap quite substantially with the Codeacademy material; this serves as some extra practice material, but can be skipped if comfortable with the discussed concepts.

2. Python recap

This section will provide a recap of most of the stuff you've learned in the Codecademy tutorial. Here, we will cover:

  • Python data types (integers, floats, lists, dictionaries)
  • Python functions
  • Python classes
  • Conditionals (if-then-else)
  • Loops (for-loop, list-comprehensions)
  • Imports

2.1. Structure of Python

Python is a multipurpose programming language, meaning it can be used for almost anything. While R is mostly used for statistics, and php is used for web programming only, Python is a general language, specified by the packages you add on to it (using import statements). So, "pure" Python provides some basic functionality, but Python's versatility comes from specialized packages for almost any purpose.

For example:

  • the scipy package provides functionality for scientific computing (e.g. statistics, signal processing);
  • the numpy package provides data structures and functionality for (very fast) numeric computing (e.g. multidimensional numeric array computations, some linear algebra);
  • the matplotlib package provides plotting functions;
  • and various specialied neuroimaging packages provide functionality to work and analyze (f)MRI (e.g. nibabel and nipype) and MEG/EEG (e.g. MNE).

Basically, there are packages for everything you can think of (also: creating websites, game programming, etc.)! In this course, we will mostly use basic Python in combination with the scientific computing packages ('numpy', 'scipy') and specialized neuroimaging packages ('nibabel', 'nipype').

Import statements

As explained above, Python ships with some default functionality. This means that it's already available upon starting a notebook (or any other Python environment) and doesn't need to be imported. An example is the function len().


In [ ]:
my_list = [1, 2, 3]
print(len(my_list))

However, non-built-in packages - such as numpy - need to be explicitly imported to access their functionality. After importing, their functions are accessible as: {package}.{function}.

For example:


In [ ]:
import numpy

# Now you can access the numpy function `add()` as numpy.add()
print(numpy.add(5, 3))

However, writing numpy in front of every function you access from it becomes annoying very quickly. Therefore, we usually abbreviate the package name by two or three characters, which can be achieved through:

import {package} as {abbreviation}.

For example, people usually abbreviate the numpy import as:


In [ ]:
import numpy as np

# Now you can access numpy functions such as 'add()' as:
print(np.add(5, 3))

Throughout the tutorials, you'll see different packages (e.g. nibabel and scipy) being imported using abbreviations.

Also, you don't need to import an entire package, but you can also import a specific function or class. This is done as follows:

from {package} import {function1}, {function2}, {etc}

An example:


In [ ]:
from numpy import add, subtract

# Now I can simply call add() and subtract()
print(add(5, 3))

Note that some packages have a hierarchical structure with subpackages (also called modules). For example, scipy has a subpackage ndimage (with functions for n-dimensional arrays). To import only this subpackage, do the following:


In [ ]:
from scipy import ndimage
# Now you can call functions from the ndimage subpackage,
# e.g. gaussian_filter

print(ndimage.gaussian_filter([10, 5, 4], 2))

Note that you can mix and match all of these operations to customize the import to your own liking (see cell below for such a fancy import). In this course, we'll usually just import entire packages (e.g. import numpy as np) or specific functions/subpackages (e.g. from scipy import stats).


In [ ]:
# a fancy import
from scipy.stats import binom_test as omg_binomial_testing_so_cool

print(omg_binomial_testing_so_cool(0.5, 10))
**ToDo**: Insert a code cell below and import the function "randn" rom the numpy subpackage "random" and rename it "random_normal_generator".

Python versions

As a side note: there are currently two different supported versions of Python, 2.7 and 3.6. Somewhat confusingly, Python 3.0 introduced many backwards-incompatible changes to the language, so code written for 2.7 may not work under 3.x and vice versa. For this class all code will use Python 3.6 (but 90% of this notebook should be compatible with Python 2.7., I think ...)

2.2 Basic data types

"Pure" (i.e. built-in) Python has mostly the same data types as you might know from MATLAB or R, such as numbers (integers/floats), strings, and lists (cells in MATLAB; lists in R). Also, Python has to data types that might be unknown to MATLAB/R users, such as "dictionaries" and "tuples", which are explained later.

Numbers

Numbers are represented either as integers ("whole" numbers) or floats (numbers with decimals, basically).


In [ ]:
x = 3
print(x, type(x)) # use type(variable) to find out of what data-type something is!

y = 3.15
print(y, type(y))

Let's try to change x as defined above with some basic arithmetic operations:


In [ ]:
print(x + 1)   # Addition;
print(x - 1)   # Subtraction;
print(x / 2)   # Division;
print(x * 2)   # Multiplication;
print(x ** 2)  # Exponentiation;

The above commands add something to x, but do not change x itself. To permanently change x, you can use the syntax below.


In [ ]:
x = 3
x += 1 # This is the same as: x = x + 1
print(x)  
x *= 2 # This is the same as: x = x * 2
print(x)
**ToDo**: in the cell below, make a new variable, `y`, which should contain x minus 5 and subsequently raised to the 4th power.

In [ ]:
x = 5
y = # for you to fill in!

Booleans

Python implements all of the usual operators for comparisons. Similar to what you might know from other languages, '==' tests equivalence, '!=' for not equivalent, and '<' and '>' for larger/smaller than.


In [ ]:
a = 3
b = 5
is_a_equal_to_b = a == b

print(is_a_equal_to_b)
print(type(is_a_equal_to_b))

However, for Boolean logic, python doesn't use operators (such as && for "and" and | for "or") but uses special (regular English) words:


In [ ]:
bool_1 = (3 > 5) # False, because 3 is not greater than 5
bool_2 = (5 == 5) # True, because, well, 5 is 5

print(bool_1 and bool_2)  # Logical AND, both have to be True
print(bool_1 or bool_2)   # Logical OR, either one of them has to be True
print(not bool_1)         # Logical NOT, the inverse of bool_1
print(bool_1 != bool_2)   # Logical XOR, yields True when bool_1 and bool_2 are not equal
**ToDo**: Try to change the expression of bool_1 and bool_2 above e.g. try:

`bool_1 = (1 < 2)` and see what happens to the boolean logic below - does it make sense?

In [ ]:
# Do your ToDo here:

Strings

Strings in Python are largely the same as in other languages.


In [ ]:
h = 'hello'   # String literals can use single quotes
w = "world"   # or double quotes; it does not matter.

print(h)
print(len(h))  # see how many characters in this string

A very nice feature of Python strings is that they are easy to concatenate: just use '+'!


In [ ]:
hw = h + ' ' + w  # String concatenation
print(hw)

You can also create and combine strings with what is called 'string formatting'. This is accomplished by inserting a placeholder in a string, that you can fill with variables. An example is given below:


In [ ]:
# Here, we have a string with a placeholder '%s' (the 's' refers to 'string' placeholder)
my_string = 'My favorite programming language is: %s'
print('Before formatting:')
print(my_string)

# Now, to 'fill' the placeholder, do the following:
my_fav_language = 'Python'
my_string = 'My favorite programming language is: %s' % my_fav_language

print('After formatting')
print(my_string)

You can also use specific placeholders for different data types:


In [ ]:
week_no = 1 # integer
string1 = 'This is week %i of pattern analysis' % week_no # the %i expects an integer!
print(string1)

project_score = 99.50 # float
string2 = 'I will get a %f on my final project' % project_score
print(string2)

# You can also combine different types in a string:
string3 = 'In week %i of pattern analysis, %s will get a %f for my lab-assignment' % (week_no, "I", 95.00)
print(string3)
**ToDo**: Using the variables defined below, print the string:

"Pattern analysis will be my favorite course 4ever"

In [ ]:
to_print = "%s will be my favorite course %iever" % # add something here!
print(to_print)

Lists

A list is the Python equivalent of an array, but can be resized and can contain elements of different types. It is similar to a list in R and a cell in MATLAB. Note that indices in python start with 0! This means that the 3rd element of the list below is accessed through [2]:


In [ ]:
xs = [3, 1, 2]   # create a list
print(xs)
print(xs[2])
print(xs[-1])     # Negative indices count from the end of the list; prints "2"

# Also, it can contain different types, including lists themselves!
xy = [1, 2, [1, 2, 3], "my_string"]
print(xy)

# And it's easy to change list entries
xy[0] = 'whatever'

# and nested lists
xy[2][0] = 'another string'
print(xy)

In addition to accessing list elements one at a time, Python provides concise syntax to access specific parts of a list (sublists); this is known as slicing:


In [ ]:
nums = np.arange(5)     # arange is a numpy function that creates a list of integers
print(nums)         # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])    # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])     # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])     # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])      # Get a slice of the whole list; prints ["0, 1, 2, 3, 4]"
print(nums[:-1])    # Slice indices can be negative; prints ["0, 1, 2, 3]"
print(nums[::2])    # return values in steps of 2
print(nums[1::2])   # returns values in steps of 2, but starting from the second element
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print(nums)         # Prints "[0, 1, 8, 9, 4]"

Importantly, slicing in Python is "end exclusive", which means that the last index in your slice is not returned. Thus ...

nums = np.arange(10) nums[0:5]

... returns 0 up till and including 4 (not 5!).

Check it out below:


In [ ]:
nums = np.arange(10)
print(nums[0:5])
**ToDo**: From the list below, extract the numbers 2, 3, 4, 5, and 6 using a slice!

In [ ]:
my_list = range(20)
print # your answer here
**ToDo**: from the list below (number from 0 to 19), extract the values 5, 7, 9 using a slice.

In [ ]:
my_list = np.arange(20)
print # your answer goes here
**ToDo**: And a slightly more difficult one: extract the sublist 19, 16, 13 below:

In [ ]:
my_list = np.arange(20)
print # your answer goes here

Dictionaries

Dictionaries might be new for those who are used to MATLAB or R. Basically, a dictionary is an unordered list in which list entries have a name (or more generally, a "key"). To get a value from a dictionary, you have to use the "key" as index instead of using an integer:


In [ ]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary by indexing with the key!

It is easy to add entries to a dictionary:


In [ ]:
d['fish'] = 'wet'    # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"

Like a list, an entry in a dictionary can be of any data type:


In [ ]:
d['rabbit'] = ['omg', 'so', 'cute']
print(d['rabbit'])

If you try to 'index' a dictionary with a key that doesn't exist, it raises a "KeyError":


In [ ]:
print(d['monkey'])
**ToDo**: Add a new key to the dictionary `d` named "mouse" and with the value 5.

In [ ]:
# Do the ToDo here
**ToDo (more difficult)**: Values of dictionaries can be any type of object, even dictionaries themselves! So, add a new key to the dictionary `d` named `"another_dict"` with the value of *another* dictionary with the keys `"a"` and `"b"` and the corresponding values `1` and `2`. Then, print out (in a single statement) the value `2` from the `"another_dict"` key!

In [ ]:
# Do the ToDo here

tuples

Tuples are very much like lists, but the main difference is that they are immutable. In other words, after creating them, they cannot be modified:


In [ ]:
# A list can be modified ...
my_list = [1, 2, 3]
my_list[0] = 0
print(my_list)

In [ ]:
# ... but a tuple cannot.
my_tuple = (1, 2, 3)
print(my_tuple[0]) # you can print parts of tuple ...
my_tuple[0] = 0   # but you cannot modify it!

You probably won't use tuples a lot, but you might come across them when a function returns more than one output:


In [ ]:
def my_epic_function(integer):
    
    return integer, integer * 2

outputs = my_epic_function(10)
print(outputs)
print(type(outputs))

# also, you can unpack tuples (and also lists) as follows:
output1, output2 = outputs
print(output2)

2.3 Functions and methods

You probably know how to define a Python function from what you've learned at Codecademy. Similar to R (but unlike MATLAB), you have to explicitly state what you want to return from the function by the "return" statement. Importantly, without this return statement, your Python environment cannot access the variables in the function.

Below, an example without a return statement is outlined (run the cell to define the function):


In [ ]:
def add_2_to_number(number):
    new_number = number + 2

Now, if we call the function "add_2_to_number", Python actually won't "find" this variable and it will raise an error, because we haven't returned it (try it out below by running the cell):


In [ ]:
add_2_to_number(5)
print(new_number)

(In fact, like most languages, Python does not have access to variables within the function scope. Functions, on the other hand, do have access to variables outside the function itself. But that's slightly more complicated stuff, that we won't get into during this course!)

Let's redefine our function and include a return statement (note: we'll overwrite the old function). Run the cell below:


In [ ]:
def add_2_to_number(number):
    new_number = number + 2
    return new_number

# note that we could also have done: return number + 2 and omit the new_number line

In [ ]:
# Let's try it out!
output_from_function = add_2_to_number(2)
print("Output from the add_2_to_number function is: %i" % output_from_function)

Note that you can name the output of the function whatever you like! In this case, as you can see above (in which we used output_from_function), this doesn't have to be called new_number. You can name it whatever (e.g. omg_awesome_output_lolz) - Python doesn't care.

**ToDo**: in the code cell below, write a function, called `extract_last_element`, that takes one input-argument - a list - and returns the last element of the list. Then, call the function in the cell below that (starting with # test your function!)

In [ ]:
# implement your function here!

In [ ]:
# test your function!
my_list = [0, 1, 2, 3, 4, 5]
print(extract_last_element(my_list))

Methods

However, in Python, functions are not the only things that allow you to 'do' things with data. In others' code, you'll often see function-like expressions written with periods, like this: some_variable.function(). These .function() parts are called 'methods', which are like functions 'inherent' to an object. In other words, it is a function that is applied to the object it belongs to.

Different type of objects in Python, such as stings and lists, have their own set of methods. For example, the function you defined above ('extract_last_element()') also exists as a method each list has, called 'pop()'! (This is a builtin, standard, method that each list in Python has.) See for yourself in the block below.


In [ ]:
my_list = [0, 5, 10, 15] 
print(my_list.pop())

# You can also just do the following (i.e. no need to define a variable first!):
print([0, 5, 10, 15].pop())

# which is the same as:
print(extract_last_element(my_list))

Not only lists, but also other data-types (such as strings, dictionaries, and, as we'll see later, numpy arrays) have their own methods. How methods work exactly is not really important, but it is necessary to know what it does, as you'll see them a lot throughout this course.

Just a couple of (often-used) examples:


In [ ]:
# For lists
x = [0, 10, 15]
x.append(20) # Add a new element to the end of the list using the append() method!
print(x)

In [ ]:
my_dict = {'a': 0, 'b': 1, 'c': 2}
print(my_dict.values())
print(my_dict.keys())

Default arguments in functions/methods

Importantly, and unlike most (scientific) programming languages, Python supports the use of 'default' arguments in functions. Basically, if you don't specify an optional argument, it uses the default:


In [ ]:
def exponentiate_number(number, power=2):
    return number ** power

print(exponentiate_number(2)) # now it uses the default!
print(exponentiate_number(2, 10)) # now it "overwrites" the default and uses power=10
print(exponentiate_number(number=2, power=10)) # also note that you can 'name' arguments

2.4 Loops

Loops in Python (for- and while-loops) are largely similar to MATLAB and R loops, except for their syntax:


In [ ]:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)

Basically, each data type that is a type of "iterable" (something that you can iterate over) can be used in loops, including lists, dictionaries, and tuples.


In [ ]:
# An example of looping over a list
my_list = [1, 2, 3]
for x in my_list:
    print(x)

MATLAB users might be used to looping over indices instead of the actual list entries, like the following:

for i=1:100
    disp(some_list(i));
end

In Python, however, you loop (by default) over the contents of a list:

for entry in some_list:
    print(entry)

If you want to access for the entry AND the index, you can use the built-in enumerate function:


In [ ]:
my_list = ['a', 'b', 'c']
for index, entry in enumerate(my_list):
    
    print('Loop iteration number (index) = %i, entry=%s' % (index, entry))

# Don't forget that Python indexing starts at zero!

In [ ]:
# Looping over a tuple (exactly the same as looping over a list)
my_tuple = (1, 2, 3)
for x in my_tuple:
    print(x)

In [ ]:
# Iterating over a dictionary can be done in a couple of ways!
my_dict = {'a': 1, 'b': 2, 'c': 3}

# Looping over the keys ONLY
for key in my_dict:
    print(key)

In [ ]:
# Looping over both the keys and the entries
for key, entry in my_dict.items():
    print(key, entry)

Advanced loops: list comprehensions

Sometimes, writing (and reading!) for-loops can be confusing and lead to "ugly" code. Wouldn't it be nice to represent (small) for-loops on a single line? Python has a way to do this: using what is called list comprehensions. It does exactly the same thing as a for-loop: it takes a list, iterates over its entries (and does something with each entry), and (optionally) returns a (modified) list.

Let's look at an arbitrary example of a for-loop over a list:


In [ ]:
nums = [0, 1, 2, 3, 4]

# Also, check out the way 'enumerate' is used here!
for index, x in enumerate(nums):
    nums[index] = x ** 2

print(nums)

You can make this code simpler using a list comprehension:


In [ ]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums] # importantly, a list comprehension always returns a (modified) list!
print(squares)

Also, list comprehensions may contain if-statements!


In [ ]:
string_nums = ['one', 'two', 'three']
starts_with_t = ['yes' if s[0] == 't' else 'no' for s in string_nums]
print(starts_with_t)

List comprehensions are somewhat of a more advanced Python concept, so if you don't feel comfortable using them (correctly) in your future assignments, use regular for-loops by all means! In the upcoming tutorials, though, we'll definitely use them, so make sure you understand what they do!

2. Numpy

This section is about Numpy, Python's core library for scientific computing (together with Scipy). While the syntax of basic Python (as we've gone over in the previous section) is important to do neuroimaging analysis in Python, knowing numpy is essential. The most important feature of Numpy is it's core data structure: the numpy ndarray (which stands for n-dimensional array, referring to the fact that the array may be of any dimension: 1D, 2D, 3D, 180D ... nD).

Python lists vs. numpy arrays

Basically, numpy arrays are a lot like Python lists. The major difference, however, is that numpy arrays may contain only a single data-type, while Python lists may contain different data-types within the same list.


In [ ]:
# this list contains mixed data-types: an integer, a float, a string, a list
python_list = [1, 2.5, "whatever", [3, 4, 5]] 

for entry in python_list:
    
    print(entry)
    print('this is a: %s' % type(entry), '\n') # this last '\n' prints an empty line 
                                               # (so that the printed statements are easier to read)

Numpy thus only allows entries of the same data-type. This difference between Python lists and numpy arrays is basically the same as R lists (allow multiple data-types) versus R matrices/arrays (only allow one data type), and is also the same as MATLAB cells (allow multiple data-types) versus MATLAB matrices (only allow one data type).

In fact, if you try to make a numpy array with different data-types, numpy will force the entries into the same data-type (in a smart way), as is shown in the example below:


In [ ]:
import numpy as np # this is how numpy is often imported

# Importantly, you often specify your arrays as Python lists first, and then convert them to numpy
to_convert_to_numpy = [1, 2, 3.5]               # specify python list ...
numpy_array = np.array(to_convert_to_numpy)     # ... and convert ('cast') it to numpy

for entry in numpy_array:
    
    print(entry)
    print('this is a: %s \n' % type(entry))

As you can see, Numpy converted our original list (to_convert_to_numpy), which contained both integers and floats, to an array with only floats! You might think that such a data structure that only allows one single data type is not ideal. However, the very fact that it only contains a single data-type makes operations on numpy arrays extremely fast. For example, loops over numpy arrays are often way faster than loops over python lists. This is because, internally, Python has to check the data-type of each loop entry before doing something with that entry. Because numpy arrays one allow a single data-type, it only has to check for the entries' data type once. If you imagine looping over an array or list of length 100,000, you probably understand that the numpy loop is way faster.

Let's check out the speed difference between Python list operations and numpy array operations:


In [ ]:
# timeit is a cool 'feature' that you can use in Notebooks (no need to understand how it works)
# it basically performs a computation that you specify a couple of times and prints how long it took on average
%timeit a = [x * 2 for x in range(0, 100000)] # multiplies each entry in a list of 0 - 100,000 by two

And now let's do the same with numpy:


In [ ]:
%timeit b = np.arange(0, 100000) * 2 # np.arange creates a np.array in the same way 'range' creates a Python list

more than 10 times as fast! This really matters when you start doing more complex operations, on, let's say, multidimensional brain images!

Numpy arrays: creation

As shown ealier, numpy arrays can be created by defining a Python list and converting it to a numpy array explicitly. Importantly, a simple Python list will be converted to a 1D numpy array, but a nested Python list will be converted to a 2D (or even higher-dimensional array), as is shown here:


In [ ]:
my_list = [1, 2, 3]
my_array = np.array(my_list)
print(my_array, '\n')

my_nested_list = [[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]]

my_2D_array = np.array(my_nested_list)
print(my_2D_array)

As you can see, creating numpy arrays from nested lists becomes cumbersome if you want to create (large) arrays with more than 2 dimensions. There are, fortunately, a lot of other ways to create ('initialize') large, high-dimensional numpy arrays. One often-used method is to create an array with zeros using the numpy function np.zeros:


In [ ]:
my_desired_dimensions = (2, 5) # suppose I want to create a matrix with zeros of size 2 by 5
my_array = np.zeros(my_desired_dimensions)

print(my_array)

Using arrays with zeros is often used in what is called 'pre-allocation', in which you create an 'empty' array with only zeros and, for example, 'fill' that array in a loop, as is done below:


In [ ]:
my_desired_dimensions = (5, 5)
my_array = np.zeros(my_desired_dimensions)

print('Original zeros-array')
print(my_array)

# make sure you understand what we're looping over - what is 'range(my_desired_dimensions[0])'?
for i in range(my_desired_dimensions[0]):
    
    for ii in range(my_desired_dimensions[1]):
        
        my_array[i, ii] = (i + 1) * (ii + 1)

print('\nFilled array')
print(my_array)

Important: realize that loops (not shown above), if-statements and other boolean logic is the same for numpy and python!

Also, you can create numpy arrays using other functions:


In [ ]:
ones = np.ones((5, 10)) # create an array with ones
print(ones, '\n')

rndom = np.random.random((5, 10)) # Create an array filled with random values
print(rndom)
**ToDo**: Create a numpy array with zeros only of the dimensions: (5, 2, 4, 9)

In [ ]:
# Create the numpy array!

Numpy: indexing

Indexing (extracting a single value of an array) and slicing (extracting multiple values - a subset - from an array) of numpy arrays is largely the same as other scientific computing languages such as R and MATLAB. Let's check out a 1D example:


In [ ]:
my_array = np.arange(10, 21)
print('Full array:')
print(my_array, '\n') # make sure you understand why this array is from 10 - 20 (check the np.arange function using google!)

print('Several indices of my_array:')
print(my_array[0])
print(my_array[[0, 1]])
print(my_array[[0, 1, 9]])
print(my_array[5:8]) # from index 5 until (but NOT including) index 8
print(my_array[:3]) # slice from index 0 until 3 (NOT including 3)

Setting values in numpy arrays works the same way as lists:


In [ ]:
my_array[0] = 100000
print(my_array, '\n')

my_array[[3, -1]] = -100
print(my_array, '\n')

my_array[:-3] = 0 # can you figure out what this slice (:-3) does?
print(my_array)

Often, instead of working on and indexing 1D array, we'll work with multi-dimensional (>1D) arrays. Indexing multi-dimensional arrays is, again, quite similar to other scientific computing languages:


In [ ]:
my_array = np.zeros((3, 3)) # 3 by 3 array with zeros
my_array[2, 2] = 1
print(my_array, '\n')

my_array[:, 0] = 100 # the ':' specifies ALL entries, so setting the entire first dimension to 100 
print(my_array)
**ToThink**: Make sure you understand why, in the first indexing operation (my_array[2, 2] = 1), the element in the *third* column and row is changed (and not the second).

In addition to setting specific slices to specific values, you can also extract sub-arrays using slicing/indexing:


In [ ]:
my_array = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])
print(my_array, '\n')

first_row = my_array[0, :]
print('First row')
print(first_row, '\n')

first_col = my_array[:, 0]
print('First column')
print(first_col)

Perhaps one of the most frequently used indexing in scientific computing, and especially neuroimaging, is boolean indexing (also called 'masking'). In this type of indexing, you index an array with a boolean array (i.e. array with True and False values) of the same shape. Let's look at an example:


In [ ]:
my_array = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print(my_array, '\n')

bool_idx = my_array > 5
print('Boolean index corresponding to values in my_array larger than 5')
print(bool_idx, '\n')

print('Indexing my_array with bool_idx:')
print(my_array[bool_idx])
**ToDo**: Use a boolean index to extract all negative values from the matrix (my_matrix) below:

In [ ]:
my_matrix = np.array([[0, 1, -1, -2],
                      [2, -5, 1, 4],
                      [10, -2, -4, 20]])

# Make a new boolean index below ...

# And use it to index my_matrix:

Numpy: data-types

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy guesses the datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:


In [ ]:
x = np.array([1, 2])  # Let numpy choose the datatype (here: int)
y = np.array([1.0, 2.0])  # Let numpy choose the datatype (here: float)
z = np.array([1, 2], dtype=np.float64)  # Force a particular datatype (input: int, but converted to 64bit float)

print(type(x[0]))
print(type(y[0]))
print(type(z[0]))

You can read all about numpy datatypes in the documentation.

Numpy: methods vs. functions

In the previous section (Basic Python), you've learned that, in addition to functions, 'methods' exist that are like functions of an object. In other words, methods are functions that are applied to the object itself. You've seen examples of list methods, e.g. my_list.append(1), and string methods, e.g. my_string.count('b'). Like lists and strings, numpy arrays have a lot of convenient methods that you can call. Again, this is just like a function, but then applied to itself. Often, numpy provides both a function and method for simple operations. Let's look at an example:


In [ ]:
my_array = np.arange(10)
print(my_array, '\n')

mean_array = np.mean(my_array)
print('The mean of the array is: %f' % mean_array, '\n')

mean_array2 = my_array.mean() 
print('The mean of the array (computed by its corresponding method) is: %f' % mean_array2, '\n')

print('Is the numpy function the same as the corresponding method? Answer: %s' % str(mean_array == mean_array2))

If there is both a function and a method for the operation you want to apply to the array, it really doesn't matter what you choose! Let's look at some more (often used) methods of numpy ndarrays:


In [ ]:
my_array = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print(my_array.std(), '\n') # standard deviation, same as np.std(array)
print(my_array.T, '\n') # tranpose an array, same as np.transpose(array)
print(my_array.diagonal(), '\n') # get the diagonal, same as np.diag(array)
print(my_array.min(), '\n') # same as np.min(array)
print(my_array.max(), '\n') # same as np.max(array)
print(my_array.sum(), '\n') # same as np.sum(array)

Importantly, a method may or may not take arguments (input). If no arguments are given, it just looks like "object.method()", i.e. two enclosing brackets with nothing in between. However, a method may take one or more arguments (like the my_list.append(1) method)! This argument may be named or unnamed - doesn't matter. An example:


In [ ]:
my_array2 = np.random.random((3, 3))
print('Original array:')
print(my_array2, '\n')

print('Use the round() method with the argument 3:')
print(my_array2.round(3), '\n')

print('Use the round() method with the named argument 5:')
print(my_array2.round(decimals=5), '\n')

Some methods that you'll see a lot in the upcoming tutorials. In addition to the methods listed above, you'll probably see the following methods a lot in the rest of this course (make sure you understand them!):


In [ ]:
my_array = np.arange(10)
print(my_array.reshape((5, 2))) # reshape to desired shape

In [ ]:
temporary = my_array.reshape((5, 2))
print(temporary.ravel()) # unroll multi-dimensional array to single 1D array

In [ ]:
array1 = np.arange(10)
array2 = np.arange(10).T # this is the transposed version of array1

# .dot() does matrix multiplication (dot product: https://en.wikipedia.org/wiki/Dot_product)
# This linear algebra operation is used very often in neuroimaging research (which depends heavily on the General Linear Model!)
dot_product = array1.dot(array2)
print(dot_product)
**ToDo**: From the variable below (test_array), extract the diagonal, and sum them together. Print the output.

In [ ]:
test_array = np.arange(9).reshape((3, 3))
# Extract the diagonal and sum them together:

Numpy: methods vs. attributes?

Alright, by now, if you see a variable followed by a word ending with enclosed brackets, e.g. my_array.mean(), you'll know that it's a method! But sometimes you might see something similar, but without the brackets, such as my_array.size. This .size is called an attribute of the variable my_array. Like a method, it's an integral part of an object (such as a numpy ndarray). The attribute may be of any data-type, like a string, integer, tuple, an array itself. Let's look at an example:


In [ ]:
my_array = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print(my_array, '\n')
print('The size (number of element) in the array is:')
print(my_array.size, '\n')
print('The .size attribute is of data-type: %s' % type(my_array.size))

Alright, so by now you might be wondering what the difference between a method and an attribute is. Superficially, you can recognize a method by the form object.method() (note the brackets!), like my_array.round(); an attribute is virtually the same but without brackets, in the form of object.attribute, like my_array.size.

Conceptually, you may think of methods as things that do something with the array, while attributes say something about the array.

For example, my_array.size does nothing with the array - it only says something about the array (it gives information about its size), while my_array.mean() really does something (i.e. calculates the mean of the array).

Again, you might not use attributes a lot during this course, but you'll definitely see them around in the code of the tutorials. Below, some of the common ndarray attributes are listed:


In [ ]:
my_array = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print('Size (number of elements) of array:')
print(my_array.size, '\n') # returns an integer

print('Shape of array:')
print(my_array.shape, '\n') # this is a tuple!

print('Number of dimensions:')
print(my_array.ndim) # this is an integer

Numpy: array math

Now you know all the numpy basics necessary to do neuroimaging analysis! As you'll see in the last section (Working with nifti-images), we'll work with 3D (structural MRI images) or 4D (functional MRI images) numpy arrays a lot. Given that you know how the basics about numpy in general and numpy ndarrays in particular, we can utilize some of numpy's best features: (very fast) array math.

Basic mathematical functions operate elementwise on arrays, which means that the operation (e.g. addition) is applied onto each element in the array.


In [ ]:
x = np.zeros(10)
print(x, '\n')
x += 1 # remember: this the same as x = x + 1
print(x)

Often, there exist function-equivalents of the mathematical operators. For example, x + y is the same as np.add(x, y). However, it is recommended to use the operators wherever possible to improve readability of your code. See below for an example:


In [ ]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum: element 1 from x is added to element 1 from y, element 2 from x is added to element 2 from y, etc.
print(x + y, '\n')
print(np.add(x, y))

In [ ]:
# Elementwise difference; both produce the array
print(x - y, '\n')
print(np.subtract(x, y))

In [ ]:
# Elementwise product; both produce the array
print(x * y, '\n')
print(np.multiply(x, y))

In [ ]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y, '\n')
print(np.divide(x, y))

In [ ]:
# Elementwise square root; produces the array
print(np.sqrt(x))
**ToDo**: Do an elementwise product between the two variables defined below (matrix_A and matrix_B) and subsequently add 5 to each element.

In [ ]:
# Implement the ToDo!
matrix_A = np.arange(10).reshape((5, 2))
matrix_B = np.arange(10, 20).reshape((5, 2))

Note that unlike MATLAB, * is elementwise multiplication, not matrix multiplication. We instead use the dot function (or method) to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices.


In [ ]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

Probably the most used functions in numpy are the sum() and mean() fuctions (or methods!). A nice feature is that they can operate on the entire array (this is the default) or they can be applied per dimension. In numpy, dimensions are referred to as axes. Applying functions along axes is very common in scientific computing! An example:


In [ ]:
x = np.array([[1, 2],[3, 4], [5, 6]])

print('Original array:')
print(x, '\n')

print('Sum over ALL elements of x:')
print(np.sum(x), '\n')

print('Sum over each column of x:')
print(np.sum(x, axis=0), '\n')

print('Sum over each row of x:')
print(x.sum(axis=1)) # this is the method form! Is exactly the same as np.sum(x, axis=1)
**ToDo**: Calculate the mean of each column of the matrix y below and print the output:

In [ ]:
y = np.arange(20).reshape((5, 4))

# Calculate the mean of each column

Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:


In [ ]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print('x array is of shape: %r' % list(x.shape))
print(x, '\n')

v = np.array([1, 0, 1])
print('v vector is of shape: %r' % list(v.shape))
print(v, '\n')

y = np.zeros(x.shape)   # Create an empty (zeros) matrix with the same shape as x
print('Shape of (pre-allocated) y-matrix: %r' % list(y.shape))

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(x.shape[0]): # see how the shape attributes comes in handy in creating loops?
    y[i, :] = x[i, :] + v

print('The result of adding v to each row of x, as stored in y:')
print(y)

This works; however when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, like this [[1 0 1], [1 0 1], [1 0 1], [1 0 1]], and subsequently elementwise addition of x + vv:


In [ ]:
vv = np.tile(v, (4, 1)) # i.e. expand vector 'v' 4 times along the row dimension (similar to MATLAB's repmat function)
y = x + vv  # Add x and vv elementwise
print(y)

Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:


In [ ]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)

The line y = x + v works even though x has shape (4, 3) and v has shape (3,) due to broadcasting; this line works as if v actually had shape (4, 3), where each row was a copy of v, and the sum was performed elementwise.

This broadcasting function is really useful, as it prevents us from writing unnessary and by definition slower explicit for-loops. Additionally, it's way easier to read and write than explicit for-loops (which need pre-allocation). Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the documentation.

Here are some applications of broadcasting using different functions:


In [ ]:
x = np.array([[1, 2],[3, 4], [5, 6]], dtype=np.float)
x_sum = x.sum(axis=0)

print(x / x_sum)
**ToDo**: Calculate the mean of each column in the variable 'my_array' below. Subsequenly, subtract these column-means from each row of the matrix.

In [ ]:
my_array = np.arange(20).reshape((5, 4))
print(my_array)

# Calculate the mean of each column using the axis=0 argument! and subtract it from the array itself.

Now, you know the most important numpy concepts and functionality that is necessary to do neuroimaging analysis. Surely, there is a lot more to the numpy package that what we've covered here! But for now, let's continue with plotting using Matplotlib!

3. Matplotlib

Matplotlib is Python's main plotting library. In this section give a brief introduction to the matplotlib.pyplot module, which provides a plotting system similar to that of MATLAB.


In [ ]:
import matplotlib.pyplot as plt # this is how matplotlib.pyplot is usually imported
import numpy as np

By running this special iPython command, we will be displaying plots inline (instead of in a new window; this is only relevant in Jupyter notebooks!):


In [ ]:
%matplotlib inline

Plotting

The most important function in matplotlib is plot, which allows you to plot 1 dimensional data. Here is a simple example:


In [ ]:
# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

# Plot the points using matplotlib
plt.plot(x, y)
plt.show()

With just a little bit of extra work we can easily plot multiple lines at once, and add a title, legend, and axis labels:


In [ ]:
y_cos = np.cos(x)
y_sin = np.sin(x)

# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])

Subplots

You can plot different things in the same figure using the subplot function. Here is an example:


In [ ]:
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Set up a subplot grid that has 2 'rows' and 1 'column',
# and set the first such subplot as active.
plt.subplot(2, 1, 1)

# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')

# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')

# Show the figure.
plt.tight_layout() # this prevents plots from overlapping
plt.show()

And we can add text to plots as such:


In [ ]:
x = np.arange(-1, 1, 0.01) # array from -1 to 1 in steps of 0.01

# Set up a subplot grid that has 2 'rows' and 1 'column',
# and set the first such subplot as active.
plt.subplot(2, 1, 1)
plt.plot(x, x**2)
plt.title('x squared')
plt.text(-0.5, 0.4, r"$y=x^2$", fontsize=20, color="blue")

# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, x ** 5)
plt.title('x cubed')
plt.text(-0.5, 0.4, r"$y=x^3$", fontsize=20, color="green")

plt.tight_layout() # tight_layout automatically adjusts subplot params so that the subplot(s) fits in to the figure area
plt.show()

You can read much more about the subplot function in the documentation.

Histograms

Now let's generate some random data using the np.random functionality, and get some histograms of their distributions


In [ ]:
rand_uniform = np.random.uniform(size = (1000)) # uniform distribution
rand_normal = np.random.normal(size = (1000))   # normal distribution
rand_exponential = np.random.exponential(size = (1000)) # exponential distribution

In [ ]:
# Set up a subplot grid that has rows=2 and columns=3,
# and set the first such subplot as active.
plt.subplot(1, 3, 1)

# Make the first plot
plt.hist(rand_uniform, color='g')
plt.title('uniform')

# Set the second subplot as active, and make the second plot.
plt.subplot(1, 3, 2)
plt.hist(rand_normal, color='b')

plt.title('normal')

# Set the second subplot as active, and make the second plot.
plt.subplot(1, 3, 3)
plt.hist(rand_exponential, color='r')
plt.title('exponential')

plt.tight_layout()
plt.show()

Images

So far, we've been plotting lines. Matplotlib also has easy functionality for plotting two-dimensional data-arrays, such as pictures.


In [ ]:
# the scipy module includes some pictures, so we can use those:
from scipy import misc
bw_photo = misc.face(gray=True)

# now we can use the imshow command to plot 2d data:
plt.imshow(bw_photo,cmap='gray')
plt.show()

In order to try to understand what a bw image really is, let's just print the array that we loaded in


In [ ]:
print("Shape of photo object: %r" % list(bw_photo.shape), '\n')
print(bw_photo)

You can see that it's actually just a bunch of numbers (one number per pixel), and when we plot colorbar with the original picture, we see that high numbers convert to bright shades, and low numbers convert to dark shades.


In [ ]:
plt.imshow(bw_photo,cmap='gray')
plt.colorbar()
plt.show()

Using the knowledge that black and white images are nothing more than just 2d-arrays of values, we can also generate our own images:


In [ ]:
rand_img = np.random.random((200,200)) # Create an array filled with random values
print(rand_img, '\n')
plt.imshow(rand_img, cmap='gray')
plt.colorbar()
plt.show()

Now that you know some basics of data types, arrays and plotting, it's time to get to work with some fMRI data!

3. Neuroimaging analysis with Python

Although (f)MRI data can come in many formats, by far the most used format is nifti (nifti has been developed by the neuroimaging informatics technology initiative). Throughout these tutorials you'll work with these nifti-files (also called nifti images), which you can recognize by their extension of .nii or its compressed version .nii.gz. This file-format is also used by some of the main neuroimaging software packages such as AFNI, nipype, and FSL. (You'll use FSL later in this course!).

However, we'd like to inspect and analyze nifti images in Python! Nibabel is a Python package that allows us to read and load nifti images as numpy arrays in a straightforward manner. Let's look at an example:


In [ ]:
# If you haven't done so already, load some other packages we need
import numpy as np
import matplotlib.pyplot as plt
import nibabel as nib # common way of importing nibabel

In [ ]:
data_path = 'func_data.nii.gz'
data = nib.load(data_path)

Right now, we haven't loaded the actual data in memory yet, but only the meta-data (information about the data, such as its shape, dimensions, and some technical MRI acquisition parameters). So, what we've loaded now (in the variable data) is not a numpy array (yet), but a different object: a Nifti-object.

Like a numpy array, this object has some attributes and methods that we can use. Let's check out some of this meta-data that is already available through its methods/attributes:


In [ ]:
print('Shape of the data:')
print(data.shape) # this is one of the attributes of a Nifti-object in Python, just like a numpy array shape attribute!

From the shape of the data, we can already infer some things about the file we're working with. Clearly, it has 4 dimensions, which means that it is a functional file: 3 spatial dimensions, and 1 time dimension! Another attribute of this Nifti-object that you might check is the header attribute. Perhaps confusingly, the header attribute is another object itself, also with its own methods and attributes. Let's check out some of them:


In [ ]:
header = data.header
print('Get the measurements of our dimensions:')
print(header.get_zooms())

Above, the four numbers represent the measurements of the data's four dimensions (3 space dims, 1 time dim). But what the hell do these numbers represent? Millimeters, liters, km/hr?

Actually, the units of these dimensions can be accessed through the header.get_xyzt_units() (x, y, and z = spatial dims, t = time dims):


In [ ]:
print('Units of our measurements:')
print(header.get_xyzt_units())
# so space is in mm (voxels are 3 x 3 x 3.3 mm in size!) and time is in seconds (time resolution = 2 seconds)

So now we know that our data reflects activity from 3 x 3 x 3.3 mm voxels across 200 timepoints sampled at a time resolution of 2 seconds. Now, let's dig a little deeper into the data! But remember, we still have to load the actual data in memory. We can do that by the get_data() method. This method actually returns a numpy 4D-array we can work with!


In [ ]:
func_data = data.get_data()
print(type(func_data))

Let's look at some of the data! Because we have 80 x 80 x 37 x 100 = 23680000 values in total, let's not look at all data at once. We'll select a axial slice somewhere in the middle of the brain of the first recorded volume (i.e. t = 1).


In [ ]:
axial_slice_no = 18
slice_1 = func_data[:, :, axial_slice_no, 0] # the 0 refers to the first timepoint (t = 1)
print(slice_1, '\n')
print('Shape of slice_1: %r' % list(slice_1.shape))

Hmm, seems like nothing is going on! This is because Python doesn't print out the entire array, because that would clutter the notebook. Here, it is only plotting the first couple of rows and the last couple of rows from slice_1. The fact that we're only seeing zeros is because not all 23680000 values contain brain data! If a data point has value 0, it is very likely that this data point falls outside the recording of the brain.

To visualize the entire slice, let's plot is using imshow:


In [ ]:
plt.imshow(slice_1, cmap='gray')
plt.axis('off')
plt.show()

As you can see, only a part of the image is brain data, which is centered in the middle of the image. All locations in the image are represented as zeros in the slice-data. Knowing this, let's select (slice) only a small part of some actual brain data, as visualized below:


In [ ]:
import matplotlib.patches as patches

print('Selected region indicated by red square:')
rect = patches.Rectangle((35, 35), 5, 5, linewidth=2, edgecolor='r', facecolor='none')
plt.imshow(slice_1, cmap='gray')
ax = plt.gca()
ax.add_patch(rect)
plt.axis('off')
plt.show()

In [ ]:
# The data corresponding to the red square above
print(slice_1[35:41, 35:41])

What does this tell you? Well, fMRI data is simply a lot of numbers, with (relatively) higher numbers indicating (relatively) higher activity in functional MRI images.

So far, we have only looked at some (static) representation of voxel activity of some voxels, ignoring the time-dimension of our data (we only looked at the first slice, t = 1). We can also look at the activity of a single voxel across time, as visualized below:


In [ ]:
volumes_to_plot = 20
plt.figure(figsize=(30, 30))

for t in range(volumes_to_plot):
    
    plt.subplot(5, 5, (t+1))
    plt.imshow(func_data[:, :, axial_slice_no, t], cmap='gray')
    rect = patches.Rectangle((35, 35), 1, 1, linewidth=1, edgecolor='r', facecolor='none')
    ax = plt.gca()
    ax.add_patch(rect)
    plt.axis('off')
    plt.title('t = %i' % (t + 1), fontsize=20)
plt.show()

It is really hard to spot whether the image of the slice is actually different between different time points! To the naked eye, it just seems like the same picture. This is because activity fluctations in fMRI are very small - most of the time just 1-3% compared to baseline (average) activity. That's why it's hard to spot activity differences by looking at the pictures alone.

The activity fluctuations become clearer when you plot the activity of voxels across time in a linegraph (which allows you to see relative activity fluctuations better). So, let's plot the activity over time (t = 1:200) for the voxel within the red square in the figure above:


In [ ]:
plt.figure(figsize=(15, 6))
plt.plot(func_data[35, 35, axial_slice_no, :])
plt.title('Activity over time for voxel (35, 35, %i)' % axial_slice_no, fontsize=20)
plt.xlabel('Time (t)', fontsize=18)
plt.ylabel('Activity (arbitrary units)', fontsize=18)
plt.show()
**ToDo**: Now, we can see how voxel activity changes over time (for one voxel at a time). Using the plotting code in the code block above and your knowledge of array indexing to plot the timecourses for some other voxels.

In [ ]:
# mess around with plotting some voxels in this code block!

Do you get flatlined timecourses for some voxels? That means you've plotted a voxel that lies outside the brain. Do you get index errors? Look for the shape attribute of func_data to see how far you can go in the x, y and z direction.

4. Plotting colorful brain images!

Alright, now that we've taken a look at a (4D) functional MRI file, let's take a look at a "statistics" file (also called 'map'). A statistics file is (usually) the result of a (traditional) fMRI analysis, which reflects how 'active' voxels were during a given task for a specific condition.

Here, we will look at results from an analysis of an experiment in which participants looked at either negative emotional images or neutral images (from the IAPS database). The statistics file we'll examine represents how much (and which) more activity voxels showed for negative compared to neutral images.

Throughout the rest of the course, we will learn which steps to take to, in the end, be able derive such a statistics map ourselves! But for now, we'll get a sneak peak of how such a file looks like.

Let's load the data first and inspect it's dimensions


In [ ]:
stat_data_path = 'stat_data.nii.gz' 
stat_data = nib.load(stat_data_path).get_data()

# first, let's take a look at the stats file
print(stat_data.shape)

Note that this file only has 3 dimensions (x,y,z), since it does not have a dimension of timepoints anymore. We simply have one value per voxel now, a z-stat value. This z-value reflects how much more activity that voxel displayed when the participant was watching negative images compared to when the participant was watching neutral images. Higher z-values thus reflect higher activity for negative than for neutral image.

To visualize the results, we'll look at the same axial slice as before:


In [ ]:
stat_slice = stat_data[:, :, axial_slice_no]

Usually, statistic files are plotted on top of an background image that shows the outlines of the brain so that you can see where the significant activity is located. Here, we'll use the functional file (func_data) averaged across the time-dimension as a background image. Then, we'll plot the statistics file (stat_slice) on top of the background file and color-code the values (more yellow = higher values) to see which voxels are (the most) active for negative compared to neutral images. Also, we have to define a threshold for which z-values (and thus corresponding voxels) are significant or not. Usually, a threshold of 2.3 for z-values is a sensible default (because it corresponds to a p-value of 0.01).


In [ ]:
THRESHOLD = 2.3

Now, let's do the actual plotting. You don't need to understand every line of code below, but you should understand that we're basically just plotting a background image and the (thresholded) statistics file on top of it!


In [ ]:
# first, let's plot the mean brain slice as 'background'
average_func_data = func_data[:, :, axial_slice_no, :].mean(axis=-1)

plt.figure(figsize=(10, 10))
plt.imshow(average_func_data, cmap = 'gray') 
plt.axis('off')

# the np.ma.masked_where function is not really important to understand!
# It basically makes the values below a certain threshold (here: 2.3) translucent in the brain image below
slice_masked = np.ma.masked_where(stat_slice < THRESHOLD, stat_slice)

# now we can plot our 'significant' voxels as an overlay
plt.imshow(slice_masked, cmap='hot', clim=[THRESHOLD, 6]) # try setting the THRESHOLD higher (e.g. THRESHOLD + 1) and see what happens!
plt.title('My first pretty brain plot!')
plt.show()

Congratulations! You've plotted a brain image!