In [ ]:
print "hello world"
Welcome to the first tutorial of the EFI workshop. The goal for today:
Python
Git
What is git?
Git as a historical tree
master
timeline -- a branch
branch
to diverge, make changes, and then merge it back in to master
               Branch A
               o--o--o--o----o-------....
              /               \ Merge
Master o--o--o ----------------o-----....
              \
               o--o--o-----o----o-----....
               Branch B
Who uses Git?
CERN, for example, maintains a GitLab instance for all CERN users and experiments including (but not limited to) LHC, ATLAS, and CMS.
GitHub | GitLab |
---|---|
There is a lot more to Git which will not be covered today, but if you have time to learn some more advanced concepts, try learning about Git branching!
What is Python?
I will include a very brief example of compiling an existing C/C++ library so that it can interface with Python as if it were a Python package!
Click this link and enjoy the interactive tutorial: https://try.github.io .
For the point of the tutorial today, we'll focus on Python 2.7 - however Python 3 does exist.
Why are there two Pythons?
Physicists at CERN rely primarily on Python 2.7 as part of their analysis and development efforts. That doesn't mean you should use 2.7, or that you will be forced to use it. Feel free to do the research and understand the differences, but realize that 2.X is on end-of-life releases while 3.X undergoes active development.
When programmers learn a new language, we tend to write a one-line program that prints some version of the message "Hello world!". This simple program shows whether your computer is properly set up to run Python programs. Let's try it out.
In [ ]:
print "Hello World!"
What just happened? The Python string "Hello World!" is printed to STDOUT (your console or, here, this Jupyter notebook), along with an automatically added newline character.
In [ ]:
type("Hello World!")
In [ ]:
timon = "Hakuna Matata!"
print timon
print "What a wonderful phrase..."
print timon
print "It ain't no passing craze..."
print "It means no worries...", "for the rest of your days"
So you can already start to understand how easy it is to print anything that has a representation. Variables can be printed, and so can functions and classes. But what if you wanted to print something on the same line? You can use print foo, bar of course, but what about using two print statements?
It might seem annoying sometimes that a newline is always appended, but you can use a nifty shortcut: end your print statement with a comma (,).
In [ ]:
print timon
print "What a wonderful phrase..."
print timon,
print "It ain't no passing craze..."
print "It means no worries...",
print "for the rest of your days"
To begin the process of Python coding, you should get introduced to your second statement: import. This allows you to import modules and packages into the scope of your code and interface with other people's sweat and tears. Let's start with the classic easter egg:
In [ ]:
import antigravity
Now, it's time for your first exercise. Try importing a module called this
In [ ]:
# go ahead and import it!
When you import a module, the name of that module goes into scope and its API is accessible under the same name. If the two modules above exposed anything useful, you would be able to access it via antigravity.something or this.that. As a concrete example, let's import the types library and see what sorts of things are included when importing it.
Note: not all modules print a message when they get imported. this is a slight exception / easter egg.
In [ ]:
import types
dir(types)
In [ ]:
types.BooleanType
This means that I can do something like from types import BooleanType to only extract a specific variable, class, or function instead! I can even rename it during import if I have name conflicts:
In [ ]:
import types as MyTypes
from types import BooleanType as AnotherBooleanType
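As a small sketch of both import forms, here is the same idea with the standard math module. The aliases m and square_root are arbitrary names I picked for illustration, not anything from the types example above.

```python
# alias a whole module, then alias a single name pulled out of it
import math as m
from math import sqrt as square_root

# the alias and the attribute on the aliased module are the same function object
print(m.sqrt(16.0))
print(square_root is m.sqrt)
```

Aliasing changes only the name you type; it does not create a copy of the module or the function.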
However, you might have noticed that I say print statement and not print function. There's a reason for this. I also introduced import as the second statement you have learned so far... Here's why!
In Python, functions are defined
In [ ]:
# define a function named "what_does_the_fox_say"
def what_does_the_fox_say():
    print "Ring-ding-ding-ding-dingeringeding"
and then executed
In [ ]:
what_does_the_fox_say()
and we get all sorts of remarkable attributes on these functions, which you can view by calling the built-in dir function
In [ ]:
dir(what_does_the_fox_say)
such as maybe printing out the name of the function
In [ ]:
what_does_the_fox_say.__name__
which seems a little redundant right now. But imagine you alias a function to another variable, but you want to do something based on the name of the function...
In [ ]:
# assign another variable that points to the function
wdtfs = what_does_the_fox_say
# execute the function
wdtfs()
# what is the name of the function?
wdtfs.__name__
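Because a function is just an object, it can also be handed to another function or stored in a container. Here is a minimal sketch of that idea; apply_twice and add_exclamation are hypothetical helpers made up for this example.

```python
# a function that takes another function as its first argument
def apply_twice(func, value):
    return func(func(value))

def add_exclamation(text):
    return text + "!"

# pass the function object itself (no parentheses), like any other value
result = apply_twice(add_exclamation, "Ring-ding")
print(result)

# functions can also live inside containers and be looked up later
handlers = {"shout": add_exclamation}
print(handlers["shout"].__name__)
```

Note that add_exclamation is passed without parentheses: writing add_exclamation() would call it instead of passing it along.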
As you can see, functions are objects in Python, otherwise known as first-class citizens.
Notice how a function call in Python has a pair of parentheses wrapping the arguments. Is the print statement a function?
In [ ]:
# try it out: print(....) with a single string or a single variable of your choice!
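But this isn't quite right! With the print statement, the parentheses are not part of a function call. Around a single value they merely group it, and around comma-separated values they form a tuple, as we'll be talking about later. This is somewhat deceiving, but check it out: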
In [ ]:
print(timon,timon)
so it seems to be printing out two strings in a tuple format... Is (timon, timon)
a tuple?
In [ ]:
type( (timon,timon) )
It appears to be! But how would I know if something is a function or not? Use the type
class!
In [ ]:
type(what_does_the_fox_say)
In [ ]:
type(print)
My personal opinion is that having such a special statement in Python gets confusing. For the rest of the notebook, we'll replace the print statement with the print function, which is a lot more powerful, as we shall soon see.
Let's try and fix the error we've seen:
In [ ]:
# go ahead and import print_function from the __future__ module
type(print)
In this section, I hope to highlight some of the different ways that you should be able to input data into your python scripts, as well as being able to extract it out.
Lots of times, you might want to treat a Python script as a single step in your processing chain. There could be a lot of variables and many places where something can go wrong. Rather than trying to create a one-stop-shop piece of code in Python that does the full processing, it is much easier to build small scripts that each process the data and output an intermediate result, which then gets fed into the next script down the chain.
For now, let's focus on the different ways to input data. Passing command line arguments will be handled in a later section since we'll focus on data first, not configuration. Editing a python script and re-running is a beginner (and slow) way to do configuration, but if it works, don't knock it!
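In this section, we'll focus on the following ways to input data:
- user input typed at a prompt
- plain text files, including comma- and tab-separated files
- JSON files
- ROOT and HDF5 files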
For more information, see python's tutorial on input/output.
In [ ]:
name = raw_input("What is your name?")
print("Greetings", name)
This is the most dangerous form, as you will need to sanitize and validate the user input. Imagine that the person typing input to your script is a chimpanzee and you need to program your script to allow them to use it without breaking it.
Try running the above code with anything, or just press enter and type nothing in! You can get an idea of how to break a script if you're not careful about validating all crazy monkey inputs.
But this alone can let you create some really clever or useful helper programs. I like to think of raw_input
as a way to make my python script become a customized calculator for very specific calculations I do repetitively. This way, my script asks me for input, I give it the input, and let it calculate something for me.
As an example of this idea, I'll demonstrate how you could create an adder with an infinite while
loop.
In [ ]:
# start with nothing
total = 0
# loop infinitely
while True:
    number = raw_input("Gimme a number or press enter to continue: ")
    if number.isdigit():
        # as long as we're given something that
        # looks like a number... add it.
        total += int(number)
    else:
        # break the loop
        break
# let's tell them what they've won!
print("Your total is", total)
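One thing to watch for: isdigit() only accepts strings made up entirely of digits, so an answer like "-3" or "2.5" would end the loop early. Here is a sketch of a more forgiving check that tries the conversion and catches the failure. parse_number is a hypothetical helper, and the answers list stands in for what raw_input would return so the example runs without a prompt.

```python
def parse_number(text):
    """Return text converted to a float, or None if it is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None

# simulated answers stand in for what raw_input() would return
answers = ["4", "-3", "2.5", "banana"]
total = 0.0
for answer in answers:
    number = parse_number(answer)
    if number is None:
        # stop at the first thing that is not a number
        break
    total += number

print("Your total is %s" % total)
```

Keep in mind that float() also accepts strings such as "nan" and "inf", so a real script may want to reject those explicitly.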
Next up on input is dealing with plain text files. I would like to focus a little bit on the different ways you can read a text file. In the data
directory are three files we will look at in this section. I will provide an explanation of how you might read them, and leave another as an exercise up to you. Feel free to play with these files.
In [ ]:
! ls -lavh plain_data/*
The first file is animals
which is a (non-exhaustive) list of animals, one animal per line. We would like to read this file into a Python list. Don't worry about what a Python list is (if you're unfamiliar) but it is like an array. The goal here is just to load the data file in.
I will demonstrate some different ways to do it using open(), a built-in function. This returns a file object (also called a file handle or file pointer). You need to make sure you close this file yourself when you are done with it; otherwise the underlying operating-system file handle stays open until Python's garbage collection eventually cleans up the object.
In [ ]:
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))
# do something in between
# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))
If you keep re-running the above code, notice that the file pointer changes. You really are opening this file, and then closing it. Obviously, we would like to read the contents of the file. There's a really handy intuitive method called read()
that lets us do that!
To read a file’s contents, call f.read(size)
, which reads some quantity of data and returns it as a string. size
is an optional numeric argument. When size
is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size
bytes are read and returned. If the end of the file has been reached, f.read()
will return an empty string (""
).
Let's try it out.
In [ ]:
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))
# read the first 128 bytes from the file
# leave this blank (fh.read()) to read it all
# but there are a lot of animals, so let's not read it all
data = fh.read(128)
print(data, "\n")
print(repr(data))
print("Data type: ", type(data))
# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))
Notice that Python's print function renders each newline character \n as an actual line break. To see the newline characters themselves, wrap the text in repr(), which returns the representation of the string with escape sequences left visible.
This is a single string object. That's not entirely useful for our purposes, is it? We want to read this file line by line. Perhaps we can read the entire file, then split it up by the newline delimiter, but maybe we're limited by application memory. As it turns out, there is a readline() method.
f.readline()
reads a single line from the file; a newline character (\n
) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline()
returns an empty string, the end of the file has been reached, while a blank line is represented by '\n'
, a string containing only a single newline.
In [ ]:
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))
data = fh.readline()
print(repr(data))
data = fh.readline()
print(repr(data))
# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))
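Since readline() returns an empty string only at the end of the file, that empty string can serve as the stopping condition of a loop. Here is a sketch of that pattern. It writes its own small throwaway file (the three animal names are made up) so it does not depend on the contents of plain_data/.

```python
import os
import tempfile

# build a small throwaway file so the example is self-contained
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("aardvark\nbadger\ncheetah\n")

lines = []
fh = open(path, "r")
while True:
    line = fh.readline()
    if line == "":
        # an empty string means the end of the file has been reached
        break
    lines.append(line)
fh.close()
os.remove(path)

print(repr(lines))
```

A blank line in the middle of the file would come back as "\n", not "", so it would not stop the loop.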
And so now we can read this file line by line. But maybe you want all the lines. There's also a nifty readlines() method:
In [ ]:
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))
data = fh.readlines()
# let's not print out all the animals
print("There are", len(data), "animals in the file.")
# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))
Another, somewhat less documented way is to use list() to turn the file pointer into a list object. This works because a file object is iterable line by line, and list() collects the items of any iterable.
In [ ]:
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))
data = list(fh)
# let's not print out all the animals
print("There are", len(data), "animals in the file.")
# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))
Just beware that each element in this list will contain that pesky newline character at the end, which is great for printing to the screen but not so much when you're trying to do data analysis. The strip() method (along with lstrip() and rstrip()) on str objects removes the specified characters from both ends (or only the left or right end) of the string. If you don't specify which characters to remove, whitespace characters are removed by default.
In [ ]:
print("left", " outer space ", "right")
print("left", " outer space ".strip(), "right")
Feel free to go back over some of the above examples and try using line.strip()
instead! When reading plain text files, strip()
and repr()
are your friends.
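To make the three variants concrete, here is a short sketch on made-up strings. The lines list imitates what readlines() would hand back.

```python
raw = "  aardvark\n"
print(repr(raw.strip()))    # whitespace removed from both ends
print(repr(raw.lstrip()))   # left end only, the newline survives
print(repr(raw.rstrip()))   # right end only, the leading spaces survive

# passing characters strips those instead of whitespace
print(repr("--badger--".strip("-")))

# cleaning up a list of lines as returned by readlines()
lines = ["aardvark\n", "badger\n", "cheetah\n"]
animals = []
for line in lines:
    animals.append(line.strip())
print(repr(animals))
```

If leading spaces are meaningful in your data, rstrip("\n") is the gentler choice because it only touches trailing newlines.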
As you can imagine by now, it gets annoying to constantly remember to close the file when you're done. In fact, there's a statement for it! The with statement calls the file's __enter__() and __exit__() methods automatically. Luckily for us, this means we can open a file for reading and have it closed immediately when we are done. Here's a nice example:
In [ ]:
with open("plain_data/animals", "r") as fh:
    print(fh, type(fh))
print(fh, type(fh))
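A nice side effect of the with statement is that the file gets closed even when something goes wrong inside the block. Here is a sketch that demonstrates this with a deliberately raised error, using a throwaway file with made-up contents.

```python
import os
import tempfile

# throwaway file so the example is self-contained
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("aardvark\nbadger\n")

try:
    with open(path, "r") as fh:
        first = fh.readline().strip()
        # simulate a failure in the middle of processing
        raise RuntimeError("something went wrong mid-read")
except RuntimeError:
    pass

# the file was closed on the way out of the with block
print("%s %s" % (first, fh.closed))
os.remove(path)
```

With a bare open() and close() pair, the close() call would have been skipped by the exception.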
So now, we can use the scope of the with
statement to encapsulate the logic we need for reading files in this example. This is incredibly convenient. Let's try to do the same thing with a comma-separated file!
In [ ]:
! ls -lavh plain_data/*
The other two files here to focus on are comma-separated and tab-separated. These are plain text files as well. Like the animals
file we just looked at where there was an animal for each line -- the other files have an entry, or event, or item for each line. The difference is that now, we have a delimiter that encodes more information per line that we need to extract in a meaningful way.
Using the csv
library provided, let's try reading in the comma-separated file!
Let's look at what plain_data/example.csv
contains so we know how to configure the reader:
In [ ]:
! cat plain_data/example.csv
As you can see, we want to make sure that the comma inside a quoted string, like "Venture, Extended Edition", isn't considered a delimiter. We will use " as the quote character, which tells the csv reader when to split on the delimiter and when not to.
In [ ]:
import csv
with open("plain_data/example.csv") as fh:
    reader = csv.reader(fh, delimiter=',', quotechar='"')
    # use next() to get the first line/entry/row/event from the file, which are the headers
    headers = next(reader)
    # use the list(fh) trick to just get a list of everything else!
    data = list(reader)
print("Headers:", repr(headers))
print("Data:", repr(data))
As you can see, it was remarkably easy to read a comma-separated file correctly. What you see above is the list representation for the headers, a list of 4 items, which we got by calling next() on the reader (which acts like a file pointer for us).
The data is represented as a list of lists. The important part here is that we were able to split up the file and extract out the data in a meaningful way. That's the goal of this section, to input data. We haven't discussed understanding it, or parsing it, or validating it, or doing calculations with it. This is where the crux of your time will be spent - analyzing and calculating. I hope these pieces of the code become less challenging and easier so you can focus on what really matters.
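Once you have headers and rows, a common next step is pairing them up so each row can be accessed by column name. The sketch below does that with dict(zip(...)). The rows are hypothetical (I am not reproducing the actual contents of plain_data/example.csv), and it relies on the fact that csv.reader accepts any iterable of lines, not only an open file.

```python
import csv

# hypothetical lines standing in for a file; column names are made up
lines = [
    'title,year,price,rating',
    '"Venture, Extended Edition",1999,4900.00,PG',
    'Plain Title,2001,12.50,G',
]
reader = csv.reader(lines, delimiter=',', quotechar='"')
headers = next(reader)

records = []
for row in reader:
    # pair each header with the value in the same column
    records.append(dict(zip(headers, row)))

print(records[0]["title"])
print(repr(records[1]["price"]))
```

Every value comes back as a string, so numeric columns still need an explicit int() or float() conversion before you calculate with them.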
Now for the exercise! There is one more file in here, a tab-separated file with the same data. Using the above code as inspiration (copied below for you), can you produce the same print output as in the comma-separated case? Make sure you understand what each piece of code is doing. Don't be shy about bedazzling this code with print functions everywhere!
In [ ]:
import csv
with open("plain_data/example.tsv") as fh:
    reader = csv.reader(fh, delimiter=',', quotechar='"') # use the tab delimiter: \t
    # do some stuff here
    headers = ''
    data = ''
print("Headers:", repr(headers))
print("Data:", repr(data))
JSON is one of the most useful plain text formats you can get. Most languages have a JSON parser, as you can see from the list at the bottom of this page on JSON.org. It is a very nice way to provide configuration for any script, and also a human-readable way to serialize data to pass between scripts in different languages. A more compact, schema-based binary alternative is Protocol Buffers (protobuf), but I won't cover that.
Let's just show a very simple example of reading a JSON file in json_data/
.
In [ ]:
! ls -lavh json_data/*
In [ ]:
! cat json_data/example.json
As you can see, the JSON data looks perfectly readable. With Python, we just use the JSON library in order to load this up.
In [ ]:
import json
json.load("json_data/example.json")
Uh-oh! What happened here? It looks like the python call was expecting a file pointer instead. Let's fix that.
In [ ]:
import json
with open("json_data/example.json") as fh:
    data = json.load(fh)
print(repr(data))
And that's it! There is really not that much more to do here. You've just loaded in a file containing JSON data and you can now do things with it.
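To give a flavor of "doing things with it", here is a sketch that parses a JSON string with json.loads() (the string-based sibling of json.load()) and pokes at the result. The JSON text is made up for illustration and is not the content of json_data/example.json.

```python
import json

# hypothetical JSON text; json.loads() parses a string instead of a file
text = '{"name": "example", "values": [1, 2, 3], "nested": {"flag": true, "missing": null}}'
data = json.loads(text)

# JSON objects become dicts, arrays become lists
print(type(data))
print(data["values"][0] + data["values"][-1])

# true/false become True/False and null becomes None
print("%s %s" % (data["nested"]["flag"], data["nested"]["missing"]))
```

From here on it is ordinary Python: index into it, loop over it, or pass it to the rest of your script.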
As you probably can already guess - JSON is not the best format when it comes to storing a lot of raw data in very complicated layouts. In fact, when you start having so much data that it costs a lot of money to maintain the storage for the data - you care a lot about being able to compress it while still maintaining as fast access as possible. This is where data formats like ROOT and HDF become important.
These are file formats that are usually bottlenecked by I/O (input/output) but still allow very fast access to a large amount of data that cannot fit into physical memory. At this point, you start getting into parallel processing and thinking about distributed computing, and a lot of interesting technologies come into play. For now, we'll just briefly show two different ways of opening a ROOT file. There will be more of a focus on HDF5 later during the workshop.
In [ ]:
import ROOT
ROOT has a python library known as PyROOT which we will take advantage of. There is also a NumPy wrapper around PyROOT which is known as root_numpy which contains some test data for the purposes of demonstrating that we can read in a ROOT file.
In [ ]:
import ROOT
from root_numpy import testdata
print(testdata.get_filepath('single1.root'))
# open for reading
f = ROOT.TFile.Open(testdata.get_filepath('single1.root'))
print(f, type(f))
# close the file pointer
f.Close()
print(f, type(f))
Like we've seen with open() before, PyROOT will create the file pointer for you given a filename. You do not need to pass in a file object. In this case, ROOT.TFile.Open() replaces open() when we need to open a file pointer for ROOT files.
With HDF5, there is a similar pattern. When you start looking into the machine learning tutorials later - you will be able to open and read hdf5
files like so
import h5py
# open for reading
f = h5py.File('example.hdf5', 'r')
print(f, type(f))
# get the data here, while the file is still open
f.close()
print(f, type(f))
So, like ROOT, you give the h5py library the name of the file to open, and it returns a file object. For JSON and other plain text files, you use open() to create the file pointer.
Recall that in The Future of Hello World, we used the print() function from Python 3 by importing it from __future__. We also saw that we can use the end keyword argument of the function to remove the newline that would otherwise be added when printing to the screen. What if you wanted to print directly to a file? Well, you can!
If you are using the print statement, you could do
with open('out.txt', 'w') as f:
    print >>f, 'Hello World!'
But since we're wise beyond our years and use the print function, you can use the file keyword argument.
But before we get to the next exercise, I would like to remind you that you can open a file for reading (r), writing (w), or appending (a), as well as in binary mode by adding the b flag. In order to write to a file, you must pass w, wb, a, or ab as the mode to open().
So now, the exercise! Fix the below code to make it work using the print(..., file=<file pointer>)
function to redirect your output to hello_world.log
In [ ]:
from __future__ import print_function
with open('hello_world.log', 'w') as fh:
    print >>fh, "Hello World!" # rewrite this line
The other thing you could do is to have STDOUT
point to a file instead. Then all print statements get redirected for you. How do you access your system's information about which file the kernel writes into? Why, using the python sys
library!
In [ ]:
import sys
# let's just show we can write to the notebook as usual
print(sys.stdout)
# hold on to the regular stdout
temp_stdout = sys.stdout
# now redirect to a file
sys.stdout = open("stdout.log", "w")
print(sys.stdout)
# now redirect back
sys.stdout.close()
sys.stdout = temp_stdout
# this should print again to the notebook
print(sys.stdout)
and we can see that running the above will make a stdout.log
file containing
In [ ]:
! cat stdout.log
So this is pretty neat. This should give you an idea of how you have some control over being able to stream output on your system. You can start to do more complicated things such as redirecting output to multiple files based on what function is being called or something more clever. To learn more about some advanced features, such as using the python logging
facility, you can check out the advanced section in this tutorial.
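One caveat with swapping sys.stdout by hand: if an exception is raised between redirecting and restoring, your output stays pointed at the file. Here is a sketch that guards against that with try/finally. It writes to a temporary file rather than stdout.log, and the variable names are my own.

```python
import os
import sys
import tempfile

# a temporary log file so the example cleans up after itself
fd, path = tempfile.mkstemp(suffix=".log")
os.close(fd)

original_stdout = sys.stdout
log = open(path, "w")
sys.stdout = log
try:
    print("this line goes to the log file")
finally:
    # runs whether or not the body raised, so stdout is always restored
    sys.stdout = original_stdout
    log.close()

with open(path) as fh:
    captured = fh.read()
os.remove(path)

print("captured: %r" % captured)
```

The finally block is what makes this safe: it runs on both the normal path and the error path.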
Let's go ahead and delete the files we just created in this tutorial
In [ ]:
! rm hello_world.log stdout.log
We've just seen how you can redirect print output to a file in two different ways:
- redirecting individual print statements/functions on a case-by-case basis
- redirecting all print calls by changing where sys.stdout points to

Writing output directly to a file using a file pointer is just as easy as reading from it. Instead of using the file.read() method, you will use the file.write() method. Unlike print calls, which add a newline character to the end for you, you need to add one manually in order to make newlines in a file. Time for the next exercise!
In [ ]:
name = raw_input("What is your name? ")
with open('plain_output.log', 'w') as fh:
    # rewrite these lines to use fh.write(....) instead
    print("Hello ", file=fh, end='') #no newline
    print(name, file=fh) #newline
    print("How are you today?", file=fh) #newline
I would like you to play around with the fh.write()
by adding multiple lines in, and try to understand how this works. This above example combines a few pieces of what you've learned previously using raw_input
as well as how to suppress a newline with a print
call. You can check your work by running
In [ ]:
! cat plain_output.log
When you're satisfied that you understand plain text printing, go ahead and clean up the file you made
In [ ]:
! rm plain_output.log
JSON is a very nice way to serialize many python objects into a plain text format that can be re-imported at a later time. It's as simple as loading the JSON library and calling the dumps()
method to see the JSON representation, or calling the dump(data, fh)
method to write to a file pointer.
Imagine a scenario where you have lots of plain text files, but you want to convert them to JSON. In this case, maybe we'll just load one of the plain data examples where we loaded the data into a list, and then just dump that list into a file in JSON format. Starting from a previous exercise, let's extend it more...
I've added the import json
and json.dumps()
calls for you. All you need to do is create a with
statement that gives you an open file handle to write into using json.dump()
. You can see a pretty JSON representation of the object we will write to the file. I use the nifty indent=4
to make it look pretty. The object we're dumping is a tuple (a, b)
of two different lists: headers and data.
In [ ]:
import csv, json
with open("plain_data/example.csv") as fh:
    reader = csv.reader(fh, delimiter=',', quotechar='"')
    # use next() to get the first line/entry/row/event from the file, which are the headers
    headers = next(reader)
    # use the list(fh) trick to just get a list of everything else!
    data = list(reader)
# I want to write this to a file called: example.json
# using the with statement to pass in a file pointer
print(json.dumps((headers, data), indent=4))
When all is said and done, you should verify that you wrote to the file correctly
In [ ]:
! cat example.json
Compare this to plain_data/example.csv
to get an idea of how the two representations are a bit different, but both are still plain text!
In [ ]:
! cat plain_data/example.csv
and then clean up and remove the JSON file you just made.
In [ ]:
! rm example.json
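One detail worth knowing when you read such a file back in: JSON has no tuple type, so the tuple we dumped comes back as a list. Here is a sketch of the round trip done entirely in memory with json.dumps() and json.loads(); the headers and data values are made up.

```python
import json

# hypothetical stand-ins for the headers and data read from the CSV file
headers = ["name", "year"]
data = [["alpha", "1999"], ["beta", "2001"]]

# serialize a tuple of two lists, then parse the text back
text = json.dumps((headers, data), indent=4)
restored = json.loads(text)

# the outer tuple has become a list; the contents are unchanged
print(type(restored))
print(restored == [headers, data])
```

If your downstream code depends on getting a tuple, convert explicitly with tuple(restored) after loading.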
In this section, I will introduce you to the next powerful concept in Python: iterables. An iterable, as its name implies, is something you can iterate over in some sort of loop, such as a for loop.
From the Python glossary, an iterable is an object capable of returning its members one at a time.
Some of the iterables we will cover include:
- lists
- sets
- dictionaries
- tuples
- strings
- files
- NumPy arrays

For more information, see this tutorial. See also iterator, sequence, and generator.
In [ ]:
# a list of stuff
a_list = [1,2,'three',2*2,"five","6", [1,2,3]]
print(type(a_list), a_list)
In [ ]:
# sets contain unique objects
a_set = {1,'two',3,3,1,1,1,1,1,1}
print(type(a_set), a_set)
In [ ]:
# dictionaries are key-value stores
a_dict = {"name": "Giordon Stark", "age": 27, "favorite_number": 3.1415926, ("dead","alive"): False}
print(type(a_dict), a_dict)
In [ ]:
# tuples are fixed and cannot be changed after you define them
a_tuple = ("Enrico","Fermi","Institute")
print(type(a_tuple), a_tuple)
In [ ]:
# you know what a string is
a_string = "Hello World!"
print(type(a_string), a_string)
In [ ]:
# and what a file is
a_file = open("plain_data/example.tsv", "r")
print(type(a_file), a_file)
a_file.close()
So let's show you how to iterate over each of these types of objects.
In [ ]:
for item in a_list:
    print(type(item), item, repr(item))
In [ ]:
for item in a_set:
    print(type(item), item, repr(item))
In [ ]:
for k, v in a_dict.iteritems():
    print(type(k), k, type(v), v)
But as you've noticed, dictionaries are a little special since they contain key/value pairs. How do you print the values of a dictionary? You can iterate over the keys of a dictionary and then access the values using either its __getitem__() method via the [] accessor, or dict.get(), which also allows you to set a default value. Or you can just use dict.iteritems() to loop over the (key, value) tuples that represent the dictionary instead.
In [ ]:
a_dict.items()
In [ ]:
for item in a_dict:
    print(type(item), item, a_dict[item])
In [ ]:
for item in a_dict:
    print(type(item), item, a_dict.get(item))
print(a_dict.get('FakeKey'))
print(a_dict.get('FakeKey', 'DefaultValue'))
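The default value of get() is handy for building up a dictionary as you go. Here is a sketch that counts letters in a phrase. It loops with items(), which exists on both Python 2 and Python 3, whereas iteritems() is Python 2 only.

```python
counts = {}
for letter in "hakuna matata":
    # get() supplies 0 the first time a letter is seen
    counts[letter] = counts.get(letter, 0) + 1

# items() yields (key, value) pairs; sorted() just makes the order predictable
for letter, count in sorted(counts.items()):
    print("%r %d" % (letter, count))
```

Without the default, counts[letter] would raise a KeyError the first time each letter shows up.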
In [ ]:
for item in a_tuple:
    print(type(item), item, repr(item))
In [ ]:
for item in a_string:
    print(type(item), item, repr(item))
In [ ]:
""" Remember that the code below is equivalent to:
a_file = open("plain_data/example.tsv")
for item in a_file:
    print(type(item), item, repr(item))
a_file.close()
"""
with open("plain_data/example.tsv") as f:
    for item in f:
        print(type(item), repr(item))
Notice that iterating over a file pointer is like iterating over the result of the readlines() method we've seen previously in this tutorial.
In [ ]:
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9])
print(type(arr), repr(arr))
for item in arr:
    print(type(item), repr(item))
NumPy arrays are often just called ndarray
which means "n-dimensional array". In fact, these arrays can have arbitrary dimension and are used heavily with data science in Python. In the next tutorial, I'll show some interesting features of NumPy arrays so that you can be more familiar with how to handle them.
To close off this tutorial into Python, let's look at some common list operations and manipulations you should be familiar with.
Operation | Result |
---|---|
range(j) | Sequence of integers from [0, j) in steps of size 1 |
range(i, j) | Sequence of integers from [i, j) in steps of size 1 |
range(i, j, k) | Sequence of integers from [i, j) in steps of size k |
x in s | True if an item of s is equal to x, else False |
x not in s | False if an item of s is equal to x, else True |
s + t | The concatenation of s and t, extending a sequence |
s * n or n * s | equivalent to adding s to itself n times |
s[i] | ith item of s, starting from 0 |
s[i:j] | slice of s from i to j |
s[i:j:k] | slice of s from i to j with step k |
len(s) | length of s |
min(s) | smallest item of s |
max(s) | largest item of s |
s.index(x[, i[, j]]) | index of the first occurrence of x in s (at or after index i and before index j) |
s.count(x) | total number of occurrences of x in s |
s.sort() | sorting s in place |
sorted(s) | return a sorted copy of s |
map(f, s) | return a copy of s with f applied to every element |
For more information, see python documentation. Let's see examples of the above in action.
In [ ]:
print(range(10))
print(range(2,9))
print(range(2,9,3))
In [ ]:
print(3 in range(10))
print(3 in range(2,9))
print(3 in range(2,9,3))
In [ ]:
range(3)+range(11,20,2)
In [ ]:
range(3)*3
In [ ]:
print(range(10)[2])
print(range(10)[2:6])
print(range(10)[2:6:3])
In [ ]:
print(len(range(10)))
print(min(range(10)))
print(max(range(10)))
In [ ]:
range(10).index(2)
In [ ]:
import math
map(math.sqrt, range(10))
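The cells above lean on Python 2 behavior, where range() and map() hand back plain lists. On Python 3 they return lazy objects that cannot be concatenated with +. As a sketch, wrapping the calls in list() gives the same results on both versions:

```python
import math

# list(...) is harmless on Python 2 and materializes the lazy objects on Python 3
head = list(range(3))
odds = list(range(11, 20, 2))
combined = head + odds
window = list(range(10))[2:6]
roots = list(map(math.sqrt, [0, 1, 4, 9]))

print(combined)
print(window)
print(roots)
```

This is worth keeping in mind if you ever run this notebook's examples on a Python 3 kernel.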
Comprehensions are one of the most amazing things about Python. What better way, then, to show off some of that power?
In [ ]:
[i for i in range(10)]
How do you read this? "Make a list of i, for i in range(10), and return it to me". Why is this powerful? If you think about it, you could replace map() with a list comprehension! Let's use that last example of map() and rewrite it using a list comprehension.
In [ ]:
import math
map(math.sqrt, range(10)) #rewrite me!!!
You can even combine multiple loops, such as generating all possible combinations of two different lists. Let's take an example of combining a letter from my first name with a letter from my last name and generating all possible unique combinations:
In [ ]:
result = []
for first in "giordon":
    for last in "stark":
        result.append(first+last)
set(result)
How can we make this shorter, and perhaps clearer, with comprehensions?
In [ ]:
set([first+last for last in "stark" for first in "giordon"])
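A detail that is easy to miss: the for clauses in a comprehension run in the same order as nested for loops, with the leftmost clause as the outer loop. With a set the order does not matter, but with a list it does. Here is a sketch on two short made-up strings:

```python
nested = []
for first in "ab":
    for last in "xy":
        nested.append(first + last)

# the for clauses appear in the same order as the nested loops above
flat = [first + last for first in "ab" for last in "xy"]

# swapping the clauses changes the order of the results, not the contents
swapped = [first + last for last in "xy" for first in "ab"]

print(nested)
print(flat)
print(swapped)
```

So when the order of the results matters, read the comprehension left to right as outer loop to inner loop.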
Slicing and filtering are the two cornerstones of data analysis. You need to constantly be able to clean/sanitize your data, as well as slice it based on conditions. For now, let's discuss how we slice/filter vanilla Python sequences. In the next tutorial focusing on NumPy arrays, I'll show you some more interesting examples.
In [ ]:
reverse_me = range(10)
reverse_me.reverse()
print(reverse_me)
Can you think of a way to reverse this list using slicing? As a reminder, here's an excerpt from the table above:
Operation | Result |
---|---|
s[i:j:k] | slice of s from i to j with step k |
In [ ]:
# rewrite me into one line!
reverse_me = range(10)
reverse_me.reverse()
print(reverse_me)
Why do I care about this? The problem is that the reverse() method only exists on lists, not on all sequences. So you might want to do "giordon".reverse() only to find out that string objects do not have this method! That's unfortunate.
Once you've figured out the above exercise, try it with a string to see if your solution is able to work on strings too.
In [ ]:
# rewrite me using your solution!
''.join(reversed("giordon"))
Filtering becomes much easier in Python once you realize you can take advantage of comprehensions. Let's look at the example we had from above for combining the letters from my first and last names. Let's say I only wanted to include combinations that had at least one vowel - how would I do this?
set([first+last for last in "stark" for first in "giordon"])
In [ ]:
set([first+last for last in "stark" for first in "giordon" if first in "aeiou" or last in "aeiou"])
And again, in English, you would say "give me first plus last, for last in 'stark', for first in 'giordon', only if first is in 'aeiou' or last is in 'aeiou'".
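The same if clause works on any iterable. As a sketch with plain numbers, here is a filter written as a comprehension, the equivalent using the built-in filter(), and a version that filters and transforms in one go:

```python
numbers = list(range(10))

# keep only the items that satisfy the condition
evens = [n for n in numbers if n % 2 == 0]

# the same filter with the built-in filter(); list() keeps it portable
evens_again = list(filter(lambda n: n % 2 == 0, numbers))

# a transformation and a condition can be combined
even_squares = [n * n for n in numbers if n % 2 == 0]

print(evens)
print(even_squares)
```

The condition is evaluated on the original item, before the expression at the front of the comprehension is applied.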
In [ ]:
# zip() walks several sequences in parallel, taking one item from each per step
for first,middle,last in zip("giordon","holtsberg","stark"):
    print(first,middle,last)
You can think of this as a way to transpose a series of sequences, but it will only iterate as far as the shortest one can go. In the above case, the shortest was "stark" with a length of 5. You can try adding spaces to get something that might look aesthetically better:
In [ ]:
for first,middle,last in zip(" giordon ","holtsberg"," stark "):
    print(first,middle,last)
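Padding by hand gets tedious. As an alternative sketch, the standard itertools module can pad the shorter sequences for you. The function is named zip_longest on Python 3 and izip_longest on Python 2, hence the guarded import. Unlike the hand-padded version above, it pads on the right only, so the names are not centered.

```python
try:
    from itertools import zip_longest                    # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest    # Python 2

# iterate as far as the LONGEST sequence, filling the gaps with a space
padded = list(zip_longest("giordon", "holtsberg", "stark", fillvalue=" "))
for first, middle, last in padded:
    print("%s %s %s" % (first, middle, last))
```

This runs for all 9 letters of the longest name instead of stopping after 5.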
But an incredibly neat trick is using zip to effectively transpose a list of lists (you will see how to transpose NumPy arrays soon).
In [ ]:
list_of_lists = [range(10)]*5
list_of_lists
In [ ]:
zip(*list_of_lists) #splat!
In Python, * has many meanings depending on the context it is found in. Here, we use it as the splat or unpack operator, which literally unpacks our lists for us. What this means is the following:
a_list = [a, b, c, d, e]
some_func(a_list) #equivalent to: some_func([a, b, c, d, e])
which calls some_func with one argument: a list. What if you wanted to pass the list items as separate arguments instead? Unpack the list!
some_func(*a_list) #equivalent to: some_func(a, b, c, d, e)
which calls some_func with five arguments.
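Here is a runnable sketch of the same idea. add_three is a hypothetical function standing in for some_func, and the second half shows why zip(*rows) transposes: each inner list becomes its own argument to zip.

```python
def add_three(a, b, c):
    return a + b + c

values = [1, 2, 3]
# unpacking turns the one list into three separate arguments
total = add_three(*values)

rows = [[1, 2, 3], [4, 5, 6]]
# same as zip([1, 2, 3], [4, 5, 6]); list() keeps it portable to Python 3
columns = list(zip(*rows))

print(total)
print(columns)
```

Calling add_three(values) without the star would fail, since the function would receive one argument instead of three.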