In [ ]:

    
print "hello world"

Introduction to Git and Python
- Overview of Git
- Overview of Python
Getting Started with Git
The Flying Circus (Getting Started with Python)
- Statements
- Input/Output
  - Input
    - Raw
    - Plain Text
    - JSON
    - ROOT and HDF5
  - Output
- Iterating Iterables
  - Iterables
    - List
    - Set
    - Dictionary
    - Tuple
    - String
    - File
    - NumPy Array
  - Common Sequence Operations

[top] Introduction to Git and Python

Welcome to the first tutorial of the EFI workshop. The goal for today:

- familiarity with Python
- familiarity with Git
- understand common Python concepts
- gain foundation to succeed in the rest of the workshop

[top] Overview of Git

What is git?

- distributed revision control system
- blazing fast
- free and open-source software

Git as a historical tree

- All changes start on the master timeline -- a branch
- You can create a new branch to diverge, make changes, and then merge it back into master
- Always keep a mental image of a tree like the diagram below

                  Branch A
               o--o--o--o----o-------....
              /               \ Merge
Master o--o--o ----------------o-----....
             \
              o--o--o-----o----o-----....
                  Branch B

Who uses Git?

- GitHub -- an online community where anyone can create and share public projects and repositories for free
- GitLab -- online, open-source software that you can install on a server and maintain yourself, and that your collaborators can contribute to

CERN, for example, maintains a GitLab instance for all CERN users and experiments including (but not limited to) LHC, ATLAS, and CMS.


There is a lot more to Git which will not be covered today, but if you have time to learn some more advanced concepts, try learning about Git branching!

[top] Overview of Python

What is Python?

- a real [high-level] programming language
- simple to use, easy to learn
- extensible and can execute compiled C-code directly

I will include a very brief example of compiling an existing C/C++ library so that it can interface with Python as if it were a Python package!

[top] Getting Started with Git

Click this link and enjoy the interactive tutorial: https://try.github.io .

[top] The Flying Circus (Getting started with Python)

[top] Python Futures

For the purposes of today's tutorial, we'll focus on Python 2.7; however, Python 3 does exist.

Why are there two Pythons?

- Python 3 is a newer version of Python.
- Many people still use Python 2.

Physicists at CERN rely primarily on Python 2.7 as part of their analysis and development efforts. That doesn't mean you should use 2.7 or that you will be forced to use it. Feel free to do the research and understand the differences, but realize that 2.X is on end-of-life releases while 3.X undergoes active development.
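If you are ever unsure which Python a notebook is running, the interpreter can tell you. The following is a minimal sketch; the variable names are my own, and it sticks to a single parenthesised string per print so that it behaves the same on Python 2.7 and Python 3.

```python
import sys

# sys.version_info is tuple-like: (major, minor, micro, releaselevel, serial)
major = sys.version_info[0]
minor = sys.version_info[1]

# A single parenthesised string prints the same way on Python 2 and 3
print("Running Python %d.%d" % (major, minor))

# Branch on the major version when the two need different code
is_python2 = (major == 2)
print("Python 2 interpreter? %s" % is_python2)
```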

[top] Hello World

When programmers are learning a new language, we tend to write a one-line program that prints some version of the message "Hello world!". This is a simple program that shows whether your computer is properly set up to run Python programs. Let's try it out.



In [ ]:

    
print "Hello World!"

What just happened? A string defined in Python, "Hello World!", is printed to STDOUT (your console or, here, this Jupyter notebook) along with an automatically added newline character.



In [ ]:

    
type("Hello World!")



In [ ]:

    
timon = "Hakuna Matata!"
print timon
print "What a wonderful phrase..."
print timon
print "It ain't no passing craze..."
print "It means no worries...", "for the rest of your days"

And so, clearly, you can already start to understand how easy it is to print anything that has a representation. Variables can be printed; functions and classes can also be printed. But what if you wanted to print something on the same line? You can use print foo, bar of course, but what about using two print statements?

It might seem annoying sometimes that there's always a newline appended - but you can use a nifty shortcut by ending your print statement with a comma (,)



In [ ]:

    
print timon
print "What a wonderful phrase..."
print timon,
print "It ain't no passing craze..."
print "It means no worries...",
print "for the rest of your days"

[top] Your Second Statement

To begin the process of Python coding, you should be introduced to your second statement: import. This allows you to import modules and packages into the scope of your code and build on other people's sweat and tears. Let's start with the classic easter egg:



In [ ]:

    
import antigravity

Now, it's time for your first exercise. Try importing a module called this



In [ ]:

    
# go ahead and import it!

When you import a module, the name of that module goes into scope and its API is accessible under the same name. If the above two modules actually imported anything, you would be able to access it via antigravity.something or this.that. As a concrete example, let's import the types library and see what sorts of things are included when importing it.

Note: not all modules print a message when they get imported. this is a slight exception / easter egg.



In [ ]:

    
import types
dir(types)



In [ ]:

    
types.BooleanType

This means that I can do something like from types import BooleanType to only extract a specific variable, class, or function instead! I can even rename it during import if I have name-conflicts:



In [ ]:

    
import types as MyTypes
from types import BooleanType as AnotherBooleanType
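To see that these import forms all reach the same object, here is a small sketch using the standard math module (my choice of module, not one used above). The alias names m and square_root are made up for illustration.

```python
# Three ways to reach the same function in the math module
import math
import math as m
from math import sqrt as square_root

# All three names refer to the very same function object
same_function = (math.sqrt is m.sqrt) and (m.sqrt is square_root)
print(same_function)

# The aliased name is called like any other function
root = square_root(16)
print(root)
```

Renaming on import changes only the name in your own scope; the module itself is untouched.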

[top] Functions and Debugging

However, you might have noticed that I say

print statement

and not print function. There's a reason for this. I also introduced import as the second statement you have learned so far... Here's why!

In Python, functions are defined



In [ ]:

    
# define a function named "what_does_the_fox_say"
def what_does_the_fox_say():
    print "Ring-ding-ding-ding-dingeringeding"

and then executed



In [ ]:

    
what_does_the_fox_say()

and we get all sorts of remarkable properties on these functions, which you can view by running the dir command



In [ ]:

    
dir(what_does_the_fox_say)

such as maybe printing out the name of the function



In [ ]:

    
what_does_the_fox_say.__name__

which seems a little redundant right now. But imagine you alias a function to another variable, but you want to do something based on the name of the function...



In [ ]:

    
# assign another variable that points to the function
wdtfs = what_does_the_fox_say
# execute the function
wdtfs()
# what is the name of the function?
wdtfs.__name__

As you can see, functions are objects in Python, otherwise known as first-class citizens. They can be:

- passed around and manipulated similarly to integers and strings
- assigned to a variable
- passed as an argument to another function, etc...
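As a rough illustration of these points, the sketch below stores functions in a list and hands them to another function. The functions shout, whisper, and apply_to_phrase are hypothetical names invented for this example.

```python
def shout(text):
    """Return the text in upper case."""
    return text.upper()

def whisper(text):
    """Return the text in lower case."""
    return text.lower()

def apply_to_phrase(func, phrase):
    """Call whichever function was passed in and label the result with its name."""
    return "%s -> %s" % (func.__name__, func(phrase))

# Functions can be stored in a list, just like integers or strings
speakers = [shout, whisper]

results = []
for speaker in speakers:
    # each function is passed as an argument to another function
    results.append(apply_to_phrase(speaker, "Hakuna Matata"))

for line in results:
    print(line)
```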

Notice how the function in Python has two parentheses wrapping the arguments (or the function signature). Is the print statement a function?



In [ ]:

    
# try it out: print(....) with a single string or a single variable of your choice!

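But this isn't quite right! With the print statement, the parentheses are not a function call: they simply group an expression, and as soon as a comma appears inside them they define a tuple, as we'll talk about later. This is somewhat deceiving, but check it out: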



In [ ]:

    
print(timon,timon)

so it seems to be printing out two strings in a tuple format... Is (timon, timon) a tuple?



In [ ]:

    
type( (timon,timon) )

It appears to be! But how would I know if something is a function or not? Use the type class!



In [ ]:

    
type(what_does_the_fox_say)



In [ ]:

    
type(print)

My personal opinion is that having such a special statement in Python gets confusing. For the rest of the notebook, we'll replace the print statement with the print function instead, which is a lot more powerful, as we shall soon see.

Let's try and fix the error we've seen:



In [ ]:

    
# go ahead and import print_function from the __future__ module

type(print)

[top] Input/Output

In this section, I hope to highlight some of the different ways that you should be able to input data into your python scripts, as well as being able to extract it out.

Lots of times, you might want to treat a python script as a single step in your processing chain. There could be a lot of variables, many places where something can go wrong. Rather than trying to create a one-stop-shop piece of code in Python that could do the full processing - it is much easier to build small snippets, process the data, output an intermediate result that gets fed into the next script down the chain.

For now, let's focus on the different ways to input data. Passing command line arguments will be handled in a later section since we'll focus on data first, not configuration. Editing a python script and re-running is a beginner (and slow) way to do configuration, but if it works, don't knock it!

[top] Input

In this section, we'll focus on the following ways to input files:

- raw input / user prompt
- plain and delimited text files (comma, tabs/spaces, etc...)
- JSON (JavaScript Object Notation)
- ROOT data files
- HDF5 data files

For more information, see python's tutorial on input/output.

[top] Raw Input

Raw input is what I call prompting the user for input. This is done using the raw_input built-in function. This blocks your script from continuing until the user has put some input in.



In [ ]:

    
name = raw_input("What is your name?")
print("Greetings", name)

This is the most dangerous form of input, as you will need to sanitize and validate whatever the user types. Imagine that the person typing input to your script is a chimpanzee and you need to program your script to allow them to use it without breaking it.

Try running the above code with anything, or just press enter and type nothing in! You can get an idea of how to break a script if you're not careful about validating all crazy monkey inputs.

But this alone can let you create some really clever or useful helper programs. I like to think of raw_input as a way to make my python script become a customized calculator for very specific calculations I do repetitively. This way, my script asks me for input, I give it the input, and let it calculate something for me.

As an example of this idea, I'll demonstrate how you could create an adder with an infinite while loop.



In [ ]:

    
# start with nothing
total = 0
# loop infinitely
while True:
    number = raw_input("Gimme a number or press enter to continue: ")
    if number.isdigit():
        # as long as we're given something that
        #  looks like a number... add it.
        total += int(number)
    else:
        # break the loop
        break
# let's tell them what they've won!
print("Your total is", total)
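Note that isdigit() only accepts strings made entirely of digits, so the adder above stops on inputs like "-3" or "2.5". One possible way to validate more kinds of numbers is to attempt the conversion and catch the failure. This is only a sketch: parse_number is a hypothetical helper, and the inputs are simulated so the cell runs without a prompt.

```python
from __future__ import print_function

def parse_number(text):
    """Return text as a float, or None if it does not look like a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None

# Simulated user inputs, standing in for what raw_input() would return
samples = ["42", "-3.5", "  7 ", "", "banana"]

parsed = []
for sample in samples:
    parsed.append(parse_number(sample))

print(parsed)
```

Returning None for bad input lets the calling code decide whether to ask again, skip the value, or stop.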

[top] Plain Text

Next up on input is dealing with plain text files. I would like to focus a little bit on the different ways you can read a text file. In the data directory are three files we will look at in this section. I will provide an explanation of how you might read them, and leave another as an exercise up to you. Feel free to play with these files.



In [ ]:

    
! ls -lavh plain_data/*

The first file is animals which is a (non-exhaustive) list of animals, one animal per line. We would like to read this file into a Python list. Don't worry about what a Python list is (if you're unfamiliar) but it is like an array. The goal here is just to load the data file in.

I will demonstrate some different ways to do it using open(), a built-in function. This returns a file handle (also called a file pointer or file object). You need to make sure you close this file yourself when you are done with it; otherwise the file stays open, holding on to a file descriptor and a little memory until Python's garbage collection eventually cleans it up.



In [ ]:

    
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))

# do something in between

# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))

If you keep re-running the above code, notice that the file pointer changes. You really are opening this file, and then closing it. Obviously, we would like to read the contents of the file. There's a really handy intuitive method called read() that lets us do that!

To read a file’s contents, call f.read(size), which reads some quantity of data and returns it as a string. size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size bytes are read and returned. If the end of the file has been reached, f.read() will return an empty string ("").

Let's try it out.



In [ ]:

    
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))

# read the first 128 bytes from the file
#    leave this blank (fh.read()) to read it all
#    but there are a lot of animals, so let's not read it all
data = fh.read(128)
print(data, "\n")
print(repr(data))
print("Data type: ", type(data))

# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))

Notice that Python's print function renders each newline character \n as an actual line break. To see the characters themselves, wrap the text in repr(), which gives the representation of the string with escape sequences such as \n written out.

This is a single string object. That's not entirely useful for our purposes, is it? We want to read this file, line by line. Perhaps we can read the entire file, then split it up by the newline delimiter -- but maybe we're limited by application memory. As it turns out, there is a readline() method.

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.



In [ ]:

    
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))

data = fh.readline()
print(repr(data))

data = fh.readline()
print(repr(data))

# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))
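The end-of-file rule quoted above can be checked with a small loop. This sketch writes its own throwaway file with the standard tempfile module (so it does not depend on plain_data/animals) and keeps calling readline() until an empty string comes back.

```python
from __future__ import print_function
import os
import tempfile

# Write a small throwaway file: a name, a blank line, a name, no trailing newline
handle, path = tempfile.mkstemp()
os.close(handle)
fh = open(path, "w")
fh.write("aardvark\n\nzebra")
fh.close()

lines = []
fh = open(path, "r")
while True:
    line = fh.readline()
    if line == "":
        # an empty string means the end of the file has been reached
        break
    lines.append(line)
fh.close()
os.remove(path)

print(repr(lines))
```

The blank line shows up as a string holding a single newline, while the last line has no newline because the file did not end with one.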

and so now, we can read this file, line by line. But maybe you want all the lines. There's also a nifty readlines() command:



In [ ]:

    
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))

data = fh.readlines()
# let's not print out all the animals
print("There are", len(data), "animals in the file.")

# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))

Another, somewhat less documented way is to use list() to convert the file pointer into a list object. This works because a file object is an iterable that yields one line at a time, so list() collects every line for you.



In [ ]:

    
# r means open this file in read-only mode
fh = open("plain_data/animals", "r")
print(fh, type(fh))

data = list(fh)
# let's not print out all the animals
print("There are", len(data), "animals in the file.")

# always make sure to close the file when you are done with it
fh.close()
print(fh, type(fh))

Just beware that each element in this list will contain that pesky newline character at the end, which is great for printing it to the screen... but not so much when you're trying to do data analysis. The strip() method (along with lstrip() and rstrip()) on str objects removes the specified characters from the beginning and/or end of the string. If you don't specify which characters to remove, whitespace characters are removed by default.



In [ ]:

    
print("left", "     outer space      ", "right")
print("left", "     outer space      ".strip(), "right")

Feel free to go back over some of the above examples and try using line.strip() instead! When reading plain text files, strip() and repr() are your friends.
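As a starting point, here is a sketch of cleaning up a list of lines shaped like what readlines() returns. The sample lines are made up for illustration.

```python
from __future__ import print_function

# Lines as readlines() would return them, newline included
raw_lines = ["aardvark\n", "  bison  \n", "\n", "zebra"]

cleaned = []
for line in raw_lines:
    name = line.strip()      # drop surrounding whitespace, including the newline
    if name:                 # skip lines that are empty after stripping
        cleaned.append(name)

stripped_x = "xxhixx".strip("x")   # strip a specific character instead
right_only = "  hi  ".rstrip()     # only the right-hand side is stripped

print(repr(cleaned))
print(repr(stripped_x))
print(repr(right_only))
```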

As you can imagine, it gets annoying to constantly remember to close the file when you're done. In fact, there's a statement for it! The with statement calls the file's __enter__() and __exit__() methods automatically. Luckily for us, this means we can open a file for reading and have it closed as soon as we are done. Here's a nice example:



In [ ]:

    
with open("plain_data/animals", "r") as fh:
    print(fh, type(fh))

print(fh, type(fh))
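One way to confirm what the with statement does is to look at the closed attribute of the file object. The sketch below uses a throwaway temporary file (an assumption of mine, so it runs anywhere) and also checks that the file is closed when the block is left because of an error.

```python
from __future__ import print_function
import os
import tempfile

handle, path = tempfile.mkstemp()
os.close(handle)

with open(path, "w") as fh:
    fh.write("hello\n")
    inside = fh.closed      # still open inside the block

outside = fh.closed         # closed as soon as the block ends

# The file is also closed when the block is left because of an error
try:
    with open(path, "r") as fh2:
        raise RuntimeError("something went wrong")
except RuntimeError:
    closed_after_error = fh2.closed

os.remove(path)
print(inside, outside, closed_after_error)
```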

So now, we can use the scope of the with statement to encapsulate the logic we need for reading files in this example. This is incredibly convenient. Let's try to do the same thing with a comma-separated file!



In [ ]:

    
! ls -lavh plain_data/*

The other two files here to focus on are comma-separated and tab-separated. These are plain text files as well. Like the animals file we just looked at where there was an animal for each line -- the other files have an entry, or event, or item for each line. The difference is that now, we have a delimiter that encodes more information per line that we need to extract in a meaningful way.

Using the csv library provided, let's try reading in the comma-separated file!

Let's look at what plain_data/example.csv contains so we know how to configure the reader:



In [ ]:

    
! cat plain_data/example.csv

As you can see, we want to make sure that the comma inside the quoted string, like "Venture, Extended Edition" isn't considered a delimiter. We will use " as a quote character which is a way to let the csv reader flag when to split on the delimiter and when to not.



In [ ]:

    
import csv
with open("plain_data/example.csv") as fh:
    reader = csv.reader(fh, delimiter=',', quotechar='"')
    # use next() to get the first line/entry/row/event from the file, which are the headers
    headers = next(reader)
    # use the list(fh) trick to just get a list of everything else!
    data = list(reader)
    
print("Headers:", repr(headers))
print("Data:", repr(data))

As you can see, it was remarkably easy to read a comma-separated file correctly. What you see above is the list representation of the headers -- a list of 4 items. We called next() on the reader (which acts like a file pointer for us).

The data is represented as a list of lists. The important part here is that we were able to split up the file and extract out the data in a meaningful way. That's the goal of this section, to input data. We haven't discussed understanding it, or parsing it, or validating it, or doing calculations with it. This is where the crux of your time will be spent - analyzing and calculating. I hope these pieces of the code become less challenging and easier so you can focus on what really matters.
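To see what the quote character buys you, compare the csv reader with a naive split on commas. This sketch uses a few made-up lines rather than plain_data/example.csv (the columns here are hypothetical); csv.reader accepts any iterable of lines, so a plain list of strings works.

```python
from __future__ import print_function
import csv

# csv.reader accepts any iterable of lines, so a list of strings works
lines = [
    'title,year',
    '"Venture, Extended Edition",2019',
    'Plain Title,2020',
]

rows = list(csv.reader(lines, delimiter=',', quotechar='"'))
naive = lines[1].split(',')

print(rows[1])   # the quoted comma stays inside one field
print(naive)     # a plain split breaks the title in two
```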

Now for the exercise! There is one more file in here, a tab-separated file with the same data. Using the above code as inspiration, copied below for you -- can you produce the same print output as in the comma-separated case? Make sure you understand what each piece of code is doing. Don't be shy about bedazzling this code with print functions everywhere!



In [ ]:

    
import csv
with open("plain_data/example.tsv") as fh:
    reader = csv.reader(fh, delimiter=',', quotechar='"') # use the tab delimiter: \t
    # do some stuff here
    headers = ''
    data = ''
    
print("Headers:", repr(headers))
print("Data:", repr(data))

[top] JSON

JSON is one of the most useful plain text formats you can get. Most languages have a JSON parser, as you can see from the list at the bottom of this page on JSON.org. It is also a very nice, human-readable way to provide configuration for a script and to serialize data passed between scripts written in different languages. A binary format with a similar purpose is Protocol Buffers (protobuf), but I won't cover that.

Let's just show a very simple example of reading a JSON file in json_data/.



In [ ]:

    
! ls -lavh json_data/*



In [ ]:

    
! cat json_data/example.json

As you can see, the JSON data looks perfectly readable. With Python, we just use the JSON library in order to load this up.



In [ ]:

    
import json
json.load("json_data/example.json")

Uh-oh! What happened here? It looks like the python call was expecting a file pointer instead. Let's fix that.



In [ ]:

    
import json
# json.load() wants an open file, so let the with statement open and close it
with open("json_data/example.json") as fh:
    data = json.load(fh)
print(repr(data))

And that's it! There is really not that much more to do here. You've just loaded in a file containing JSON data and you can now do things with it.
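If the JSON text is already in a string rather than in a file, json.loads() (note the trailing s) parses it directly. The sketch below uses a made-up document to show how JSON types come back as Python types: objects become dictionaries, arrays become lists, true/false become True/False, and null becomes None.

```python
from __future__ import print_function
import json

# A JSON document held in a string rather than in a file (hypothetical content)
text = '{"name": "example", "values": [1, 2, 3], "nested": {"ok": true, "missing": null}}'

data = json.loads(text)   # loads = "load from string"

print(type(data))
print(data["values"][0], data["nested"]["ok"], data["nested"]["missing"])
```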

[top] ROOT and HDF5

As you can probably already guess, JSON is not the best format for storing a lot of raw data in very complicated layouts. In fact, when you have so much data that storing it costs a lot of money, you care a lot about being able to compress it while keeping access as fast as possible. This is where data formats like ROOT and HDF5 become important.

Reading these file formats is usually bottlenecked by I/O (input/output), but they still allow very fast access to a large amount of data that cannot fit into physical memory. At this point, you start getting into parallel processing and thinking about distributed computing, and a lot of interesting technologies come into play. For now, we'll just briefly show two different ways of opening a ROOT file. There will be more of a focus on HDF5 later during the workshop.



In [ ]:

    
import ROOT

ROOT has a python library known as PyROOT which we will take advantage of. There is also a NumPy wrapper around PyROOT which is known as root_numpy which contains some test data for the purposes of demonstrating that we can read in a ROOT file.



In [ ]:

    
import ROOT
from root_numpy import testdata
print(testdata.get_filepath('single1.root'))
# open for reading
f = ROOT.TFile.Open(testdata.get_filepath('single1.root'))
print(f, type(f))

# close the file pointer
f.Close()
print(f, type(f))

Like we've seen with open() before, PyROOT will create the file pointer for you given a filename. You do not need to pass in a file object. In this case, ROOT.TFile.Open() replaces open() when we need to open a file pointer for ROOT files.

With HDF5, there is a similar pattern. When you start looking into the machine learning tutorials later, you will be able to open and read HDF5 files like so:

import h5py
# open for reading
f = h5py.File('example.hdf5', 'r')
print(f, type(f))

# get the data here, while the file is still open

f.close()
print(f, type(f))

So, like ROOT, you give the h5py library the name of the file to open, and it returns a pointer. For JSON and other plain text files, you use open() to create the file pointer.

[top] Output

[top] Saying Hello World to a file

Recall that in The Future of Hello World, we used the print() function from Python 3 by importing from __future__. We also saw that we can use the end keyword argument of the function to remove the newline that would get added when printing to screen. What if you wanted to print directly to a file? Well, you can!

If you are using the print statement, you could do

with open('out.txt', 'w') as f:
  print >>f, 'Hello World!'

but since we're wise beyond our years and use the print function... you can use the file keyword argument.

But before we get to the next exercise, I would like to remind you that you can open a file for reading r, writing w, or appending a -- as well as in binary mode by adding the b flag. In order to write to a file, you must use w, wb, a, or ab at the very least for the open file-mode call.
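The difference between w and a is easy to get wrong, so here is a small sketch of both. It writes to a throwaway temporary file (my own choice, so nothing in the tutorial directory is touched) and uses write() to put text into the file.

```python
from __future__ import print_function
import os
import tempfile

handle, path = tempfile.mkstemp()
os.close(handle)

with open(path, "w") as fh:      # "w" creates or truncates the file
    fh.write("first\n")
with open(path, "a") as fh:      # "a" keeps the contents and adds to the end
    fh.write("second\n")
with open(path, "r") as fh:
    after_append = fh.read()

with open(path, "w") as fh:      # opening with "w" again wipes the file
    fh.write("third\n")
with open(path, "r") as fh:
    after_rewrite = fh.read()

os.remove(path)
print(repr(after_append))
print(repr(after_rewrite))
```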

So now, the exercise! Fix the below code to make it work using the print(..., file=<file pointer>) function to redirect your output to hello_world.log



In [ ]:

    
from __future__ import print_function
with open('hello_world.log', 'w') as fh:
    print >>fh, "Hello World!" # rewrite this line

The other thing you could do is to have STDOUT point to a file instead. Then all print statements get redirected for you. How do you access your system's information about which file the kernel writes into? Why, using the python sys library!



In [ ]:

    
import sys
# let's just show we can write to the notebook as usual
print(sys.stdout)
# hold on to the regular stdout 
temp_stdout = sys.stdout
# now redirect to a file
sys.stdout = open("stdout.log", "w")
print(sys.stdout)
# now redirect back
sys.stdout.close()
sys.stdout = temp_stdout
# this should print again to the notebook
print(sys.stdout)

and we can see that running the above will make a stdout.log file containing



In [ ]:

    
! cat stdout.log

So this is pretty neat. This should give you an idea of how you have some control over being able to stream output on your system. You can start to do more complicated things such as redirecting output to multiple files based on what function is being called or something more clever. To learn more about some advanced features, such as using the python logging facility, you can check out the advanced section in this tutorial.
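One caution with swapping sys.stdout by hand: if the code in between raises an error, the restore lines never run and your output stays pointed at the file. A possible safeguard, sketched here with a temporary file of my own choosing, is to restore inside a finally block.

```python
from __future__ import print_function
import os
import sys
import tempfile

handle, path = tempfile.mkstemp()
os.close(handle)

saved_stdout = sys.stdout
log = open(path, "w")
try:
    sys.stdout = log
    print("this line goes to the file")
finally:
    # runs even if the code above raises, so the console is never lost
    sys.stdout = saved_stdout
    log.close()

with open(path, "r") as fh:
    captured = fh.read()
os.remove(path)
print("captured:", repr(captured))
```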

Let's go ahead and delete the files we just created in this tutorial



In [ ]:

    
! rm hello_world.log stdout.log

[top] Plain Text

We've just seen how you can redirect print output to a file in two different ways:

- redirect individual print statements/functions on a case-by-case basis
- globally redirect all print calls by changing where sys.stdout points

Writing output directly to a file using a file pointer is just as easy as reading from it. Instead of the read() method, you use the write() method. Unlike print calls, which add a newline character to the end for you, write() adds nothing, so you need to include "\n" yourself to start a new line in the file. Time for the next exercise!



In [ ]:

    
name = raw_input("What is your name? ")
with open('plain_output.log', 'w') as fh:
    # rewrite these lines to use fh.write(....) instead
    print("Hello ", file=fh, end='') #no newline
    print(name, file=fh) #newline
    print("How are you today?", file=fh) #newline

I would like you to play around with the fh.write() by adding multiple lines in, and try to understand how this works. This above example combines a few pieces of what you've learned previously using raw_input as well as how to suppress a newline with a print call. You can check your work by running



In [ ]:

    
! cat plain_output.log

When you're satisfied that you understand plain text printing, go ahead and clean up the file you made



In [ ]:

    
! rm plain_output.log

[top] JSON

JSON is a very nice way to serialize many python objects into a plain text format that can be re-imported at a later time. It's as simple as loading the JSON library and calling the dumps() method to see the JSON representation, or calling the dump(data, fh) method to write to a file pointer.
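A quick round trip through a string shows what survives serialization. This is a sketch with made-up data. One thing to watch for is that JSON has no tuple type, so a tuple is written as a JSON array and comes back as a list.

```python
from __future__ import print_function
import json

# Hypothetical data: a list of headers and a list of rows
headers = ["name", "count"]
rows = [["apple", "3"], ["pear", "5"]]

text = json.dumps((headers, rows), indent=4)   # serialize a tuple to a string
restored = json.loads(text)                    # parse it back

print(text)
print(type(restored), restored == [headers, rows])
```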

Imagine a scenario where you have lots of plain text files, but you want to convert them to JSON. In this case, maybe we'll just load one of the plain data examples where we loaded the data into a list, and then just dump that list into a file in JSON format. Starting from a previous exercise, let's extend it more...

I've added the import json and json.dumps() calls for you. All you need to do is create a with statement that gives you an open file handle to write into using json.dump(). You can see a pretty JSON representation of the object we will write to the file. I use the nifty indent=4 to make it look pretty. The object we're dumping is a tuple (a, b) of two different lists: headers and data.



In [ ]:

    
import csv, json
with open("plain_data/example.csv") as fh:
    reader = csv.reader(fh, delimiter=',', quotechar='"')
    # use next() to get the first line/entry/row/event from the file, which are the headers
    headers = next(reader)
    # use the list(fh) trick to just get a list of everything else!
    data = list(reader)

# I want to write this to a file called: example.json
#    using the with statement to pass in a file pointer
print(json.dumps((headers, data), indent=4))

When all is said and done, you should verify that you wrote to the file correctly



In [ ]:

    
! cat example.json

Compare this to plain_data/example.csv to get an idea of how the two representations are a bit different, but both are still plain text!



In [ ]:

    
! cat plain_data/example.csv

and then clean up and remove the JSON file you just made.



In [ ]:

    
! rm example.json

[top] Iterating Iterables

In this section, I will introduce you to the next powerful concept in Python: iterables. An iterable, as its name implies, is something you can iterate over in some sort of loop: a for loop, a comprehension, and so on.

From the python glossary, an iterable is:

- an object capable of returning its members one at a time
- any sequence type (list, str, tuple) as well as some non-sequence types (dict, set, file)
- usable in a for loop and in many other places where a sequence is needed

Some of the iterables we will cover include:

- lists
- sets
- dictionaries
- tuples
- strings
- files
- numpy arrays

For more information, see this tutorial. See also iterator, sequence, and generator.
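Under the hood, "returning its members one at a time" is done with the built-in iter() and next() functions. The sketch below walks through a small made-up list by hand, which is roughly what a for loop does for you.

```python
from __future__ import print_function

colors = ["red", "green", "blue"]

# iter() asks an iterable for an iterator; next() pulls one item at a time
iterator = iter(colors)
first = next(iterator)
second = next(iterator)
third = next(iterator)

# When nothing is left, next() raises StopIteration; a for loop catches
# that for you and simply ends
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```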



In [ ]:

    
# a list of stuff
a_list = [1,2,'three',2*2,"five","6", [1,2,3]]
print(type(a_list), a_list)



In [ ]:

    
# sets contain unique objects
a_set = {1,'two',3,3,1,1,1,1,1,1}
print(type(a_set), a_set)



In [ ]:

    
# dictionaries are key-value stores
a_dict = {"name": "Giordon Stark", "age": 27, "favorite_number": 3.1415926, ("dead","alive"): False}
print(type(a_dict), a_dict)



In [ ]:

    
# tuples are fixed and cannot be changed after you define them
a_tuple = ("Enrico","Fermi","Institute")
print(type(a_tuple), a_tuple)



In [ ]:

    
# you know what a string is
a_string = "Hello World!"
print(type(a_string), a_string)



In [ ]:

    
# and what a file is
a_file = open("plain_data/example.tsv", "r")
print(type(a_file), a_file)
a_file.close()

Iterating over these objects [top]

So let's show you how to iterate over each of these types of objects.

[top] List



In [ ]:

    
for item in a_list:
    print(type(item), item, repr(item))

[top] Set



In [ ]:

    
for item in a_set:
    print(type(item), item, repr(item))

[top] Dictionary



In [ ]:

    
for k, v in a_dict.iteritems():
    print(type(k), k, type(v), v)

But as you've noticed, dictionaries are a little special since they contain key/value pairs. How do you print the values of a dictionary? You can iterate over the keys and then look up each value, either with the [] accessor (which calls the dictionary's __getitem__() method) or with dict.get(), which also lets you supply a default value... or just use dict.iteritems() to loop over the (key, value) tuples that represent the dictionary instead.



In [ ]:

    
a_dict.items()



In [ ]:

    
for item in a_dict:
    print(type(item), item, a_dict[item])



In [ ]:

    
for item in a_dict:
    print(type(item), item, a_dict.get(item))
    
print(a_dict.get('FakeKey'))
print(a_dict.get('FakeKey', 'DefaultValue'))

[top] Tuple



In [ ]:

    
for item in a_tuple:
    print(type(item), item, repr(item))

[top] String



In [ ]:

    
for item in a_string:
    print(type(item), item, repr(item))

[top] Files



In [ ]:

    
""" Remember that below code is equivalent to:

a_file = open("plain_data/example.tsv")
for item in a_file:
    print(type(item), item, repr(item))
a_file.close()

"""

with open("plain_data/example.tsv") as f:
    for item in f:
        print(type(item), repr(item))

Notice that iterating over a file pointer is like iterating over the result of the readlines() method we saw earlier in this tutorial: each item is one line of the file.

[top] NumPy Array



In [ ]:

    
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9])
print(type(arr), repr(arr))
for item in arr:
    print(type(item), repr(item))

NumPy arrays are often just called ndarray which means "n-dimensional array". In fact, these arrays can have arbitrary dimension and are used heavily with data science in Python. In the next tutorial, I'll show some interesting features of NumPy arrays so that you can be more familiar with how to handle them.

[top] Common Sequence Operations

To close off this introduction to Python, let's look at some common sequence operations and manipulations you should be familiar with.

Operation	Result
`range(j)`	Sequence of integers from `[0,j)` in steps of size 1
`range(i,j)`	Sequence of integers from `[i, j)` in steps of size 1
`range(i, j, k)`	Sequence of integers from `[i, j)` in steps of size `k`
`x in s`	`True` if an item of `s` is equal to `x`, else `False`
`x not in s`	`False` if an item of `s` is equal to `x`, else `True`
`s + t`	The concatenation of `s` and `t`, extending a sequence
`s * n` or `n * s`	equivalent to adding `s` to itself `n` times
`s[i]`	ith item of `s`, starting from 0
`s[i:j]`	slice of `s` from `i` to `j`
`s[i:j:k]`	slice of `s` from `i` to `j` with step `k`
`len(s)`	length of `s`
`min(s)`	smallest item of `s`
`max(s)`	largest item of `s`
`s.index(x[, i[, j]])`	index of the first occurrence of `x` in `s` (at or after index `i` and before index `j`)
`s.count(x)`	total number of occurrences of `x` in `s`
`s.sort()`	sorting `s` in place
`sorted(s)`	return a sorted copy of `s`
`map(f, s)`	return a copy of `s` with `f` applied to every element

For more information, see the Python documentation. Let's see examples of the above in action.



In [ ]:

    
print(range(10))
print(range(2,9))
print(range(2,9,3))
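A note in case you run the cell above under Python 3: there, range() returns a lazy sequence object rather than a list, so printing it shows something like range(2, 9, 3) instead of the numbers. A minimal sketch of how to see the contents in either version:

```python
# list() turns the range into a real list in both Python 2 and Python 3.
r = range(2, 9, 3)
as_list = list(r)
print(as_list)

# Indexing, len() and membership tests work on the range directly.
first_item = r[0]
count = len(r)
```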



In [ ]:

    
print(3 in range(10))
print(3 in range(2,9))
print(3 in range(2,9,3))



In [ ]:

    
range(3)+range(11,20,2)



In [ ]:

    
range(3)*3
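The two cells above rely on range() returning a list, which is true in Python 2 only. Assuming you want the same results under Python 3, one way is to convert to lists first:

```python
# Converting to lists first makes + and * behave the same in Python 2 and Python 3.
concatenated = list(range(3)) + list(range(11, 20, 2))
repeated = list(range(3)) * 3

print(concatenated)
print(repeated)
```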



In [ ]:

    
print(range(10)[2])
print(range(10)[2:6])
print(range(10)[2:6:3])



In [ ]:

    
print(len(range(10)))
print(min(range(10)))
print(max(range(10)))



In [ ]:

    
range(10).index(2)



In [ ]:

    
import math
map(math.sqrt, range(10))
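The cells above skip a few rows of the table: not in, count(), sort(), sorted(), and index() with a starting position. Here is a short sketch that exercises them on a made-up list `s`. It also wraps map() in list(), which is needed under Python 3, where map() is lazy.

```python
import math

s = [3, 1, 2, 3, 1, 3]  # made-up data for illustration

absent = 5 not in s            # True, since 5 does not appear in s
n_threes = s.count(3)          # how many times 3 appears
second_three = s.index(3, 1)   # first 3 at or after index 1

ordered_copy = sorted(s)       # returns a new list and leaves s unchanged
unchanged = (s == [3, 1, 2, 3, 1, 3])
s.sort()                       # sorts s in place and returns None

# list() forces evaluation of map() under Python 3.
roots = list(map(math.sqrt, [0, 1, 4, 9]))
```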

[top] Comprehensions

Comprehensions are one of the most amazing things about Python. What better way, then, to show off some of its power?



In [ ]:

    
[i for i in range(10)]

How do you read this? "Make a list of i, for i in range(10), and return it to me". Why is this powerful? If you think about it, you could replace map() with a list comprehension! Let's take that last example of map() and rewrite it using a list comprehension.



In [ ]:

    
import math
map(math.sqrt, range(10)) #rewrite me!!!
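If you get stuck on the exercise above, here is the same translation applied to a different function. `double` is a hypothetical helper invented for this hint and is not part of the tutorial.

```python
def double(x):
    # Hypothetical helper used only for this illustration.
    return 2 * x

# map() applies the function to every element (list() is needed under Python 3)...
with_map = list(map(double, range(5)))

# ...and the comprehension spells out the same thing as "double(i), for i in range(5)".
with_comprehension = [double(i) for i in range(5)]

print(with_map == with_comprehension)
```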

You can even combine multiple loops, such as generating all possible combinations of two different lists. Let's take an example of combining a letter from my first name with a letter from my last name and generating all possible unique combinations:



In [ ]:

    
result = []
for first in "giordon":
    for last in "stark":
        result.append(first+last)
set(result)

How can we make this shorter, and perhaps clearer, with comprehensions?



In [ ]:

    
set([first+last for last in "stark" for first in "giordon"])
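As a possible further shortening, Python 2.7 and Python 3 also support set comprehensions, which use curly braces and skip the intermediate list. A minimal sketch, checked against the explicit loop:

```python
# The explicit nested loop from earlier, collecting straight into a set.
looped = set()
for first in "giordon":
    for last in "stark":
        looped.add(first + last)

# A set comprehension builds the same set in one expression.
combos = {first + last for first in "giordon" for last in "stark"}

print(combos == looped)
```

The order of the two for clauses does not change the resulting set here, which is why the loop and the comprehension in the tutorial can list them differently.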

[top] Slicing and Filtering

Slicing and filtering are the two cornerstones of data analysis. You need to constantly be able to clean/sanitize your data, as well as slice it based on conditions. For now, let's discuss how we slice/filter vanilla Python sequences. In the next tutorial focusing on NumPy arrays, I'll show you some more interesting examples.



In [ ]:

    
reverse_me = range(10)
reverse_me.reverse()
print(reverse_me)

Can you think of a way to reverse this list using slicing? As a reminder, here's an excerpt from the table above:

Operation	Result
`s[i:j:k]`	slice of `s` from `i` to `j` with step `k`



In [ ]:

    
# rewrite me into one line!
reverse_me = range(10)
reverse_me.reverse()
print(reverse_me)

Why do I care about this? The problem is that the reverse() method only exists on lists, not on all sequences. So you might try "giordon".reverse() only to find out that string objects do not have this method! That's unfortunate.

Once you've figured out the above exercise, try it with a string to see if your solution is able to work on strings too.



In [ ]:

    
# rewrite me using your solution!
''.join(reversed("giordon"))
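As a hint for the two exercises above, here is a sketch of how the step in s[i:j:k] behaves on a made-up string. The same slices work on lists and tuples.

```python
letters = "abcdefghij"  # made-up data for illustration

every_other = letters[::2]        # omitting i and j extends the slice to both ends
backwards_part = letters[8:2:-2]  # a negative step walks from right to left
last_three = letters[-3:]         # negative indices count from the end

# The same slice applied to a list returns a list.
numbers_part = list(range(10))[8:2:-2]
```

Combining the first two ideas is enough to solve the reversal exercise.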

Filtering becomes much easier in Python once you realize you can take advantage of comprehensions. Let's look at the example we had from above for combining the letters from my first and last names. Let's say I only wanted to include combinations that had at least one vowel - how would I do this?

set([first+last for last in "stark" for first in "giordon"])



In [ ]:

    
set([first+last for last in "stark" for first in "giordon" if first in "aeiou" or last in "aeiou"])

And again, in English, you would say "give me first plus last for last in 'stark', for first in 'giordon', only if first is in 'aeiou' or last is in 'aeiou'".
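The same if clause works for any condition. Here is a small sketch on numbers, alongside the built-in filter() for comparison (wrapped in list() because filter() is lazy under Python 3). The variable names are chosen for this illustration only.

```python
numbers = list(range(10))

# Keep only the items for which the condition is true.
evens = [n for n in numbers if n % 2 == 0]

# filter() does the same job with a function instead of an inline condition.
evens_via_filter = list(filter(lambda n: n % 2 == 0, numbers))

# The vowel filter from the text, written as a set comprehension.
vowel_combos = {first + last
                for last in "stark"
                for first in "giordon"
                if first in "aeiou" or last in "aeiou"}
```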

[top] Zipping

The very last thing (I promise) for this tutorial is to talk about the zip() function. This function allows you to iterate over multiple iterables simultaneously by returning the ith element from each iterable at the same time. Let's demonstrate by example:



In [ ]:

    
for first,middle,last in zip("giordon","holtsberg","stark"):
    print(first,middle,last)

You can think of this as a way to transpose a series of sequences, but it will only iterate as far as the shortest one can go. In the above case, the shortest was the string "stark", with a length of 5. You can try adding spaces to get something that might look aesthetically better:



In [ ]:

    
for first,middle,last in zip(" giordon ","holtsberg","  stark  "):
    print(first,middle,last)
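If padding by hand feels clumsy, the standard library can pad for you. This sketch contrasts zip(), which stops at the shortest input, with itertools.zip_longest (named izip_longest in Python 2), which keeps going and fills the gaps. Note that it pads at the end of the shorter strings, so it does not centre them the way the hand-padded version does.

```python
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

# zip() stops as soon as the shortest input ("stark") runs out.
truncated = list(zip("giordon", "holtsberg", "stark"))

# zip_longest() runs to the longest input and fills missing items with fillvalue.
padded = list(zip_longest("giordon", "holtsberg", "stark", fillvalue=" "))

print(len(truncated), len(padded))
```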

An incredibly neat trick is to use zip() to effectively transpose a list of lists. (You will see how to transpose NumPy arrays soon.)



In [ ]:

    
list_of_lists = [range(10)]*5
list_of_lists



In [ ]:

    
zip(*list_of_lists) #splat!

In Python, * has many meanings depending on the context it is found in. Here, we use it as the splat or unpack operator, which literally unpacks our lists for us. What this means is the following:

a_list = [a, b, c, d, e]
some_func(a_list) #equivalent to: some_func([a, b, c, d, e])

which calls some_func with one argument - a list. What if you wanted to pass the list items as arguments too? Unpack the list!

some_func(*a_list) #equivalent to: some_func(a, b, c, d, e)

which calls some_func with five arguments.
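Here is a runnable sketch of that difference. This `some_func` is a hypothetical stand-in that simply counts the positional arguments it receives. The last lines tie it back to the transpose trick on a small made-up list of lists.

```python
def some_func(*args):
    # Hypothetical stand-in: reports how many positional arguments it received.
    return len(args)

a_list = ["a", "b", "c", "d", "e"]
one_arg = some_func(a_list)     # the whole list arrives as a single argument
five_args = some_func(*a_list)  # each item arrives as its own argument

# zip(*rows) is therefore the same call as zip([1, 2, 3], [4, 5, 6]).
rows = [[1, 2, 3], [4, 5, 6]]
columns = list(zip(*rows))      # list() is needed under Python 3, where zip() is lazy
```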
