Tutorial on Using Python Files

This tutorial shows how to write Python code in files and how to load those files into the Python interpreter. We use an IPython notebook instead of a bare interpreter because the notebook lets us add commentary alongside the code.

You'll notice several files in this directory. They are:

hello_world_1.py, hello_world_2.py, row_echo.py, and fib_module.py

We'll work with each of these in turn.

Basics

First, open up hello_world_1.py with a text editor. Once you've had a look, let's load the code into the Python interpreter via the import command.

What do you think will happen when we call the import command?


In [ ]:
import hello_world_1
Scroll down for more discussion.

What happened? Why did we get output? When you call "import file", Python runs all of the code in the file, including the function definitions, and then places everything the file defines in a "namespace" named after the file. The namespace helps keep your code cleaner.

Take a look at hello_world_1.py again. See how it has a piece of code at the bottom that calls the function hello_world_1()? That's why we got output. It was the output from hello_world_1().
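
To see this effect without touching the tutorial files, here's a minimal sketch. It writes a tiny throwaway module to a temporary folder and imports it. The module name import_demo_module and everything inside it are made up for illustration and are not one of the files in this directory. The top-level statements run during the import, and the names the file defines end up in the module's namespace.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module source: one definition plus top-level statements.
SOURCE = '''
def greet():
    return "hello from import_demo_module"

import_log = []             # top-level statement: runs at import time
import_log.append(greet())  # so does this call
'''

# Write the source to a temporary directory and make that directory importable.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "import_demo_module.py"), "w") as handle:
    handle.write(SOURCE)
sys.path.insert(0, module_dir)

demo_module = importlib.import_module("import_demo_module")

# The top-level call already ran, and the names live in the module's namespace.
print(demo_module.import_log)
print(demo_module.greet())
```

The call recorded in import_log happened before we ever asked for it, which is the same reason hello_world_1.py produced output on import.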

In some cases, this is desirable. For example, you may want some variables to be dynamically set when you import a module. Or you may want some information to be printed to the screen. For example, the creators of Python hid a poem in one of the standard modules. Try running the following code:


In [ ]:
import this

So how do we interact with the code? Try calling the function hello_world_1().


In [ ]:
hello_world_1()

Doesn't work, right? Just like functions from built-in modules, we have to use the right namespace. Try again, but like this:


In [ ]:
hello_world_1.hello_world_1()

There we go! So now we know how to get access to code in a file. But how do we learn about that code? Well, that's where Python's help functionality comes in. For example, if you wrote some documentation for your code and placed it at the top of the file in a string, you can access it using the 'help' command. This is an example of a 'docstring.' Try it out:


In [ ]:
help(hello_world_1)

Cool! That's a lot of useful information! Notice how we not only got the docstring at the top of the file, but also a list of functions and the docstrings assigned to them as well. Heck, we even got a list of variables defined in the code. If we want information on an individual function definition, we can get that too:


In [ ]:
help(hello_world_1.hello_world_2)
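
If you're curious where help gets its text, here's a small sketch with a made-up function (hello is a hypothetical name, not something from the tutorial files). The string placed right after the def line is stored on the function as its docstring, and help formats that same string for display.

```python
import inspect


def hello(name="World"):
    """Return a greeting for `name`."""
    return "Hello, {}!".format(name)


# help() formats and prints this same text; the raw string lives in __doc__.
print(hello.__doc__)

# inspect.getdoc() returns the docstring with its indentation cleaned up.
print(inspect.getdoc(hello))
```

A string at the very top of a file works the same way for the module as a whole.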

We can also get a list of the contents of a file using the 'dir' command:


In [ ]:
dir(hello_world_1)

The output is a list, so you can even assign it to a variable and print out one of the elements.


In [ ]:
d = dir(hello_world_1)
print d[2]

Notice that some of the elements of the list are functions and variables like you were expecting and some of them are things we didn't set. The things we didn't set are surrounded by underscores. These are "hidden" variables and functions. Python doesn't really prevent you from touching them or messing with them. But the underscores are there to tell you that you shouldn't touch them unless you know what you're doing.

You can define your own "hidden" variables and it can be useful to do so. We'll get back to that in a bit. For now, note that the "hidden" __file__ variable contains... the filename!


In [ ]:
print hello_world_1.__file__
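
Since dir returns an ordinary list of strings, we can sort the names by their underscores. The sketch below uses the standard math module as a stand-in, so it runs without the tutorial files. The variable names are just for illustration.

```python
import math

names = dir(math)

# Names wrapped in double underscores are set by Python itself.
dunder_names = [n for n in names if n.startswith("__") and n.endswith("__")]

# Names without a leading underscore are the module's public contents.
public_names = [n for n in names if not n.startswith("_")]

print(dunder_names)
print(public_names[:5])
```

Swapping math for hello_world_1 should separate the names we defined from the ones Python added.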

Note that we can manipulate variables as well. Let's try calling hello_world_1.hello_world_2. Then let's modify a variable in our file and try calling it again.


In [ ]:
hello_world_1.hello_world_2()

In [ ]:
hello_world_1.hellostring2="World...\nHello!"

What do you think will happen now?


In [ ]:
hello_world_1.hello_world_2()

Interesting...
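
Here's a sketch of why a change like that can show up in the output. It assumes the function looks up the module-level variable each time it is called, which is how Python normally resolves global names. The module demo, the variable greeting, and the function say are all made up for illustration.

```python
import types

# Build a throwaway module in memory; its name and contents are hypothetical.
demo = types.ModuleType("demo")
exec('''
greeting = "Hello, World!"

def say():
    return greeting   # looked up in the module's namespace at call time
''', demo.__dict__)

before = demo.say()
demo.greeting = "World...\nHello!"   # rebind the module-level variable
after = demo.say()

print(before)
print(after)
```

The function itself never changed. Only the variable it reads did.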

Modules, Main Functions, and the Command Line

It was pretty annoying that our code ran when we imported hello_world_1.py, wasn't it? We could have avoided this by writing only definitions in our file. But that has the disadvantage that our Python program isn't a proper program anymore, because running it would do nothing. Right now, the code at the bottom of the file will run if we call the program from the command line. Try opening a terminal and calling the following:

python hello_world_1.py

It printed "hello world" to the screen, right? That's pretty convenient, right? But it would be even more convenient if we could make a program run via command line but only define functions and variables when we import it into an interpreter. Fortunately, we can make this happen with a trick I like to call the "main function trick." Open up hello_world_2.py and look inside. Then come back here.

Notice that piece of code at the end?


In [ ]:
if __name__=="__main__":
    print 'doing stuff'
    # do stuff

That piece of code is a special if statement. It says to only run the indented code if the program is called from the command line. Therefore, if we import hello_world_2.py none of the code in that indented block will be run.


In [ ]:
import hello_world_2

But... all the definitions are still there.


In [ ]:
hello_world_2.hello_world_1()

In [ ]:
hello_world_2.hello_world_2()

In [ ]:
print hello_world_2.hellostring2

Often, it's a good idea to put all of the code we'll indent in our final if __name__ == "__main__" statement into a single function, which is colloquially called the "main function." We did so in the hello_world_2.py file.


In [ ]:
hello_world_2.main()
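
Put together, the layout looks roughly like the sketch below. This is a generic template under our own made-up names (greet and main), not a copy of hello_world_2.py.

```python
def greet(name="World"):
    """Build the greeting. Safe to import and reuse elsewhere."""
    return "Hello, {}!".format(name)


def main():
    """Everything the program does when run as a script."""
    message = greet()
    print(message)
    return message


if __name__ == "__main__":
    # True when the file is run as a script, False when it is imported.
    main()
```

Saved to a file, this prints the greeting when run with python, and stays quiet on import while still making greet and main available.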

Command line arguments

Often, we'd like Python programs to take command line arguments so that we can change the parameters of the code as we run it. Fortunately, command line arguments are stored in a list called sys.argv. To access it, you need to import the sys module.


In [ ]:
import sys

Let's see what command line arguments we have.


In [ ]:
print sys.argv

This is the list of arguments passed into the Python interpreter when we started the IPython notebook. Note that each argument is treated as a string.

Note that this list starts with the name of the script that the Python interpreter was asked to run, followed by the arguments meant for your program. So if you call a Python program like:

python my_program.py arg1 arg2 arg3

Then the sys.argv list will look like:


In [ ]:
['my_program.py','arg1','arg2','arg3']

Check out the file row_echo.py to see an extremely simple example of where this trick might be useful.
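
As a rough idea of how a program can use sys.argv, here's a minimal sketch that echoes its arguments back on one line. It is our own guess at a simple echo program and not the contents of row_echo.py. The names echo_args and main are hypothetical.

```python
import sys


def echo_args(argv):
    """Join everything after the script name into one space-separated row."""
    # argv[0] is the script name, so the real arguments start at index 1.
    return " ".join(argv[1:])


def main(argv=None):
    """Print the arguments; uses sys.argv unless a list is passed in."""
    if argv is None:
        argv = sys.argv
    print(echo_args(argv))


if __name__ == "__main__":
    main()
```

Passing the list in as a parameter makes the function easy to try from the notebook, for example echo_args(['my_program.py', 'arg1', 'arg2']). Remember that every argument arrives as a string, so numbers need a conversion such as int(argv[1]).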

A Practical Example

We now present a practical example where one might write a simple program in a file with both private variables and functions.

Check out the file fib_module.py and see if you can understand it. The basic idea is to compute the Fibonacci sequence $$F_{n} = F_{n-1} + F_{n-2}$$ recursively.

But we recognize that if we need to compute many elements of the Fibonacci sequence, we'll repeat a lot of work by computing the same sums over and over again. To overcome this difficulty, we keep track of the numbers we've already computed and return them if they're available.

I've implemented two functions that return elements of the Fibonacci sequence, dumb_fib and smart_fib. dumb_fib doesn't save any previous results. smart_fib saves them in a global variable in the file. Let's see which is faster.

For a more detailed explanation see: http://functionspace.org/articles/32/Fibonacci-series-and-Dynamic-programming

Once you've looked at the file, let's import it. We'll also import the time module so that we can use it to make measurements.


In [ ]:
import fib_module
import time

Note that there's a hidden variable "__fib_numbers" which we defined ourselves. If there are only underscores at the front of a variable, it's usually hidden but user defined. If it has underscores in front and behind, it's usually system defined.


In [ ]:
dir(fib_module)

In [ ]:
print fib_module.__fib_numbers

Now, let's call our naive implementation of the Fibonacci sequence, which we've named "dumb_fib." Let's see how long it takes to run.


In [ ]:
N=35
tstart=time.time()
print "Fib {} = {}".format(N,fib_module.dumb_fib(N))
tend = time.time()
tdiff1=tend-tstart
print "Time: {} microseconds".format(tdiff1*1000000)

Whew! That took a while! Let's try and see what happens if we use the implementation that saves information.


In [ ]:
tstart=time.time()
print "Fib {} = {}".format(N,fib_module.smart_fib(N))
tend = time.time()
tdiff2=tend-tstart
print "Time: {} microseconds".format(tdiff2*1000000)

Wow! How much faster is that?


In [ ]:
print float(tdiff1)/tdiff2
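
The start/stop pattern we keep typing can be wrapped in a small helper. This is just one way to do it. The name time_call is ours, and the workload here is a stand-in (summing a range) so the sketch runs without fib_module. It uses timeit.default_timer from the standard library, which picks a suitable clock for the platform.

```python
import timeit


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call to func(*args)."""
    start = timeit.default_timer()
    result = func(*args)
    elapsed = timeit.default_timer() - start
    return result, elapsed


# Stand-in workload; swap in fib_module.dumb_fib or fib_module.smart_fib.
result, seconds = time_call(sum, range(1000000))
print("Result: {}".format(result))
print("Time: {:.1f} microseconds".format(seconds * 1e6))
```

A single measurement like this is noisy, so for a fair comparison it helps to repeat the call a few times and look at the spread.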

And let's see what our variable containing the Fibonacci numbers is now:


In [ ]:
print fib_module.__fib_numbers

It's way longer! And that looks like the Fibonacci sequence alright! And if we call the same method again, it can use those stored variables and get a small speedup.


In [ ]:
tstart=time.time()
print "Fib {} = {}".format(N,fib_module.smart_fib(N))
tend = time.time()
print "Time: {} microseconds".format((tend-tstart)*1000000)

It's a little faster! It seems this strategy is quite useful!

Note that this strategy isn't just useful for repeat calls to the Fibonacci function. It helps on the first call too. Without caching, finding the $N^{th}$ Fibonacci number takes a number of steps that grows like $$\left(\frac{1+\sqrt{5}}{2}\right)^N$$ which is very costly in computer time. (Also it's the golden ratio to the $N^{th}$ power, which is kind of cute, don't you think?) With caching, it only takes linear time, which grows as $N$. Cool, huh?
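
We can check this growth by counting function calls instead of measuring time. The sketch below is our own illustration, separate from fib_module.py. The names count_naive_calls and count_cached_calls are made up, and we assume the sequence starts with 0 and 1.

```python
def count_naive_calls(n):
    """Return (F_n, number of calls) for plain recursion."""
    if n < 2:
        return n, 1
    a, calls_a = count_naive_calls(n - 1)
    b, calls_b = count_naive_calls(n - 2)
    return a + b, calls_a + calls_b + 1


def count_cached_calls(n):
    """Return (F_n, number of calls) when results are cached."""
    cache = {0: 0, 1: 1}
    calls = [0]  # a one-element list so the inner function can update it

    def fib(k):
        calls[0] += 1
        if k not in cache:
            cache[k] = fib(k - 1) + fib(k - 2)
        return cache[k]

    return fib(n), calls[0]


for n in (10, 20, 25):
    print("N = {}: naive {} calls, cached {} calls".format(
        n, count_naive_calls(n)[1], count_cached_calls(n)[1]))
```

The naive count multiplies by roughly the golden ratio each time N goes up by one, while the cached count only goes up by a fixed amount.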

