Like other languages, Python has the ability to import external modules (or libraries) into the current program. These modules may be part of the standard library that is automatically included with the Python installation, they may be extra libraries which you install separately, or they may be other Python programs you have written yourself. Whatever its source, a module is imported into a program via an import statement.
For example, if we wish to access the mathematical constants pi and e, we can use the import keyword to get the module named math and access its contents with the dot notation:
In [ ]:
import math
print(math.pi, math.e)
We can also use the as keyword to give the module a different name in our code, which can be useful for brevity and for avoiding name conflicts:
In [ ]:
import math as m
print(m.pi, m.e)
Alternatively, we can import individual components using the from … import keyword combination:
In [ ]:
from math import pi, e
print(pi, e)
We can import multiple components from a single module, either on one line as seen above or on separate lines:
In [ ]:
from math import pi
from math import e
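The as keyword can also be combined with from … import, which helps when two modules define the same name. A minimal sketch, assuming we want the square root function from both math and the standard library's cmath module (the aliases real_sqrt and complex_sqrt are names chosen here for illustration):

```python
# Both modules define a function called sqrt, so importing both
# under the original name would make the second shadow the first.
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(real_sqrt(4))      # square root as a float
print(complex_sqrt(-4))  # square root as a complex number
```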
To list the names a module defines, call the built-in function dir() and pass it the module:
In [ ]:
import math
dir(math)
or pass an instance directly, as with this string:
In [ ]:
dir("mystring")
or pass the object type:
In [ ]:
dir(str)
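The list returned by dir() also includes internal names that start with an underscore. As a small sketch, these can be filtered out with a list comprehension (public_names is just an illustrative variable name), and the built-in help() function prints the documentation of a single entry:

```python
import math

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names)

# Show the documentation of one of the listed functions
help(math.sqrt)
```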
The most useful information is online on the https://www.python.org/ website, which should be used as a reference guide.
os.path — Common pathname manipulations
exists(path): returns whether path exists
isfile(path): returns whether path is a “regular” file (as opposed to a directory)
isdir(path): returns whether path is a directory
islink(path): returns whether path is a symbolic link
join(*paths): joins the paths together into one long path
dirname(path): returns directory containing the path
basename(path): returns the path minus the dirname(path) in front
split(path): returns (dirname(path), basename(path))
os — Miscellaneous operating system interfaces
chdir(path): change the current working directory to be path
getcwd(): return the current working directory
listdir(path): returns a list of files/directories in the directory path
mkdir(path): create the directory path
rmdir(path): remove the directory path
remove(path): remove the file path
rename(src, dst): move the file/directory from src to dst
Building the path to your file from a list of directory names and a filename makes your script able to run on any platform.
In [ ]:
import os.path
os.path.join("data", "mydata.txt")
# data/mydata.txt - Unix
# data\mydata.txt - Windows
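The other os.path functions listed above take such a path apart again. A short sketch, reusing the same illustrative data/mydata.txt path (nothing here touches the disk, so the file does not need to exist):

```python
import os.path

data_file = os.path.join("data", "mydata.txt")

directory = os.path.dirname(data_file)   # the part before the last separator
filename = os.path.basename(data_file)   # the part after the last separator
print(directory, filename)

# split() returns both parts at once
print(os.path.split(data_file) == (directory, filename))
```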
Check if a file exists before opening it:
In [ ]:
import os.path
data_file = os.path.join("data", "mydata.txt")
if os.path.exists(data_file):
    print("file", data_file, "exists")
    with open(data_file) as f:
        print(f.read())
else:
    print("file", data_file, "not found!")
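The os functions listed earlier can be tried out safely in a throwaway directory. The following is only a sketch: it assumes the standard library's tempfile module for the scratch directory, and the names results, draft.txt and final.txt are made up for the example.

```python
import os
import tempfile

# The temporary directory is deleted automatically at the end of the block
with tempfile.TemporaryDirectory() as scratch:
    results_dir = os.path.join(scratch, "results")
    os.mkdir(results_dir)                      # create a directory

    old_path = os.path.join(results_dir, "draft.txt")
    with open(old_path, "w") as f:
        f.write("hello\n")

    new_path = os.path.join(results_dir, "final.txt")
    os.rename(old_path, new_path)              # move/rename the file

    listing = os.listdir(results_dir)          # names inside the directory
    print(listing)

    os.remove(new_path)                        # remove the file
    os.rmdir(results_dir)                      # remove the now empty directory
    still_there = os.path.exists(results_dir)
    print(still_there)
```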
Write a script that reads a tab-delimited file which has 4 columns: gene, chromosome, start and end coordinates. Check if the file exists, then compute the length of each gene and store its name and corresponding length in a dictionary. Write the results into a new tab-separated file. You can find a data file at data/genes.txt in the course materials.
csv module
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. The csv module implements methods to read and write tabular data in CSV format.
The csv module’s reader() and writer() functions read and write CSV files. You can also read and write data in dictionary form using the DictReader and DictWriter classes.
For more information about this built-in Python library, see the CSV File Reading and Writing documentation.
Let's now read our space-separated file data/mydata.txt using the csv module.
In [ ]:
import csv
with open("data/mydata.txt") as f:
    reader = csv.reader(f, delimiter=" ")  # default delimiter is ","
    for row in reader:
        print(row)
Replace csv.reader() with csv.DictReader() and each row is built into a dictionary automatically, keyed by the column headers.
In [ ]:
import csv
with open("data/mydata.txt") as f:
    reader = csv.DictReader(f, delimiter=" ")
    for row in reader:
        print(row)
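To experiment without a file on disk, both readers also accept any iterable of lines. The sketch below uses io.StringIO with made-up, space-separated sample data (not the actual contents of data/mydata.txt) and shows that every value is read as a string, so numbers have to be converted explicitly:

```python
import csv
import io

# Hypothetical sample data standing in for a space-separated file
sample = "Index Organism Score\n1 Human 1.076\n2 Mouse 1.202\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter=" "))
for row in rows:
    score = float(row["Score"])  # values are strings until converted
    print(row["Organism"], score)
```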
In [ ]:
# Write a tab delimited file using the csv module
import csv
mydata = [
    ['1', 'Human', '1.076'],
    ['2', 'Mouse', '1.202'],
    ['3', 'Frog', '2.2362'],
    ['4', 'Fly', '0.9853']
]
# newline="" lets the csv module control line endings itself, as its documentation recommends
with open("data.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(["Index", "Organism", "Score"])  # write header
    for record in mydata:
        writer.writerow(record)
# Open the output file and print out its content
with open("data.txt") as f:
    print(f.read())
In [ ]:
# Write a delimited file using the csv module from a list of dictionaries
import csv
mydata = [
    {'Index': '1', 'Score': '1.076', 'Organism': 'Human'},
    {'Index': '2', 'Score': '1.202', 'Organism': 'Mouse'},
    {'Index': '3', 'Score': '2.2362', 'Organism': 'Frog'},
    {'Index': '4', 'Score': '0.9853', 'Organism': 'Fly'}
]
# newline="" lets the csv module control line endings itself, as its documentation recommends
with open("dict_data.txt", "w", newline="") as f:
    writer = csv.DictWriter(f, mydata[0].keys(), delimiter='\t')
    writer.writeheader()  # write header
    for record in mydata:
        writer.writerow(record)
# Open the output file and print out its content
with open("dict_data.txt") as f:
    print(f.read())
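In the cell above the column order follows the key order of the first dictionary. If a particular order is wanted, the field names can be passed as an explicit list instead. A minimal sketch, writing to an in-memory io.StringIO buffer rather than a file (records and column_order are illustrative names):

```python
import csv
import io

records = [
    {'Index': '1', 'Score': '1.076', 'Organism': 'Human'},
    {'Index': '2', 'Score': '1.202', 'Organism': 'Mouse'},
]
column_order = ["Index", "Organism", "Score"]  # the order we want in the output

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=column_order, delimiter='\t')
writer.writeheader()
writer.writerows(records)  # writes all records in one call

lines = buffer.getvalue().splitlines()
print(lines)
```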
Now change the script you wrote for Exercise 3.1 to make use of the csv module.
So far we have been writing Python code in files as executable scripts without knowing that they are also modules from which we are able to call the different functions defined in them.
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Create a file called my_first_module.py in the current directory with the following contents:
In [ ]:
def say_hello(user):
    print('hello', user, '!')
Now start the Python interpreter from the directory in which you created the my_first_module.py file and import the say_hello function from this module with the following commands:
python3
Python 3.5.2 (default, Jun 30 2016, 18:10:25)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_first_module import say_hello
>>> say_hello('Anne')
hello Anne !
>>>
A module called my_first_module.py is already stored in the course directory; the cell below shows how to import it into this notebook. If you edit this file to change the code or add another function, you will have to restart the notebook, using the restart the kernel button in the menu bar, for the changes to be taken into account.
In [ ]:
from my_first_module import say_hello
say_hello('Anne')
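Restarting the kernel always works. As a possible alternative, the standard library's importlib.reload() re-executes a module that has already been imported. The sketch below is self-contained and makes several assumptions: it writes a hypothetical module called demo_greetings.py into a temporary directory, and its say_hello returns the text instead of printing it so that the effect of the reload is easy to see.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as module_dir:
    module_path = os.path.join(module_dir, "demo_greetings.py")

    # First version of the hypothetical module
    with open(module_path, "w") as f:
        f.write("def say_hello(user):\n    return 'hello ' + user\n")

    sys.path.insert(0, module_dir)   # make the directory importable
    importlib.invalidate_caches()    # make sure the new file is noticed
    import demo_greetings
    before = demo_greetings.say_hello("Anne")

    # Edit the file, then reload the module instead of restarting
    with open(module_path, "w") as f:
        f.write("def say_hello(user):\n    return 'hello again ' + user\n")
    importlib.reload(demo_greetings)
    after = demo_greetings.say_hello("Anne")

    sys.path.remove(module_dir)

print(before)
print(after)
```

Note that names brought in with from … import keep pointing at the old function after a reload, so they have to be imported again.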
A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement. They are also run if the file is executed as a script.
Do comment out these executable statements if you do not wish to have them executed when importing your module.
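Instead of commenting statements out, a common Python idiom is to guard them with a test on the special variable __name__, which is set to "__main__" only when the file is run as a script. A minimal sketch, reusing say_hello from my_first_module.py (the guarded call is an illustrative addition, not part of the course file):

```python
def say_hello(user):
    print('hello', user, '!')

# Runs when the file is executed as a script (python3 my_first_module.py),
# but not when it is imported from another program or notebook.
if __name__ == "__main__":
    say_hello('Anne')
```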
For more information about modules, see https://docs.python.org/3/tutorial/modules.html.
Write a function that calculates the GC content of a DNA sequence.
Write a function that extracts a list of overlapping sub-sequences for a given window size from a given sequence. Do not forget to test it on a given DNA sequence.
Combine the two functions written above to calculate the GC content of each overlapping sliding window along a DNA sequence from start to end.
Import the two functions you wrote above in Exercise 3.3 to solve this exercise.
The new function should take two arguments, the DNA sequence and the size of the sliding window, and re-use the previous functions written to calculate the GC content of a DNA sequence and to extract the list of all overlapping sub-sequences. It returns a list of GC% along the DNA sequence.
Go to our next notebook: python_functions_and_modules_4