Modules and Packages

Much of Python's versatility comes from its broad ecosystem of tools and packages, which offer more specialized functionality on top of "bare" Python.

Loading Modules: the import Statement

For loading built-in and third-party modules, Python provides the import statement.

Good

import sys

from os import path

import statistics as stats

from custom_package import mode

from statistics import mean, median

Example:


In [1]:
import math
math.cos(math.pi)


Out[1]:
-1.0
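The other import forms listed under "Good" work the same way. The short sketch below uses only standard-library modules; the sample list `data` and the file name are made up for illustration.

```python
import statistics as stats            # module bound to a shorter alias
from statistics import mean, median   # selected names imported directly
from os import path                   # a submodule imported by name

data = [1, 2, 2, 3, 10]  # illustrative sample values

print(stats.mode(data))               # 2
print(mean(data), median(data))       # 3.6 2
print(path.splitext("results.csv"))   # ('results', '.csv')
```

In every case it stays obvious from the import lines where each name comes from.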

Bad: silently overwrites existing names (built-ins and previous imports)

from math import *

from pylab import *

Example:


In [5]:
help(sum)


Help on built-in function sum in module builtins:

sum(...)
    sum(iterable[, start]) -> value
    
    Return the sum of an iterable of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the iterable is
    empty, return start.

We can use this to compute the sum of a sequence, starting with a certain value (here, we'll start with -1):


In [6]:
sum(range(5), -1)


Out[6]:
9

Now observe what happens if we make the exact same function call after importing * from numpy:


In [7]:
from numpy import *

In [8]:
sum(range(5), -1)


Out[8]:
10

The result is off by one! The reason is that the import * statement replaces the built-in sum function with the numpy.sum function, which has a different call signature: in the former, we're summing range(5) starting at -1; in the latter, we're summing range(5) along the last axis (indicated by -1). This is the type of situation that can arise when "import *" is used carelessly. For this reason, it is best to avoid it unless you know exactly what you are doing.
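The same kind of clash can be checked for in advance. The sketch below uses a hypothetical helper, star_import_collisions (not part of any library), to list which built-in names a star import would replace. It is demonstrated on the standard-library math module so that it runs without NumPy installed.

```python
import builtins
import math


def star_import_collisions(module):
    """Return the built-in names that `from <module> import *` would shadow.

    Hypothetical helper for illustration; not part of any library.
    """
    # A star import copies the names listed in __all__ if the module defines
    # it, and otherwise every public name (no leading underscore).
    exported = getattr(module, "__all__", None)
    if exported is None:
        exported = [name for name in dir(module) if not name.startswith("_")]
    return sorted(set(exported) & set(dir(builtins)))


collisions = star_import_collisions(math)
print(collisions)

# With qualified names, both functions stay available side by side.
print(pow(2, 3), math.pow(2, 3))   # 8 8.0
```

Passing the numpy module instead (assuming it is installed) should report sum among the shadowed names, which is the clash shown above.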

Reusing your own code

If you want to write larger, better-organized programs (as opposed to simple scripts), in which some objects (variables, functions, classes) are defined once and reused several times, you have to create your own modules.

Let us create a module demo contained in the file demo.py:

# A demo module


def show_me_a():
    """Prints a."""
    print('a')

def show_me_b():
    """Prints b."""
    print('b')

c = 2
d = 2

In this file, we defined two functions, show_me_a and show_me_b, and two variables, c and d. Suppose we want to call the show_me_a function from the interpreter. We could execute the file as a script, but since we only want access to the function show_me_a, we will import the file as a module instead. The syntax is as follows.

ipython
In [1]: import demo


In [2]: demo.show_me_a()
a
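To try this without creating the file by hand, the sketch below writes the same demo.py to a temporary directory, makes that directory importable, and then uses the module. The temporary-directory setup is only scaffolding for the example; normally demo.py would simply sit next to your script or notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Same content as the demo.py shown above.
DEMO_SOURCE = '''
def show_me_a():
    """Prints a."""
    print('a')

def show_me_b():
    """Prints b."""
    print('b')

c = 2
d = 2
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "demo.py").write_text(DEMO_SOURCE)
    sys.path.insert(0, tmp_dir)        # let the import system find demo.py
    importlib.invalidate_caches()
    try:
        demo = importlib.import_module("demo")   # equivalent to `import demo`
    finally:
        sys.path.remove(tmp_dir)

demo.show_me_b()          # functions are reached through the module name
print(demo.c + demo.d)    # module-level variables are attributes too
```

Everything defined at the top level of demo.py, functions and variables alike, becomes an attribute of the imported module object.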

Importing from Python's Standard Library

Python's standard library contains many useful built-in modules. To name but a few:

  • collections: container datatypes
  • logging: logging the execution of a script
  • os and sys: interfacing with the operating system and the Python interpreter
  • math and cmath: mathematical operations on real and complex numbers
  • itertools: constructing and interacting with iterators and generators
  • random: generating pseudorandom numbers
  • pickle: object persistence (saving objects to and loading them from disk)
  • json and csv: reading/writing JSON-formatted and CSV-formatted files

More: https://docs.python.org/3/library/.
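As a quick illustration, the sketch below combines a few of the modules listed above. The sample data is made up for the example.

```python
import collections
import itertools
import json
import math

words = ["spam", "eggs", "spam", "ham", "spam"]   # illustrative data

# collections.Counter tallies hashable items.
counts = collections.Counter(words)
print(counts.most_common(1))          # [('spam', 3)]

# itertools.combinations lazily yields all 2-element combinations.
pairs = list(itertools.combinations(["a", "b", "c"], 2))

# json round-trips basic Python structures through a string;
# tuples come back as lists.
encoded = json.dumps({"pairs": pairs, "pi": round(math.pi, 2)})
decoded = json.loads(encoded)
print(decoded["pairs"])               # [['a', 'b'], ['a', 'c'], ['b', 'c']]
print(decoded["pi"])                  # 3.14
```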

Importing from Third-Party Modules

PyPI

  • Often, when you want to code something, there is already a package that does it, so no need to reinvent the wheel!
  • There are hundreds of thousands of packages out there!
  • Various scientific libraries can be imported just like the built-in modules, but first they must be installed on your system.
  • The standard registry for such modules is the Python Package Index (PyPI for short): http://pypi.python.org/.
  • For convenience, Python comes with a program called pip (a recursive acronym meaning "pip installs packages"), which will automatically fetch packages released and listed on PyPI.

For example, if you'd like to install a nice package to work with physical units, pint, all that is required is to type the following at the command line:

$ pip install pint

The package will be automatically downloaded from the PyPI repository and installed into your Python environment (assuming you have permission to do so on the computer you're using).
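After installing, you may want to confirm from within Python that a package is available. A minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical, introduced only for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None.

    Hypothetical helper for illustration; not part of pip or any library.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("pint", "pip"):
    version = installed_version(name)
    print(name, "->", version if version else "not installed")
```

The output depends on what is installed in the current environment, so no fixed result is shown here.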

Anaconda & conda

Anaconda is a widely used open data science platform powered by Python. The open source version of Anaconda is a high performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science.

Additionally, you'll have access to over 720 packages that can easily be installed with conda, the package, dependency and environment manager that is included in Anaconda.

References