Path Management

Goal

  • Normalize paths on different platforms
  • Create, copy and remove folders
  • Handle errors

Modules


In [ ]:
import os
import os.path
import shutil
import errno
import glob
import sys

See also:

  • pathlib on Python 3.4+
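
As a rough sketch of the pathlib alternative (Python 3.4+ only), PurePosixPath builds and inspects unix-style paths without touching the filesystem. The variable names are only illustrative and simply mirror the ones used later in this lesson.

```python
from pathlib import PurePosixPath

# PurePosixPath manipulates unix-style paths on any platform
# and never accesses the filesystem.
basedir = PurePosixPath("/")
hosts = basedir / "etc" / "hosts"   # the "/" operator joins components

print(hosts)          # /etc/hosts
print(hosts.name)     # hosts
print(hosts.parent)   # /etc
```

The rest of this lesson sticks to os.path, which is also available on Python 2.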

In [ ]:
# Be python3 ready
from __future__ import unicode_literals, print_function

Multiplatform Path Management

  • The os.path module may look verbose, but it is a reliable way to manage paths. It is:
    • safe
    • multiplatform
  • Here we check the operating system and prepend the right path

In [ ]:
import os
import sys
basedir, hosts = "/", "etc/hosts"

In [ ]:
# sys.platform shows the current operating system
if sys.platform.startswith('win'):
    basedir = 'c:/windows/system32/drivers'
print(basedir)

In [ ]:
# join puts exactly one separator between components
hosts = os.path.join(basedir, hosts)
print(hosts)
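
A minimal sketch of how join treats separators. It uses posixpath (the unix flavour of os.path) so the results are the same on every platform; the paths are made up for illustration.

```python
import posixpath  # unix flavour of os.path, same behaviour on any platform

# A relative component is appended with a single separator...
relative = posixpath.join("/tmp", "course/foo")
# ...a trailing separator on the first component is not duplicated...
trailing = posixpath.join("/tmp/", "course/foo")
# ...but an absolute component discards everything before it.
absolute = posixpath.join("/tmp", "/course/foo")

print(relative)   # /tmp/course/foo
print(trailing)   # /tmp/course/foo
print(absolute)   # /course/foo
```

The last case is a common surprise: a leading "/" in a later component silently drops the base directory.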

In [ ]:
# normpath collapses redundant separators and ".."
# and, on windows, turns "/" into "\"
hosts = os.path.normpath(hosts)
print("Normalized path is", hosts)
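
To see what normpath does on each platform without switching machines, this sketch calls the two platform-specific implementations directly: posixpath for unix and ntpath for windows. The sample paths are assumptions chosen for illustration.

```python
import ntpath      # the windows flavour of os.path
import posixpath   # the unix flavour of os.path

# Redundant separators and "." / ".." components are collapsed.
unix_hosts = posixpath.normpath("/etc//hosts")
unix_foo = posixpath.normpath("/usr/lib/../bin/./foo")
print(unix_hosts)  # /etc/hosts
print(unix_foo)    # /usr/bin/foo

# With windows rules, forward slashes become backslashes.
win_hosts = ntpath.normpath("c:/windows/system32/drivers/etc/hosts")
print(win_hosts)   # c:\windows\system32\drivers\etc\hosts
```

normpath works on the string only: it does not check that the path exists, and collapsing ".." can change the meaning of a path that crosses a symlink.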

In [ ]:
# realpath resolves symlinks (on unix)
! mkdir -p /tmp/course
! ln -sf /etc/hosts /tmp/course/hosts
realfile = os.path.realpath("/tmp/course/hosts") 
print(realfile)
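
The same check can be sketched in pure Python, without shell commands. The example assumes a unix-like system where os.symlink is available, and it works in a temporary directory (the file names are hypothetical) instead of /tmp/course.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()   # scratch directory, removed at the end
target = os.path.join(workdir, "hosts.txt")
link = os.path.join(workdir, "hosts.link")

with open(target, "w") as fh:
    fh.write("127.0.0.1 localhost\n")

try:
    os.symlink(target, link)   # may fail on windows without privileges
    # realpath follows the link back to the real file
    resolved = os.path.realpath(link)
    same = resolved == os.path.realpath(target)
    print(same)
finally:
    shutil.rmtree(workdir)
```

Both sides of the comparison go through realpath because the temporary directory itself may sit behind a symlink on some systems.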

In [ ]:
# Exercise: given the following path
base, path = "/usr", "bin/foo"
# What is the expected value of result?
result = os.path.join(base, path)
print(result)

Manage trees

The os and shutil modules can:

- manage directory trees
- report basic errors

In [ ]:
# os and shutil support basic file operations
# like recursive copy and tree creation.
!rm -rf /tmp/course/foo
from os import makedirs
makedirs("/tmp/course/foo/bar")

In [ ]:
# while os.path can be used to test file existence
from os.path import isdir
assert isdir("/tmp/course/foo/bar")

In [ ]:
# Check the directory content with either one of
!tree /tmp/course || find /tmp/course

In [ ]:
# We can use exception handlers to check
#  what happened.

try:
    # Without exist_ok=True (python 3.2+),
    #  makedirs does not ignore an already
    #  existing directory and raises an OSError
    makedirs("/tmp/course/foo/bar")
except OSError as e:
    # Just use the errno module to
    #  check the error value
    print(e)
    import errno
    assert e.errno == errno.EEXIST
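
The pattern above can be wrapped in a small helper that works on both Python 2 and 3. This is only a sketch: ensure_dir is a hypothetical name, not a standard library function, and the example runs in a temporary directory.

```python
import errno
import os
import shutil
import tempfile


def ensure_dir(path):
    """Create path and its parents; do nothing if it already exists.

    Hypothetical helper, not part of the standard library.
    """
    try:
        os.makedirs(path)
    except OSError as e:
        # Re-raise anything other than "already exists as a directory"
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


workdir = tempfile.mkdtemp()
nested = os.path.join(workdir, "foo", "bar")
ensure_dir(nested)
ensure_dir(nested)   # the second call is a no-op
created = os.path.isdir(nested)
print(created)
shutil.rmtree(workdir)
```

On Python 3.2+ the call os.makedirs(path, exist_ok=True) has the same effect without a helper.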

In [ ]:
from shutil import copytree, rmtree
# Now recursively copy a directory tree
# and check the result
copytree("/tmp/course/foo", "/tmp/course/foo2")
assert isdir("/tmp/course/foo2/bar")

In [ ]:
# List the copied tree (ls requires a unix-like shell)
!ls /tmp/course/foo2/

In [ ]:
# Now remove it and check the outcome
rmtree("/tmp/course/foo")
assert not isdir("/tmp/course/foo/bar")
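
rmtree reports problems the same way makedirs does, through an OSError carrying an errno value. A minimal sketch, assuming a path that does not exist inside a temporary directory:

```python
import errno
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
missing = os.path.join(workdir, "does-not-exist")

try:
    shutil.rmtree(missing)
    error_code = None
except OSError as e:
    # Removing a missing tree is an error by default
    error_code = e.errno

print(error_code == errno.ENOENT)

# ignore_errors=True silences the failure instead of raising
shutil.rmtree(missing, ignore_errors=True)
shutil.rmtree(workdir)
```

ignore_errors=True is handy for cleanup code, but it also hides real problems such as permission errors, so use it with care.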

In [ ]:
# List what is left (ls requires a unix-like shell)
!ls /tmp/course/

In [ ]:
# Cleanup created files
rmtree("/tmp/course")

Encoding

Goals

  • A string is more than a sequence of bytes
  • A string is a pair (bytes, encoding)
  • Use unicode_literals in python2
  • Manage differently encoded filenames
  • A string is not a sequence of bytes

In [ ]:
import os
import os.path
import glob

In [ ]:
from os.path import isdir
basedir = "/tmp/course"
if not isdir(basedir):
    os.makedirs(basedir)

In [ ]:
# Py3 doesn't need the 'u' prefix before the string.
the_string = u"S\u00fcd" # Sued
print(the_string)
# print is a function here (see print_function above)
print(type(the_string))

In [ ]:
# the_string Sued can be encoded in different...
in_utf8 = the_string.encode('utf-8')
in_win = the_string.encode('cp1252')

# ...byte-sequences
assert isinstance(in_utf8, bytes)
print(type(in_utf8))

In [ ]:
# Now you can see the differences between 
print(repr(in_utf8))
# and
print(repr(in_win))
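
A short sketch that makes the difference explicit by comparing lengths and raw byte values. It repeats the definitions from the cells above so it can run on its own.

```python
the_string = u"S\u00fcd"   # Sued

in_utf8 = the_string.encode("utf-8")
in_win = the_string.encode("cp1252")

# The same 3-character string takes 4 bytes in UTF-8 and 3 in cp1252
print(len(the_string))   # 3
print(len(in_utf8))      # 4
print(len(in_win))       # 3

# u-umlaut is two bytes (c3 bc) in UTF-8 and one byte (fc) in cp1252
print(in_utf8 == b"S\xc3\xbcd")   # True
print(in_win == b"S\xfcd")        # True
```

The text is identical in both cases; only the byte representation changes with the encoding.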

In [ ]:
# Decoding bytes using the wrong map...
# ...gives sad results
print(in_utf8.decode('cp1252'))
print(in_utf8.decode('utf-8'))
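
Decoding with the wrong codec can fail in two ways: silently, producing garbled text, or loudly, with an exception. A minimal sketch of both cases, with the byte values written out so it runs on its own:

```python
in_utf8 = b"S\xc3\xbcd"   # "Sued" encoded as UTF-8
in_win = b"S\xfcd"        # "Sued" encoded as cp1252

# UTF-8 bytes read as cp1252: 0xc3 and 0xbc are both valid cp1252
# characters, so decoding succeeds but yields two wrong characters.
garbled = in_utf8.decode("cp1252")
print(garbled == u"S\u00c3\u00bcd")

# cp1252 bytes read as UTF-8: 0xfc is not valid UTF-8, so this raises.
try:
    in_win.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)
```

The silent case is the more dangerous one, because the corrupted text can be stored or passed along unnoticed.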

To make Py2 encoding-aware we must use:

from __future__ import unicode_literals, print_function