In [6]:
import os
from dotenv import load_dotenv, find_dotenv
# find .env automagically by walking up directories until it's found
dotenv_path = find_dotenv()
# load up the entries as environment variables
load_dotenv(dotenv_path)
Out[6]:
In [7]:
# Get the project folders that we are interested in
PROJECT_DIR = os.path.dirname(dotenv_path)
EXTERNAL_DATA_DIR = PROJECT_DIR + os.environ.get("EXTERNAL_DATA_DIR")
RAW_DATA_DIR = PROJECT_DIR + os.environ.get("RAW_DATA_DIR")
# Get the list of filenames
files=os.environ.get("FILES").split()
print("Project directory is : {0}".format(PROJECT_DIR))
print("External directory is : {0}".format(EXTERNAL_DATA_DIR))
print("Raw data directory is : {0}".format(RAW_DATA_DIR))
print("Base names of files : {0}".format(" ".join(files)))
While some Python packages that read files can handle compressed files directly, the zipfile module can deal with more complex zip archives. Each zip file we downloaded contains two files, and we only want the CSV file.
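To see what an archive contains before extracting anything, a minimal sketch like the one below lists the members and keeps only the CSV ones. The archive is built in memory purely for illustration; the member names example.csv and example.txt are hypothetical placeholders, not the names in the downloaded files.

```python
import io
import zipfile

# Build a small in-memory archive so the example is self-contained.
# The member names are hypothetical placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zfile:
    zfile.writestr("example.csv", "a,b\n1,2\n")
    zfile.writestr("example.txt", "some notes\n")

# Reopen the archive for reading and keep only the members ending in .csv
with zipfile.ZipFile(buffer) as zfile:
    members = zfile.namelist()
    csv_members = [name for name in members if name.lower().endswith(".csv")]

print("All members : {0}".format(members))
print("CSV members : {0}".format(csv_members))
```

With a real archive you would pass the path of the zip file to zipfile.ZipFile() instead of the in-memory buffer.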
File objects are a bit more complex than other data structures. Opening them, reading from them, and writing to them can all raise exceptions, for example when you lack the required permissions.
Access to a file goes through a file handle rather than the file itself. You need to close the handle properly once you are done; otherwise your program keeps the file open as far as the operating system is concerned, potentially blocking other programs from accessing it.
To deal with that, use the with zipfile.ZipFile() as zfile construction. Once the program leaves the with block, Python closes the handle for you, even if an exception was raised inside the block. The same construction is useful for database connections and other resources with these characteristics.
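The closing behaviour described above can be checked with a plain file object. This is only a sketch: it uses a throwaway temporary directory, and the file name example.txt is a hypothetical placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory
    path = os.path.join(tmp_dir, "example.txt")

    with open(path, "w") as handle:
        handle.write("hello")
        closed_inside = handle.closed  # the handle is still open here

    closed_after = handle.closed  # closed as soon as the block is left

    # The handle is also closed when an exception interrupts the block.
    try:
        with open(path) as handle2:
            raise ValueError("something went wrong")
    except ValueError:
        closed_after_error = handle2.closed

print(closed_inside, closed_after, closed_after_error)
```

zipfile.ZipFile supports the same with construction, so the archive is closed in the same way once its block ends.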
In [9]:
import zipfile
print("Extracting files to: {}".format(RAW_DATA_DIR))
for file in files:
    # format the full zip filename in the EXTERNAL DATA DIR
    fn = EXTERNAL_DATA_DIR + '/' + file + '.zip'
    # and format the csv member name in that zip file
    member = file + '.csv'
    print("{0} extract {1}.".format(fn, member))
    # To make it easier to deal with files, use the with <> as <>: construction.
    # It will deal with opening and closing handlers for you.
    with zipfile.ZipFile(fn) as zfile:
        zfile.extract(member, path=RAW_DATA_DIR)
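Since opening and reading files can raise exceptions, the extraction step could be wrapped in a small helper that reports the problem instead of stopping the loop. The sketch below is one possible approach, not part of the original notebook: extract_member is a hypothetical helper, and the archive and member names are placeholders created in a temporary directory.

```python
import os
import tempfile
import zipfile


def extract_member(zip_path, member, target_dir):
    """Hypothetical helper: extract one member, return its path or None on failure."""
    try:
        with zipfile.ZipFile(zip_path) as zfile:
            return zfile.extract(member, path=target_dir)
    except FileNotFoundError:
        print("{0} does not exist.".format(zip_path))
    except PermissionError:
        print("No permission to read {0} or write to {1}.".format(zip_path, target_dir))
    except zipfile.BadZipFile:
        print("{0} is not a valid zip file.".format(zip_path))
    except KeyError:
        # ZipFile.extract raises KeyError when the member is not in the archive
        print("{0} has no member named {1}.".format(zip_path, member))
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small placeholder archive to extract from
    zip_path = os.path.join(tmp_dir, "example.zip")
    with zipfile.ZipFile(zip_path, "w") as zfile:
        zfile.writestr("example.csv", "a,b\n1,2\n")

    found = extract_member(zip_path, "example.csv", tmp_dir)
    found_exists = found is not None and os.path.exists(found)
    missing_member = extract_member(zip_path, "other.csv", tmp_dir)
    missing_archive = extract_member(os.path.join(tmp_dir, "nope.zip"), "example.csv", tmp_dir)
```

In the loop above, the body of the with block could be replaced by a call such as extract_member(fn, member, RAW_DATA_DIR), so that one bad archive does not prevent the remaining files from being extracted.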