This is a 'Run' version. For more details about what is computed and how it is implemented, see the development notebook.
USAGE: A good practice to ensure reproducibility is:
WARNING 1: Don't run all cells at once. Some cells may be executed several times and others only optionally. It is also better to run the notebook cell by cell so that inputs and outputs can be checked carefully.
WARNING 2: Don't shut down the kernel associated with this notebook (or 'close and halt', or restart the kernel) until the process has finished. The process runs in the background and will normally not be affected by a kernel shutdown, but it will no longer be possible to get information from the process.
NOTE: It is possible to take advantage of multiple CPUs and automatically split the process into several parallel jobs (using the IPython parallel system). To activate this, set the ncpu argument (passed as --nengines on the command line) to a value > 1.
Write here a more detailed description of the purpose of this run, the inputs used, etc.
Write here your comments about the process outcomes (what the outputs look like, etc.).
In [7]:
var notebook_name = document.getElementById('notebook_name').innerHTML;
var notebook_root_dir = document.body.getAttribute('data-project');
var notebook_rel_path = document.body.getAttribute('data-notebook-path');
var kernel = IPython.notebook.kernel;
kernel.execute("notebook_name = '" + notebook_name + "'");
kernel.execute("notebook_root_dir = '" + notebook_root_dir + "'");
kernel.execute("notebook_rel_path = '" + notebook_rel_path + "'");
In [9]:
import os

notebook_rel_path = os.path.dirname(notebook_rel_path)
notebook_rel_path = notebook_rel_path.replace('IPYNotebooks/', '')
# assumption: the last path component is `notebook_rel_path`
this_notebook_dir = os.path.join(os.path.expanduser('~'), 'IPYNotebooks',
                                 notebook_rel_path)
notebook_runs_dir = os.path.join(os.path.expanduser('~'), 'IPYRuns')
this_run_dir = os.path.join(notebook_runs_dir, notebook_rel_path, notebook_name)
if not os.path.exists(this_run_dir):
    # assumption: the run directory is created when it does not exist yet
    os.makedirs(this_run_dir)
print("run directory is: " + this_run_dir)
In [ ]:
%%writefile input.yaml
# This is an input file for the script ''
# Format of the file is YAML
# ------------------
# ------------------
# Input main directory
# Should contain GEOS-Chem output datafields
in_dir: ../../../IPYNotebooks/nb_geoschem/data/ts_example
# GEOS-Chem output file(s) (netCDF and/or bpch)
# May be either
# (1) the name of a single file present in `in_dir`
# (2) an absolute path to a single file
# (3) a file-matching pattern using the wildcard character.
# (4) a list of any combination of (1), (2) and (3)
# Mixing CTM outputs and ND49 outputs (time series) may work
# (though not tested yet), but datafields must not overlap in time.
# All datafields contained in the files must use the same horizontal
# grid (or a subset of this grid)!
in_files: 'ts.joch.200401*'
# Path to save output files where extracted data will be written
# If '~' is given, output files will be saved in the directory
# from where the script is run
out_dir: ~
# Basename of the output files for profiles
# Should not include the file extension
# Any wildcard "*" will be replaced by the `station_name` parameter
out_profiles_basename: '*_profiles_200401'
# Basename of output file for columns
out_columns_basename: '*_columns_200401'
# Format of output files
# One of the following: "csv", "hdf5", "xls", "xlsx"
# In addition, netCDF files will be created (iris cubes).
out_format: xlsx
# ------------------
# ------------------
# List of tracers/diagnostics for which profiles and columns
# will be extracted/computed
tracers: [PAN, CO, ACET, C3H8, CH2O, C2H6, NH3]
# List of diagnostic categories to load
# Should be "IJ-AVG-$" for tracers
categories: [IJ-AVG-$]
# Additional field names to load (format: 'diagnostic_category')
# Must at least include datafields required for columns calculation,
# ------------------
# ------------------
# Name of the station
station_name: JungfrauJoch
# Latitude of the station [degrees_north]
station_lat: 46.54806
# Longitude of the station [degrees_east]
station_lon: 7.98389
# Elevation a.s.l at the station [meters],
station_altitude: 3580.
# Path to the file (CF-netCDF) that contains the altitude values
# of the vertical grid on which data will be regridded.
station_vertical_grid_file: /home/bovy/Grids/
# ------------------
# ------------------
# Grid model name
# All GEOS-Chem outputs that will be loaded must use this grid.
# See :prop:`pygchem.grid.CTMGrid.models`
# for a list of available models
grid_model_name: GEOS57_47L
# Grid horizontal resolution (lon, lat) [degrees]
# All GEOS-Chem outputs must use this resolution
grid_model_resolution: [2.5, 2]
# Grid indices (min, max) of the 3D region box of interest
# i: longitude, j: latitude, l: vertical levels
# Must match the extent that was defined for any ND49
# diagnostic output specified in `in_files`.
# Must encompass the position of the station (see below).
# Used either to define the coordinates of ND49 outputs or to
# extract a subset from the global CTM datafields.
iminmax: [76, 77]
jminmax: [69, 70]
lminmax: [1, 47]
# ------------------
# ------------------
# Path to the file of global topography needed for resampling
# the tracer profiles on a vertical grid with fixed altitude values.
# The global topography grid must be compatible with the
# GEOS-Chem grid used by the output GEOS-Chem files.
global_topography_datafile: /home/bovy/Grids/
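The rules given above for `in_files` (single name, absolute path, wildcard pattern, or a list of these) and for the "*" substitution in the output basenames can be illustrated with a short sketch. The helpers `resolve_in_files` and `expand_basename` are hypothetical names introduced here for illustration only; they are not taken from the run script, whose actual implementation may differ.

```python
import glob
import os


def resolve_in_files(in_dir, in_files):
    """Return a sorted list of existing files matching the `in_files` setting.

    Hypothetical helper (not part of the run script). `in_files` may be a
    file name relative to `in_dir`, an absolute path, a wildcard pattern,
    or a list mixing any of these.
    """
    if isinstance(in_files, str):
        in_files = [in_files]
    paths = set()
    for entry in in_files:
        # os.path.join discards `in_dir` when `entry` is an absolute path
        pattern = os.path.join(in_dir, entry)
        # glob also returns plain file names (no wildcard) when they exist
        paths.update(glob.glob(pattern))
    return sorted(paths)


def expand_basename(basename, station_name, out_format):
    """Replace any "*" by the station name and append the file extension."""
    return "{}.{}".format(basename.replace("*", station_name), out_format)


print(expand_basename("*_profiles_200401", "JungfrauJoch", "xlsx"))
# prints: JungfrauJoch_profiles_200401.xlsx
```

Entries that match no existing file are silently dropped in this sketch; a real script would probably want to report them.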
In [ ]:
import sys

cmd = "{executable} {script} input.yaml --loglevel={loglevel} --nengines={ncpu}"
cmd = cmd.format(
    # path to executable (same python interpreter as the one used to run the notebook server)
    executable=sys.executable,
    # path to the script
    script=os.path.join(this_notebook_dir, 'run_scripts', ''),
    # number of CPU to use (1 is an assumed value; set > 1 for parallel jobs)
    ncpu=1,
    # loglevel ('CRITICAL', 'ERROR', 'WARNING', 'INFO' or 'DEBUG'); 'INFO' is an assumed value
    loglevel='INFO',
)
print("Command to execute: " + cmd)
In [ ]:
import subprocess
import os
import sys
import shlex

# prevent running a new process if a process is already running.
try:
    if process.poll() is None:
        raise RuntimeError('A process is already running')
except NameError:
    # no process has been started yet from this kernel
    pass

# split the command into a sequence
cmd = shlex.split(cmd)
# comment the line above and use the command string instead of sequence if shell is True
with open('process.log', 'w') as log:
    process = subprocess.Popen(cmd, shell=False, stdout=log, stderr=log)
print("New process started. PID: {}".format(process.pid))
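Because the `process` handle is lost when the kernel is restarted, one option is to note the PID reported when the process starts and to check later whether a process with that PID still exists. The following is a minimal sketch, assuming a POSIX system; `pid_is_alive` is a hypothetical helper, not part of this notebook or its run scripts. It only tells whether some process currently has this PID and cannot retrieve the exit code.

```python
import errno
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # signal 0 performs error checking only; nothing is sent to the process
        os.kill(pid, 0)
    except OSError as err:
        # EPERM: the process exists but belongs to another user
        return err.errno == errno.EPERM
    return True


# the current Python process is, by definition, alive
print(pid_is_alive(os.getpid()))
# prints: True
```

Do not use this sketch on Windows, where `os.kill` with a signal value of 0 does not behave as a harmless existence check.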
In [ ]:
import sys

try:
    if process.poll() is None:
        print("process is running")
        status = 'running'
    elif process.poll() == 0:
        print("process has terminated successfully")
        status = 'success'
    else:
        sys.stderr.write("process has terminated with errors\n")
        status = 'error'
except NameError:
    print("no process is running! "
          "(or connection with the process lost due "
          "to a kernel issue or kernel shutdown/restart)")
    status = None
In [ ]:
if status is not None:
    %cat process.log
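When the log grows large, printing the whole file can be unwieldy. A minimal sketch that returns only the last lines, using the standard library only; `tail` is a hypothetical helper and the default of 20 lines is an arbitrary choice.

```python
from collections import deque


def tail(path, n=20):
    """Return the last `n` lines of a text file as a list of strings."""
    with open(path) as f:
        # a bounded deque keeps only the most recent `n` lines in memory
        return list(deque(f, maxlen=n))


# usage, once 'process.log' exists in the run directory:
# print("".join(tail('process.log')))
```

Unlike `%cat`, this does not depend on a Unix shell command being available.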
If using the IPython parallel cluster, display outputs (stdout, stderr) of all engines as they are printed out (debug)
while the process is running, run the script
in the run_scripts
$ python
In [ ]:
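import signal

if status == 'running':
    # assumption: the original call is missing; SIGTERM asks the process to terminate
    process.send_signal(signal.SIGTERM)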
In [ ]:
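import os
import getpass
import sys

user = getpass.getuser()
ipy_profile = 'nb_{}'.format(user)
# assumption: `ipcluster` is installed next to the running Python interpreter
ipcluster_exe = os.path.join(os.path.dirname(sys.executable), 'ipcluster')
os.system('{} stop --profile={}'.format(ipcluster_exe, ipy_profile))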
In [ ]:
!ls -all -h
In [ ]:
import glob
import os
from IPython.display import display, FileLink
for fout in glob.glob('*'):
    if os.path.isdir(fout):
        continue  # assumption: directories are skipped
    fout_link = FileLink(fout)
    display(fout_link)