There are multiple ways to set up and run this example:
docker run -ti riga/law:example loremipsum
source setup.sh
This example demonstrates how to create and run a simple law task tree.
The actual payload of the tasks is rather trivial. Six different versions of the lorem ipsum dummy text are fetched from a website. Per version, the character frequencies are measured, and merged and visualized in the end.
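The counting and merging steps described above can be pictured with a few lines of plain Python. This is only a minimal sketch, not the code in tasks.py: the helper names count_chars and merge_counts are made up for illustration, and two short stand-in strings replace the fetched lorem ipsum versions.

```python
from collections import Counter


def count_chars(text):
    """Count how often each letter occurs, ignoring case and non-letters."""
    return Counter(c for c in text.lower() if c.isalpha())


def merge_counts(counts):
    """Sum several per-text counters into a single counter."""
    merged = Counter()
    for c in counts:
        merged.update(c)
    return merged


# stand-in texts (assumption), not the versions fetched by the example
texts = ["Lorem ipsum", "dolor sit amet"]
merged = merge_counts(count_chars(t) for t in texts)

# the most frequent characters across all texts
print(merged.most_common(3))
```

In the example itself, each of these steps is a separate task with its own output target, which is what makes the status and removal commands later in this notebook meaningful.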
You might want to check out the implementation of the tasks in tasks.py while executing the notebook.
Before you proceed, load the law ipython magics:
%law: runs the passed line in a subprocess
%ilaw: runs the passed line interactively in the current process (for tasks defined in notebooks)
Since we do not define any tasks in this notebook, we are fine with %law.
In [1]:
import law
law.contrib.load("ipython")
law.ipython.register_magics(init_cmd="source setup.sh", line_cmd="source setup.sh", log_level="INFO")
The setup commands passed to register_magics above are not specific to law; they only set up the dependencies (luigi and six) in the example directory of this notebook. This is equivalent to running source setup.sh when working in a terminal.
Note that indexing is only required for auto-completion in the command line and therefore not that important for this notebook. However, it is a convenient feature to show your available tasks and complete their parameters when working with a terminal.
In [2]:
%law index --verbose
Although indexing may sound cumbersome, the law index file is just a human-readable file summarizing your tasks, the corresponding python modules, and their parameters. Have a look at the index file if you're interested. Note that the output of the cell below might be hidden.
In [3]:
%law index --show
Now, we want to use the law run command for the first time. But to begin with, we add the parameter --print-status -1 to the command:
In [4]:
%law run ShowFrequencies --print-status -1
You should see that all output targets are absent and no task is complete yet.
Although law run was called, no task was actually executed. A few parameters make law only print helpful information in the command line, such as the status of a certain task (ShowFrequencies above) and its recursive dependencies. The value given to --print-status defines the recursion level, where 0 is the task given to law run itself.
In [5]:
%law run ShowFrequencies --print-status 0
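The recursion level can be pictured with a toy tree walk. The sketch below is not law's implementation: Node and collect_status are hypothetical names, the tree has a single branch instead of one per lorem ipsum version, the name of the depth-1 task is assumed, and treating a negative value as "no limit" is an assumption based on how -1 is used in this notebook.

```python
class Node:
    """Hypothetical stand-in for a task and the tasks it requires."""

    def __init__(self, name, requires=()):
        self.name = name
        self.requires = list(requires)


def collect_status(task, max_depth, depth=0):
    """Return (depth, name) pairs down to max_depth.

    A negative max_depth is treated as "no limit" (assumption).
    """
    rows = [(depth, task.name)]
    if max_depth < 0 or depth < max_depth:
        for dep in task.requires:
            rows.extend(collect_status(dep, max_depth, depth + 1))
    return rows


# single-branch version of the tree; "MergeCounts" is an assumed name
fetch = Node("FetchLoremIpsum")
count = Node("CountChars", [fetch])
merge = Node("MergeCounts", [count])
show = Node("ShowFrequencies", [merge])

for level, name in collect_status(show, -1):
    print("  " * level + name)
```

With a depth of 0 only ShowFrequencies is listed, while a depth of 2 stops at CountChars.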
Other so-called interactive parameters are --print-deps, --print-output, --fetch-output and --remove-output. Use the help to find out more about these parameters. Note that the output of the cell below might be hidden.
In [6]:
%law run ShowFrequencies --help
Now we run the task and all its dependencies with a single command.
In [7]:
%law run ShowFrequencies
The task execution should be successful within a few seconds. You can scroll through the output and read the logs to get a sense of the way luigi is building up the dependency tree, followed by the scheduling of tasks, and eventually closing with an execution summary.
Also, you might want to add the --slow parameter to make the tasks somewhat slower in order to see the progress logs appearing in the output. This is of course not a feature of law, but only implemented by the tasks in this example 😉.
As above, we add --print-status -1 again to see the task status, represented by the existence of their output targets.
In [8]:
%law run ShowFrequencies --print-status -1
Note that the ShowFrequencies task itself has no outputs. It therefore runs every time it is invoked, independent of the presence of any persistent file. The other tasks do have outputs, which we are going to delete in the next step.
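The reason a task without outputs runs on every invocation can be sketched as follows. The function is_complete is a simplified stand-in written for this notebook, modelled on luigi's default completeness check, which treats a task with no outputs as never complete; the file name is hypothetical.

```python
import os
import tempfile


def is_complete(output_paths):
    """A task counts as complete only if it has outputs and all of them exist."""
    if not output_paths:
        # nothing could ever mark the task as done, so it always runs
        return False
    return all(os.path.exists(path) for path in output_paths)


with tempfile.TemporaryDirectory() as tmp:
    counts_file = os.path.join(tmp, "chars_1.json")  # hypothetical output name

    print(is_complete([]))             # task without outputs
    print(is_complete([counts_file]))  # output not written yet
    open(counts_file, "w").close()
    print(is_complete([counts_file]))  # output present
```

The three calls print False, False and True. Only the last case lets the scheduler skip a task, which is why removing outputs in the next step makes the affected tasks run again.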
As mentioned above, another interactive parameter to pass to law run commands is --remove-output. The passed value is interpreted as the recursion depth of dependent tasks whose output should be removed as well.
However, in order to avoid removing files by mistake, law interactively asks for confirmation before irreversibly removing anything. The prompt looks like this:
> law run ShowFrequencies --remove-output N
remove task output with max_depth N
removal mode? [i*(interactive), d(dry), a(all)]
The default mode (marked with *) is interactive (type 'i'), which means that law traverses the task tree interactively and asks for confirmation on every target. The dry mode (type 'd') traverses the tree without actually removing anything. The all mode should be handled with care: once you type 'a', the outputs of all tasks down to the requested recursion depth are removed.
To avoid interactive prompts in this example notebook, you can either pipe the answer into the command (though this is not recommended)
> echo a | law run ShowFrequencies --remove-output N
or add the mode with a comma to the value of the --remove-output parameter. Here, we only want to remove the outputs down to the CountChars task, i.e., at a depth of 2 (see the task tree above in the --print-status outputs). This way, the FetchLoremIpsum outputs are preserved.
In [9]:
%law run ShowFrequencies --remove-output 2,a
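To make the shape of the value 2,a explicit, here is a small sketch that splits such a string into a depth and a mode. parse_remove_output is a hypothetical helper written for illustration and says nothing about how law parses the parameter internally; the accepted modes are taken from the prompt shown above.

```python
def parse_remove_output(value):
    """Split 'DEPTH[,MODE]' into (depth, mode); mode is None when omitted."""
    depth, _, mode = value.partition(",")
    mode = mode.strip() or None
    # modes as listed in the prompt: interactive, dry, all
    if mode is not None and mode not in ("i", "d", "a"):
        raise ValueError("unknown removal mode: {}".format(mode))
    return int(depth), mode


print(parse_remove_output("2,a"))
print(parse_remove_output("0"))
```

When no mode is given, the sketch returns None for it, which corresponds to the case where law falls back to asking interactively.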
Verify your action by printing the status of, let's say, the first CountChars task.
In [10]:
%law run CountChars --file-index 1 --print-status -1