Tips on Writing Landlab Components

For more Landlab tutorials, click here: https://landlab.readthedocs.io/en/latest/user_guide/tutorials.html

Thanks for your interest in developing a component in Landlab!

This ipython notebook provides some tips on designing and building Landlab components. It assumes you are familiar with the basics of building a component. If you haven't already, take a look at the tutorial on How to Write a Landlab Component, the User Guide section on Developing Your Own Component, and the Example Pull Request Creating a Component. We also recommend that you familiarize yourself with the Style and Lint conventions.

Why Does This Guide Exist?

The Landlab development team has been working together for nearly a decade to create the Landlab toolkit. Over this time, many team members have converged on a set of best practices for creating and using components. Some of these practices are required of any code contributed to Landlab; others are only recommended.

We want to make sure we communicate these recommendations to the community of potential Landlab component developers.

Components versus Models

Landlab components are designed to serve as the building blocks for numerical models, but a Landlab component is not a model by itself. Rather, a component-based Landlab model typically consists of a driver program that instantiates and runs one or more components. For this reason, a Landlab component will not normally create or contain another component (though there are occasional exceptions that prove the rule; for example, a FlowAccumulator can create a FlowDirector). Instead, if one component relies on output from another component, the recommended practice is to have the relevant output field from one component act as an input field for another. For example, a component that generates a spatial rainfall pattern would have the rainfall as an output field; a runoff-generation component might then require a rainfall field as an input.

A common question we get that relates to the component versus model concept is "how do I set up my component to ingest the entire rainfall record for my model run?" The recommended practice is to build your component so that it reads fields on the model grid and runs forward in time for a duration dt. If, say, you had an external dataset that gave you gridded rainfall through time as a (nr, nc, nt) numpy array, we would recommend that you build your component to read a precipitation field each time it runs one step, and that your driver use the gridded precipitation to update that field at each time step.

We advocate for this design because it is agnostic to exactly how a component's input fields (in this example, precipitation) are set. This makes components more reusable.
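As a minimal sketch of this pattern, the example below uses hypothetical ToyGrid and ToyRunoff classes as stand-ins for a Landlab grid and component (neither is part of Landlab), and plain lists in place of a numpy array. The field names rainfall__flux and water__depth are also assumptions made for illustration. The point is only that the driver, not the component, walks through the rainfall record.

```python
class ToyGrid:
    """Hypothetical stand-in for a Landlab grid: a dict of node fields."""

    def __init__(self, number_of_nodes):
        self.number_of_nodes = number_of_nodes
        self.at_node = {}


class ToyRunoff:
    """Hypothetical component that accumulates water depth from a rainfall field."""

    def __init__(self, grid):
        self._grid = grid
        # Create the output field if the driver has not already done so.
        grid.at_node.setdefault("water__depth", [0.0] * grid.number_of_nodes)

    def run_one_step(self, dt):
        # The component only reads the field; it does not know where the values came from.
        rain = self._grid.at_node["rainfall__flux"]
        depth = self._grid.at_node["water__depth"]
        for node, rate in enumerate(rain):
            depth[node] += rate * dt


# External record: one list of per-node rainfall rates for each time step.
rain_record = [
    [1.0, 2.0, 0.0, 0.5],
    [0.0, 1.0, 1.0, 0.5],
    [2.0, 0.0, 0.0, 0.5],
]

grid = ToyGrid(number_of_nodes=4)
grid.at_node["rainfall__flux"] = [0.0] * 4
runoff = ToyRunoff(grid)

dt = 0.5
for rates in rain_record:
    # The driver copies the current slice of the record into the field in place.
    grid.at_node["rainfall__flux"][:] = rates
    runoff.run_one_step(dt)

total_depth = grid.at_node["water__depth"]
```

Because the component never sees the record itself, the same component could be driven by a constant rate, a file, or the output field of another component without any change to its code.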

Required and Recommended Elements for Components

A Landlab component must:

Derive from the Component base class
Provide the standard header information (such as input_var_names)
Provide an __init__ method. A Landlab grid object should be the first argument (after self). Any other necessary parameters should be given as keyword arguments with meaningful defaults.
Provide a run_one_step method.
Include external and internal documentation. If the component is included in the Landlab package (as opposed to being a separate add-on), it needs to have an entry in the API Reference Manual at landlab.readthedocs.org. (In most cases, the text on the website is autogenerated from the docstrings in the component itself, but the documentation files must still be updated by hand to refer to the new component, as described below.) The component code should include a header docstring that briefly describes what it is, and lists its input parameters.

In addition to these basic ingredients, we highly recommend that Landlab components also include the following:

A docstring for each function. Docstrings should ideally include (1) a one-line description of what the function does, starting with a verb (e.g., "Update fields with current loading conditions"); (2) a list of parameters with their data types, units (if applicable), and a brief description; (3) data returned (if any); (4) one or more doctests; and (5) any relevant notes that will help your future self (or others) understand how the function works.
Unit tests. These can be implemented either as built-in doctests, or as external tests, or both.
Tutorial(s). A Jupyter Notebook is a great way to illustrate how a component can be used. Notebooks that contain tutorials about various Landlab components are in the notebooks folder. Ideally, every component should have an accompanying tutorial (though as of this writing we have not yet reached that goal).
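The five docstring ingredients listed above might look like the following sketch. The function calc_gravitational_force is hypothetical and exists only for illustration, and the section layout follows the NumPy docstring style, which is an assumption about formatting rather than a requirement stated here.

```python
def calc_gravitational_force(mass, grav_accel=9.81):
    """Calculate the gravitational force acting on a mass.

    Parameters
    ----------
    mass : float
        Mass of the object (kg).
    grav_accel : float, optional
        Gravitational acceleration (m/s^2).

    Returns
    -------
    float
        Gravitational force (N).

    Examples
    --------
    >>> calc_gravitational_force(2.0, grav_accel=10.0)
    20.0

    Notes
    -----
    The force is simply mass times acceleration; no unit conversion is
    done, so the inputs must already be in consistent units.
    """
    return mass * grav_accel
```

The lines under Examples serve as both a usage example and a doctest, so they are checked whenever doctests are run.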

As part of an ongoing effort to standardize the Landlab component library, we will soon require the following additional characteristics. Not all Landlab components currently meet these requirements, but starting with an upcoming v2.0 release all components will be required to meet these standards.

The component may use **kwds at the end of the __init__ signature, but must raise an error if any unused keyword arguments are passed.
The run_one_step method must either have no arguments other than self, or have a single additional argument that represents the duration of the step.
Class variables (i.e., those defined as self.[var_name] in the code) that are not meant to be seen or modified by users should ideally be flagged as private by adding a leading underscore to the variable name (this will be most of them!). Those without a leading underscore are considered public, and should be handled using the @property decorator as described below.
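One way to satisfy the first two requirements is sketched below. ToyDiffuser is a hypothetical class that, to stay self-contained, does not derive from the real Component base class. Raising TypeError is also an assumption, chosen because it mirrors what Python itself does for an unexpected keyword argument.

```python
class ToyDiffuser:
    """Hypothetical component skeleton (not derived from landlab.Component)."""

    def __init__(self, grid, diffusivity=0.01, **kwds):
        # Reject anything that was not consumed, so that a misspelled
        # parameter name fails loudly instead of being silently ignored.
        if kwds:
            unexpected = ", ".join(sorted(kwds))
            raise TypeError(f"unexpected keyword argument(s): {unexpected}")
        self._grid = grid
        self._diffusivity = diffusivity
        self._elapsed_time = 0.0

    @property
    def elapsed_time(self):
        """Total model time run so far."""
        return self._elapsed_time

    def run_one_step(self, dt):
        """Advance the component by a single step of duration dt."""
        self._elapsed_time += dt


# A typo in the keyword ("difusivity") is caught at instantiation.
try:
    ToyDiffuser(None, difusivity=0.5)
    typo_was_caught = False
except TypeError:
    typo_was_caught = True
```

Without the check, the misspelled keyword would be swallowed by **kwds and the component would quietly run with the default diffusivity.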

Unit Testing

Testing is essential to writing robust scientific software (see, e.g., The Turing Way guide to testing research code). A typical Landlab component includes two types of test: doctests and external tests.

Doctests, in addition to testing a particular piece of code, should also give users an idea of how the code works: in other words, they should function both as tests and as examples of how to use the functionality in question. Docstrings, including doctests, are scraped automatically to create content in the Landlab API Reference Manual.

External testing scripts normally live in a subdirectory called tests inside your component's main folder, in a file called test_(something).py, with one or more functions whose names begin with test_. Using this naming convention is how the testing tool we use (pytest) is able to find and run your tests.

Useful tools for writing tests

The numpy.testing module provides handy functions for testing, such as asserting that the values in a particular array match the values that you expect. pytest provides the ability to test whether a particular error is raised with the pytest.raises function. Nonetheless, in many cases the core Python libraries will suffice for testing: a test fails if any assertion fails and/or an error or exception is raised, and you can both assert logical conditions in tests and raise various standard Python or Landlab-specific exceptions within components "by hand".

If you use a common block of code (e.g., setting up a grid) in multiple tests, we recommend looking into using a pytest.fixture to define it.

If you want to write the same test but loop through multiple parameter values (e.g., use both D8 and MFD flow directing), check out the pytest.mark.parametrize decorator.
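The short sketch below combines the three pytest tools mentioned above: a fixture for shared setup, pytest.mark.parametrize to repeat a test over several values, and pytest.raises to check that an error is raised. The helper check_porosity and the fixture elevations are hypothetical names invented for illustration; in a real component the fixture would more likely build a small Landlab grid.

```python
import pytest


def check_porosity(porosity):
    """Hypothetical input check of the kind a component might perform."""
    if not 0.0 < porosity < 1.0:
        raise ValueError("porosity must be between 0 and 1")
    return porosity


@pytest.fixture
def elevations():
    """Shared setup: elevations on a tiny 3x3 grid, small enough to check by hand."""
    return [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]


def test_only_core_node_is_elevated(elevations):
    # pytest passes in the return value of the fixture with the same name.
    assert sum(elevations) == 1.0
    assert elevations[4] == 1.0


@pytest.mark.parametrize("bad_value", [-0.1, 0.0, 1.0, 2.5])
def test_bad_porosity_raises(bad_value):
    # The test runs once for each value in the list.
    with pytest.raises(ValueError):
        check_porosity(bad_value)
```

Saved in a file whose name begins with test_, these functions would be collected and run by pytest automatically.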

How we use the tests as part of maintaining Landlab

If your component is part of the Landlab codebase (and not an external plugin), its tests will be run automatically whenever a pull request is made and we will enforce that they all pass before bringing the code into the main repository.

But as part of your development, you might also want to run the tests locally. As of this writing, the Landlab Development Team uses the pytest utility, together with the coverage extension that calculates which lines of code are and are not covered by at least one test. An example of the command-line syntax to run tests is:

pytest --doctest-modules --cov-report term-missing --cov=.

This example usage tells pytest to run doctests as well as external tests, to examine the coverage of any files in the current directory or its subdirectories, and to display a report listing the coverage in each file.

What parts of my component should I write tests for?

Best practice is to have your component 100% tested, meaning that every line gets run at least once. Not all of Landlab meets this standard (as of writing, total coverage is 87%); the guidelines below are what we find to be most practical. We recommend writing tests with small grids for which you can do all calculations by hand (typically 3x10, 5x5, or smaller).

Any part of your code that has an analytical solution should get a test demonstrating that this solution is met.
Barring an analytical solution, you should make a small grid and create a test that asserts that a known correct answer is met (e.g., run just one step forward, and calculate exactly what the end-of-run-one-step values are).
If there are places where you ensure something about an input (e.g., that porosity is positive), you should write a test asserting that an error is raised when the value is bad.
Your tests don't need to be of the entire run-one-step process, but might just test critical parts of it.
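As an illustration of a test against a hand-calculated answer, the sketch below runs one step of a hypothetical explicit diffusion function (diffuse_one_step, invented for this example and not part of Landlab) on five nodes. With a diffusivity of 0.1 and unit spacing and time step, the central value of 1.0 should lose 0.2 and each neighbor should gain 0.1.

```python
import math


def diffuse_one_step(z, diffusivity, dx, dt):
    """Hypothetical explicit diffusion step on a 1D row of nodes.

    The two end nodes are treated as fixed-value boundaries.
    """
    factor = diffusivity * dt / dx**2
    new_z = list(z)
    for i in range(1, len(z) - 1):
        new_z[i] = z[i] + factor * (z[i - 1] - 2.0 * z[i] + z[i + 1])
    return new_z


def test_one_step_matches_hand_calculation():
    z = [0.0, 0.0, 1.0, 0.0, 0.0]
    result = diffuse_one_step(z, diffusivity=0.1, dx=1.0, dt=1.0)
    # Worked out by hand for a single step.
    expected = [0.0, 0.1, 0.8, 0.1, 0.0]
    assert all(
        math.isclose(r, e, abs_tol=1e-12) for r, e in zip(result, expected)
    )
```

The comparison uses a small tolerance rather than exact equality, since floating-point arithmetic may not reproduce hand-calculated decimals exactly.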

Documentation

Each Landlab component ideally has three kinds of documentation: internal documentation using docstrings within the code, "external" documentation in the Landlab API Reference Manual, and one or more tutorials in the notebooks folder. Internal and external documentation are essentially one and the same: the internal docstrings are read and formatted ("scraped") to produce the API Reference documentation for each component. To get a component's docstrings included in the API Reference, you simply need to create a short text file in Landlab's docs folder, and edit the index.rst file in the same folder to add a reference to your new file. The process is described in the User Guide section on Developing Your Own Component.

Coding Tips and Tricks

Avoid Hard-Coding Numbers

It can be tempting to include hard numbers in your code, such as:

grav_force = 9.81 * mass

There are several disadvantages to this approach. An obvious one is that a user who wants to change the hard number is forced to edit the code (what if you want to run your model for Mars?). Another disadvantage is that the hard numbers are buried deep in the code, and might not be documented. Yet a third potential problem is that the hard-wired numbers may have units, and therefore rely on the assumption that any other inputs and variables have compatible units. And finally, your hardwired numbers may lack sufficient precision for a user's application (e.g., is gravitational acceleration 10.0 or 9.8 or 9.81?).

A better option is to have these numbers be user-determined inputs with built-in default values. For example,

class MyCoolComponent(Component):
    ...
    def __init__(self, grid, grav_accel=9.8):
        ...

If for some reason this is not a practical option (for example, if the numerical value in question is truly a constant), then a good practice is to define a named constant in ALLCAPS near the head of the file in question. For example,

GRAV_CONSTANT = 6.67408e-11  # near head of file

...

grav_force = GRAV_CONSTANT * (mass1 * mass2 / distance**2)  # somewhere deep in the code

Avoid Multiple `if` Sequences

Sometimes it is the nature of an algorithm to ask a lot of questions:

if this:
    ...
elif that:
    ...
else:
    ...

Although this kind of construction can be hard to avoid in some cases, it also increases the testing burden: each case needs to be tested in order to ensure 100% coverage. Such chains can also hamper performance if the conditions need to be evaluated many times. If you find yourself writing long chains of conditional statements, consider whether there are any cleaner alternatives.

For example, say that params is a dictionary and you need a boolean variable called spam_in_params that indicates whether the key spam is present. You could write:

if "spam" in params:
    spam_in_params = True
else:
    spam_in_params = False

You could accomplish the same thing with a cleaner alternative:

spam_in_params = "spam" in params

Very complex sets of if/else logic may be best dealt with using a dictionary as a lookup table.
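A minimal sketch of the lookup-table idea follows. The function names and the SLOPE_METHODS table are hypothetical, chosen only to show how a dictionary can replace a chain of if/elif branches that select among alternative calculations.

```python
def steepest_slope(slopes):
    """Return the largest slope in the list."""
    return max(slopes)


def mean_slope(slopes):
    """Return the arithmetic mean of the slopes."""
    return sum(slopes) / len(slopes)


# The lookup table: each option name maps to the function that handles it.
SLOPE_METHODS = {
    "steepest": steepest_slope,
    "mean": mean_slope,
}


def summarize_slopes(slopes, method="steepest"):
    """Summarize slopes using the named method, without any if/elif chain."""
    try:
        func = SLOPE_METHODS[method]
    except KeyError:
        raise ValueError(
            f"unknown method {method!r}; choose from {sorted(SLOPE_METHODS)}"
        ) from None
    return func(slopes)
```

Adding a new option then means adding one entry to the table rather than another branch, and each function can be tested on its own.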

Field Names

Landlab field names should be reasonably descriptive, while not being overly long. For example, hydraulic_conductivity is a better field name than simply K. As of this writing, Landlab has not yet adopted a standard naming convention (such as the CSDMS Standard Names, or the CF Standard Names), but best practice is to follow the de facto Landlab standards for names that already exist in at least one component. Landlab has tended thus far to use names that are in the spirit of (and sometimes identical to) the CSDMS Standard Names, while keeping these names to a manageable length.

An (infrequently updated) list of the standard names currently used in Landlab can be found here.

Public and Private Variables

The following describes an emerging coding standard for component class variables to be adopted starting with Landlab 2.0:

Use the Python convention that class variables are "private" if their name begins with an underscore, and "public" if it does not. For internal variables that are not intended to be viewed or modified by users, make the variable private by prepending an underscore. Example:

self._diffusivity = 0.01

For internal variables that you wish users to be able to read (only), make the internal variable private but provide a "getter" function that is tagged with the @property decorator. Example (from the Flexure component):

@property
def youngs(self):
    """Young's modulus of lithosphere (Pa)."""
    return self._youngs

Including the "getter" function allows you to include documentation for the public variable in the form of a docstring.

If you want users to have both read and write access to a variable---for example, if you want a user to be able to dynamically change one of your parameters if needed---provide both a "getter" and a "setter". An example of a getter-setter pair is (also from Flexure):

@property
def eet(self):
    """Effective elastic thickness (m)."""
    return self._eet

@eet.setter
def eet(self, new_val):
    if new_val <= 0:
        raise ValueError("Effective elastic thickness must be positive.")
    self._eet = new_val
    self._r = self._create_kei_func_grid(
        self._grid.shape, (self.grid.dy, self.grid.dx), self.alpha
    )

Using a setter function allows you to make sure that the user isn't giving an inappropriate value of the parameter (as in the example of a negative elastic thickness above, which would not make any sense). In general, think long and hard about giving users the ability to set variables. An update of a variable at the "wrong" time can easily lead to unforeseen consequences if some parts of the component have assumed the old value previously. Careful testing is probably in order in these cases.
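To show how these pieces behave from the user's side, here is a self-contained sketch. ToyPlate is a hypothetical class loosely modeled on the snippets above, not the actual Flexure component, and its default values are arbitrary illustration values.

```python
class ToyPlate:
    """Hypothetical class with one read-only and one read-write property."""

    def __init__(self, youngs=7.0e10, eet=65000.0):
        self._youngs = youngs
        self.eet = eet  # assigned through the setter, so the check applies here too

    @property
    def youngs(self):
        """Young's modulus of lithosphere (Pa)."""
        return self._youngs

    @property
    def eet(self):
        """Effective elastic thickness (m)."""
        return self._eet

    @eet.setter
    def eet(self, new_val):
        if new_val <= 0:
            raise ValueError("Effective elastic thickness must be positive.")
        self._eet = new_val


plate = ToyPlate()
plate.eet = 30000.0  # allowed: eet has a setter

# youngs has a getter only, so assigning to it raises AttributeError.
try:
    plate.youngs = 1.0
    youngs_is_read_only = False
except AttributeError:
    youngs_is_read_only = True
```

Routing the assignment in __init__ through the setter is a design choice: it keeps the validation in one place, so a bad value is rejected whether it arrives at construction or later.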

Other references

Many of the Landlab developers have found this resource on Python anti-patterns helpful.