syncID: 3857005e98a544a88a5e58625e32b514
title: "Introduction to working with NEON eddy flux data"
description: "Download and navigate NEON eddy flux data, including basic transformations and merges"
dateCreated: 2019-07-09
authors: Claire K. Lunch
contributors: 
estimatedTime: 1 hour
packagesLibraries: rhdf5, neonUtilities, ggplot2
topics: HDF5, eddy-covariance, eddy-flux
languagesTool: R
dataProduct: DP4.00200.001
code1: /R/eddy-intro/eddy_intro.r
tutorialSeries: 
This data tutorial provides an introduction to working with NEON eddy flux data, using the neonUtilities R package. If you are new to NEON data, we recommend starting with a more general tutorial, such as the neonUtilities tutorial or the Download and Explore tutorial. Some of the functions and techniques described in those tutorials will be used here, as well as functions and data formats that are unique to the eddy flux system.
This tutorial assumes general familiarity with eddy flux data and associated concepts.
In [ ]:
# rhdf5 is distributed through Bioconductor rather than CRAN
install.packages('BiocManager')
BiocManager::install('rhdf5')
install.packages('neonUtilities')
In [1]:
# prevent automatic conversion of characters to factors
options(stringsAsFactors=F)
library(neonUtilities)
Use the zipsByProduct() function from the neonUtilities package to download flux data from two sites and two months. The transformations and functions below will work on any time range and site(s), but two sites and two months allow us to see all the available functionality while minimizing download size.
Inputs to the zipsByProduct() function:

* dpID: DP4.00200.001, the bundled eddy covariance product
* package: basic (the expanded package is not covered in this tutorial)
* site: NIWO = Niwot Ridge and HARV = Harvard Forest
* startdate: 2018-06 (both dates are inclusive)
* enddate: 2018-07 (both dates are inclusive)
* savepath: modify this to something logical on your machine
* check.size: T if you want to see file size before downloading, otherwise F

The download may take a while, especially if you're on a slow network.
In [2]:
zipsByProduct(dpID="DP4.00200.001", package="basic",
site=c("NIWO", "HARV"),
startdate="2018-06", enddate="2018-07",
savepath="/data",
check.size=F)
There are five levels of data contained in the eddy flux bundle. For full details, refer to the NEON algorithm document.
Briefly, the data levels are:

* Level 0' (dp0p): Calibrated raw data
* Level 1 (dp01): Time-aggregated data, e.g. 30-minute mean gas concentrations
* Level 2 (dp02): Time-interpolated data
* Level 3 (dp03): Spatially interpolated data, i.e. vertical profiles
* Level 4 (dp04): Fluxes

The dp0p data are available in the expanded data package and are beyond the scope of this tutorial.
The dp02 and dp03 data are used in storage calculations, and the dp04 data include both the storage and turbulent components. Since many users will want to focus on the net flux data, we'll start there.
To extract the Level 4 data from the HDF5 files and merge them into a single table, we'll use the stackEddy() function from the neonUtilities package.
stackEddy() requires two inputs:

* filepath: Path to a file or folder, which can be any one of: a zip file of eddy flux data downloaded from the NEON data portal, a folder of data downloaded by the zipsByProduct() function, or a single NEON eddy flux HDF5 file
* level: dp01 to dp04

Input the filepath you downloaded to using zipsByProduct() earlier, including the filesToStack00200 folder created by the function, and level dp04:
In [3]:
flux <- stackEddy(filepath="/data/filesToStack00200/",
level="dp04")
We now have an object called flux. It's a named list containing four tables: one data table for each site, plus the variables and objDesc tables.
In [4]:
names(flux)
Let's look at the contents of one of the site data files:
In [5]:
head(flux$NIWO)
The variables and objDesc tables can help you interpret the column headers in the data table. The objDesc table contains definitions for many of the terms used in the eddy flux data product, but it isn't complete. To get the terms of interest, we'll break up the column headers into individual terms and look for them in the objDesc table:
In [6]:
term <- unlist(strsplit(names(flux$NIWO), split=".", fixed=T))
flux$objDesc[which(flux$objDesc$Object %in% term),]
For the terms that aren't captured here, fluxCo2, fluxH2o, and fluxTemp are self-explanatory. The flux components are:

* turb: Turbulent flux
* stor: Storage
* nsae: Net surface-atmosphere exchange

The variables table contains the units for each field:
In [7]:
flux$variables
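As a quick sanity check on these definitions, the net surface-atmosphere exchange should be the sum of the turbulent and storage fluxes. A minimal sketch, assuming the column names follow the data.fluxCo2.turb.flux pattern seen in the table above:
In [ ]:
# nsae should equal turb + stor; the differences should be at or near zero
head(flux$NIWO$data.fluxCo2.turb.flux +
     flux$NIWO$data.fluxCo2.stor.flux -
     flux$NIWO$data.fluxCo2.nsae.flux)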
Let's plot some data! First, we'll need to convert the time stamps to an R date-time format (right now they're just character fields).
NEON sensor data come with time stamps for both the start and end of the averaging period. Depending on the analysis you're doing, you may want to use one or the other; for general plotting, re-formatting, and transformations, I prefer to use the start time, because there are some small inconsistencies between data products in a few of the end time stamps.
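For example, here is a minimal sketch converting both time stamps and inspecting the length of the averaging period, assuming the timeBgn and timeEnd fields shown in the dp04 table above:
In [ ]:
# convert both time stamps and check the averaging interval
tB <- as.POSIXct(flux$NIWO$timeBgn, format="%Y-%m-%dT%H:%M:%S", tz="GMT")
tE <- as.POSIXct(flux$NIWO$timeEnd, format="%Y-%m-%dT%H:%M:%S", tz="GMT")
head(difftime(tE, tB))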
Note that all NEON data use UTC time, noted as tz="GMT" in the code below. This is true across NEON's instrumented, observational, and airborne measurements. When working with NEON data, it's best to keep everything in UTC as much as possible; otherwise it's very easy to end up with data in mismatched times, which can cause insidious and hard-to-detect problems. Be sure to include the tz argument in all the lines of code below: if no time zone is specified, R will default to the local time zone it detects on your operating system.
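To see the pitfall in action, parse the same character time stamp with and without an explicit time zone; on most systems the two results will differ by your machine's UTC offset:
In [ ]:
# identical time stamp, parsed with and without an explicit tz
as.POSIXct("2018-07-01T00:00:00", format="%Y-%m-%dT%H:%M:%S", tz="GMT")
as.POSIXct("2018-07-01T00:00:00", format="%Y-%m-%dT%H:%M:%S")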
In [8]:
timeB <- as.POSIXct(flux$NIWO$timeBgn,
format="%Y-%m-%dT%H:%M:%S",
tz="GMT")
flux$NIWO <- cbind(timeB, flux$NIWO)
In [9]:
plot(flux$NIWO$data.fluxCo2.nsae.flux~timeB,
pch=".", xlab="Date", ylab="CO2 flux",
xaxt="n")
axis.POSIXct(1, x=timeB, format="%Y-%m-%d")
Like a lot of flux data, these data have some stray spikes, but there is a clear diurnal pattern going into the growing season.
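One common first step is to screen out low-quality records before interpreting the data. A minimal sketch, assuming the final quality flags follow the qfqm.fluxCo2.nsae.qfFinl naming pattern visible in the dp04 table (1 = flag raised):
In [ ]:
# set flagged records to NA before plotting
co2 <- flux$NIWO$data.fluxCo2.nsae.flux
co2[which(flux$NIWO$qfqm.fluxCo2.nsae.qfFinl==1)] <- NA
plot(co2~timeB, pch=".", xlab="Date", ylab="CO2 flux", xaxt="n")
axis.POSIXct(1, x=timeB, format="%Y-%m-%d")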
Let's trim down to just two days of data to see a few other details.
In [10]:
plot(flux$NIWO$data.fluxCo2.nsae.flux~timeB,
pch=20, xlab="Date", ylab="CO2 flux",
xlim=c(as.POSIXct("2018-07-07", tz="GMT"),
as.POSIXct("2018-07-09", tz="GMT")),
ylim=c(-20,20), xaxt="n")
axis.POSIXct(1, x=timeB, format="%Y-%m-%d %H:%M:%S")
Note the timing of C uptake: the effect of the UTC time stamps is clear here, with uptake appearing to occur at night; in local time, those same intervals are daytime.
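If you need to see when an event occurred in local time, convert for display only, keeping the underlying data in UTC. For example, Niwot Ridge is in the US Mountain time zone:
In [ ]:
# render a UTC time stamp in local (Mountain) time for display
format(as.POSIXct("2018-07-08 06:00:00", tz="GMT"),
       tz="America/Denver", usetz=TRUE)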
Many of the data sets we would use to interpret and model flux data are measured as part of the NEON project, but are not present in the eddy flux data product bundle. In this section, we'll download PAR data and merge them with the flux data; the steps taken here can be applied to any of the NEON instrumented (IS) data products.
To get NEON PAR data, use the loadByProduct() function from the neonUtilities package. loadByProduct() takes the same inputs as zipsByProduct(), but it loads the downloaded data directly into the current R environment.
Let's download PAR data matching the Niwot Ridge flux data. The inputs needed are:

* dpID: DP1.00024.001
* site: NIWO
* startdate: 2018-06
* enddate: 2018-07
* package: basic
* avg: 30

The new input here is avg=30, which downloads only the 30-minute data. Since the flux data are at a 30-minute resolution, we can save on download time by disregarding the 1-minute data files (which are of course 30 times larger). The avg input can be left off if you want to download all available averaging intervals.
In [11]:
pr <- loadByProduct("DP1.00024.001", site="NIWO", avg=30,
startdate="2018-06", enddate="2018-07",
package="basic", check.size=F)
pr is another named list, and again, metadata and units can be found in the variables table. The PARPAR_30min table contains a verticalPosition field. This field indicates the position on the tower, with 10 being the first tower level, and 20, 30, etc. going up the tower.
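To see which tower levels are present at this site (a quick check; the specific values depend on the tower):
In [ ]:
# list the distinct tower levels in the PAR data
sort(unique(pr$PARPAR_30min$verticalPosition))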
We'll connect PAR data from the tower top to the flux data.
In [12]:
# subset to the tower-top measurement level
pr.top <- pr$PARPAR_30min[which(pr$PARPAR_30min$verticalPosition==
                                max(pr$PARPAR_30min$verticalPosition)),]
loadByProduct() automatically converts time stamps when it reads the data, so here we just need to indicate which time field to use to merge the flux and PAR data.
In [13]:
timeB <- pr.top$startDateTime
pr.top <- cbind(timeB, pr.top)
And merge the two datasets:
In [14]:
fx.pr <- merge(pr.top, flux$NIWO, by="timeB")
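By default, merge() keeps only the rows where timeB matches in both tables; to keep unmatched records as well, use the all, all.x, or all.y arguments. A quick check of the merged table's size:
In [ ]:
# rows and columns retained after the join on timeB
dim(fx.pr)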
In [15]:
plot(fx.pr$data.fluxCo2.nsae.flux~fx.pr$PARMean,
pch=".", ylim=c(-20,20),
xlab="PAR", ylab="CO2 flux")
If you're interested in data in the eddy covariance bundle besides the net flux data, the rest of this tutorial will guide you through how to get those data out of the bundle.
The Level 3 (dp03) data are the spatially interpolated profiles of the rates of change of CO2, H2O, and temperature.
Extract the Level 3 data from the HDF5 file using stackEddy() with the same syntax as for the Level 4 data.
In [16]:
prof <- stackEddy(filepath="/data/filesToStack00200/",
level="dp03")
In [17]:
head(prof$NIWO)
The Level 2 (dp02) data are similar, but they are reported at the measurement heights rather than interpolated to a continuous vertical profile. They can be extracted with the same syntax:
In [18]:
prof.l2 <- stackEddy(filepath="/data/filesToStack00200/",
level="dp02")
In [19]:
head(prof.l2$HARV)
Note that here, as in the PAR data, there is a verticalPosition field. It has the same meaning as in the PAR data, indicating the tower level of the measurement.
Level 1 (dp01) data are calibrated and aggregated in time, but otherwise untransformed. Use Level 1 data for raw gas concentrations and atmospheric stable isotopes.
Using stackEddy() to extract Level 1 data requires additional inputs. The Level 1 files are too large to simply pull out all the variables by default, and they include multiple averaging intervals, which can't be merged. So two additional inputs are needed:

* avg: The averaging interval to extract
* var: One or more variables to extract

What variables are available, at what averaging intervals? Another function in the neonUtilities package, getVarsEddy(), returns a list of HDF5 file contents. It requires only one input, a filepath to a single NEON HDF5 file:
In [20]:
vars <- getVarsEddy("/data/filesToStack00200/NEON.D01.HARV.DP4.00200.001.nsae.2018-07.basic.h5")
head(vars)
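If you're after a specific measurement, you can search the name field of this table directly. A minimal sketch (the pattern "Co2" is just an example):
In [ ]:
# find all variables with "Co2" in the name
vars[grep("Co2", vars$name, fixed=TRUE), ]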
Inputs to var can be any values from the name field in the table returned by getVarsEddy(). Let's take a look at CO2 and H2O, 13C in CO2, and 18O in H2O, at 30-minute aggregation. Let's look at Harvard Forest for these data, since deeper canopies generally have more interesting profiles:
In [21]:
iso <- stackEddy(filepath="/data/filesToStack00200/",
level="dp01", var=c("rtioMoleDryCo2","rtioMoleDryH2o",
"dlta13CCo2","dlta18OH2o"), avg=30)
In [22]:
head(iso$HARV)
Let's plot vertical profiles of CO2 and 13C in CO2 on a single day. Here, for convenience, instead of converting the time stamps to a time format, we can use the character format to extract the ones we want with grep(). We'll also discard the verticalPosition values that are strings; those are the calibration gases.
In [23]:
# subset to a single day, matching on the character time stamps
iso.d <- iso$HARV[grep("2018-06-25", iso$HARV$timeBgn, fixed=T),]
# drop the calibration gases: their verticalPosition values are
# non-numeric strings, so as.numeric() returns NA for them
iso.d <- iso.d[-which(is.na(as.numeric(iso.d$verticalPosition))),]
ggplot is well suited to these types of data; let's use it to plot the profiles.
In [24]:
library(ggplot2)
In [25]:
g <- ggplot(iso.d, aes(y=verticalPosition)) +
geom_path(aes(x=data.co2Stor.rtioMoleDryCo2.mean,
group=timeBgn, col=timeBgn)) +
theme(legend.position="none") +
xlab("CO2") + ylab("Tower level")
g
In [26]:
g <- ggplot(iso.d, aes(y=verticalPosition)) +
geom_path(aes(x=data.isoCo2.dlta13CCo2.mean,
group=timeBgn, col=timeBgn)) +
theme(legend.position="none") +
xlab("d13C") + ylab("Tower level")
g
The legends are omitted for space; see if you can work out the times of day the different colors represent.
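One way to check your answer is to map the color to the hour instead of the full time stamp, and keep the legend. This sketch assumes the timeBgn strings follow the YYYY-MM-DDTHH:MM:SS pattern seen above, so characters 12-13 are the hour (UTC):
In [ ]:
# color the profiles by hour of day (UTC) and show the legend
g <- ggplot(iso.d, aes(y=verticalPosition)) +
    geom_path(aes(x=data.isoCo2.dlta13CCo2.mean,
                  group=timeBgn, col=substr(timeBgn, 12, 13))) +
    labs(col="Hour (UTC)") +
    xlab("d13C") + ylab("Tower level")
g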