Datasets: Downloading Data from the Mauna Loa Observatory

Open Data Science Initiative

28th May 2014 Neil D. Lawrence

This data set comes from the Mauna Loa Observatory, which records atmospheric carbon dioxide (CO2) concentrations. The data were used by Rasmussen and Williams (2006) to demonstrate hyperparameter setting in Gaussian processes. When pods.datasets.mauna_loa() is first called, or if it is called with refresh_data=True, the latest version of the data set is downloaded. Otherwise, the cached version of the data set is loaded from disk.
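The download-or-load-from-cache behaviour described above can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not the pods implementation: load_with_cache and its fetch parameter are hypothetical names, and the default fetcher assumes a plain urllib download.

```python
import os
import urllib.request


def load_with_cache(url, cache_path, refresh_data=False, fetch=None):
    """Return the bytes behind `url`, downloading only when needed.

    Hypothetical helper, not part of pods: it downloads when no cached copy
    exists or when refresh_data=True, and otherwise reads the file from disk.
    """
    if fetch is None:
        # Default fetcher: a plain download via the standard library.
        def fetch(u):
            with urllib.request.urlopen(u) as response:
                return response.read()

    if refresh_data or not os.path.exists(cache_path):
        # No cached copy (or a refresh was requested): download and store it.
        payload = fetch(url)
        os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
        with open(cache_path, "wb") as f:
            f.write(payload)
        return payload

    # A cached copy exists and no refresh was requested: read it from disk.
    with open(cache_path, "rb") as f:
        return f.read()
```

Passing a custom fetch function makes the caching logic easy to exercise without network access.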


In [1]:
import pods
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
data = pods.datasets.mauna_loa()


Acquiring resource: mauna_loa

Details of data: 
The 'average' column contains the monthly mean CO2 mole fraction determined from daily averages.  The mole fraction of CO2, expressed as parts per million (ppm) is the number of molecules of CO2 in every one million molecules of dried air (water vapor removed).  If there are missing days concentrated either early or late in the month, the monthly mean is corrected to the middle of the month using the average seasonal cycle.  Missing months are denoted by -99.99. The 'interpolated' column includes average values from the preceding column and interpolated values where data are missing.  Interpolated values are computed in two steps.  First, we compute for each month the average seasonal cycle in a 7-year window around each monthly value.  In this way the seasonal cycle is allowed to change slowly over time.  We then determine the 'trend' value for each month by removing the seasonal cycle; this result is shown in the 'trend' column.  Trend values are linearly interpolated for missing months. The interpolated monthly mean is then the sum of the average seasonal cycle value and the trend value for the missing month.

NOTE: In general, the data presented for the last year are subject to change, depending on recalibration of the reference gas mixtures used, and other quality control procedures. Occasionally, earlier years may also be changed for the same reasons.  Usually these changes are minor.

CO2 expressed as a mole fraction in dry air, micromol/mol, abbreviated as ppm 

 (-99.99 missing data;  -1 no data for daily means in month)

Please cite:
Mauna Loa Data. Dr. Pieter Tans, NOAA/ESRL (www.esrl.noaa.gov/gmd/ccgg/trends/) and Dr. Ralph Keeling, Scripps Institution of Oceanography (scrippsco2.ucsd.edu/).

After downloading the data will take up 46779 bytes of space.

Data will be stored in /Users/neil/ods_data_cache/mauna_loa.

You must also agree to the following license:
-------------------------------------------------------------------- USE OF NOAA ESRL DATA

  These data are made freely available to the public and the scientific community in the belief that their wide dissemination will lead to greater understanding and new scientific insights. The availability of these data does not constitute publication of the data.  NOAA relies on the ethics and integrity of the user to insure that ESRL receives fair credit for their work.  If the data  are obtained for potential use in a publication or presentation,  ESRL should be informed at the outset of the nature of this work.   If the ESRL data are essential to the work, or if an important  result or conclusion depends on the ESRL data, co-authorship may be appropriate.  This should be discussed at an early stage in the work.  Manuscripts using the ESRL data should be sent to ESRL for review before they are submitted for publication so we can insure that the quality and limitations of the data are accurately represented.

  Contact:   Pieter Tans (303 497 6678; pieter.tans@noaa.gov)

  RECIPROCITY  Use of these data implies an agreement to reciprocate. Laboratories making similar measurements agree to make their own data available to the general public and to the scientific community in an equally complete and easily accessible form. Modelers are encouraged to make available to the community, upon request, their own tools used in the interpretation of the ESRL data, namely well documented model code, transport fields, and additional information necessary for other scientists to repeat the work and to run modified versions. Model availability includes collaborative support for new users of the models.
 --------------------------------------------------------------------

     See www.esrl.noaa.gov/gmd/ccgg/trends/ for additional details.

Do you wish to proceed with the download? [yes/no]
yes
Downloading  ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt -> /Users/neil/ods_data_cache/mauna_loa/co2_mm_mlo.txt
[==============================]   0.046/0.046MB
Most recent data observation from month  8.0  in year  2015.0

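The first call downloads the data, as shown above, after asking you to agree to the license. Once the data has been downloaded, later calls load the cached copy from disk instead. To download a fresh version of the data, for example to pick up the most recent monthly observations, I can set refresh_data=True.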


In [3]:
data = pods.datasets.mauna_loa(refresh_data=True)


Acquiring resource: mauna_loa

Do you wish to proceed with the download? [yes/no]
yes
Downloading  ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt -> /Users/neil/ods_data_cache/mauna_loa/co2_mm_mlo.txt
[==============================]   0.046/0.046MB
Most recent data observation from month  8.0  in year  2015.0

The data dictionary contains the standard keys 'X' and 'Y', which define a one-dimensional regression problem: 'X' holds the time in years and 'Y' the CO2 concentration in ppm.
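A quick sanity check on this regression problem is a low-order polynomial baseline fitted to 'X' and 'Y'. The sketch below uses NumPy; fit_polynomial_trend is a hypothetical helper, and because the arrays may be stored as (n, 1) columns rather than flat vectors, it flattens them before fitting.

```python
import numpy as np


def fit_polynomial_trend(X, Y, degree=2):
    """Fit a least-squares polynomial trend to a 1-D regression problem.

    X and Y may be column vectors of shape (n, 1) or flat arrays of shape (n,);
    both are flattened first. Returns (coefficients, x0): the coefficients are
    highest power first and apply to the centred input x - x0.
    """
    x = np.asarray(X, dtype=float).ravel()
    y = np.asarray(Y, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("X and Y must contain the same number of points")
    # Centre the inputs (calendar years are large numbers) to keep the fit well conditioned.
    x0 = x.mean()
    coeffs = np.polyfit(x - x0, y, degree)
    return coeffs, x0
```

With the downloaded data this would be called as `coeffs, x0 = fit_polynomial_trend(data['X'], data['Y'])`, and the baseline evaluated with `np.polyval(coeffs, x - x0)`.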


In [4]:
plt.plot(data['X'], data['Y'], 'rx')
plt.xlabel('year')
plt.ylabel('CO$_2$ concentration in ppm')


[Figure: CO$_2$ concentration in ppm against year]

Additionally, the keys 'Xtest' and 'Ytest' provide test data. The number of points used as training data is controlled by the num_train argument, which defaults to 545. This number was chosen because it matches the one used in the Gaussian Processes for Machine Learning book (Rasmussen and Williams, 2006). Below we plot the training and test data.
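A chronological train/test split of this kind can be sketched as follows. This is an illustration under an assumption, not the pods code: split_train_test is a hypothetical helper, and it assumes the first num_train rows (the earliest months) form the training set while the remaining, more recent months are held out.

```python
import numpy as np


def split_train_test(X, Y, num_train=545):
    """Split a time-ordered regression data set into training and test parts.

    Hypothetical helper: assumes the first num_train rows (the earliest months)
    are training data and the remaining rows are held out for testing.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)
    if not 0 <= num_train <= len(X):
        raise ValueError("num_train must lie between 0 and the number of points")
    # Slicing along the first axis keeps (n, 1) column shapes intact.
    return X[:num_train], Y[:num_train], X[num_train:], Y[num_train:]
```

With pods itself, the split size is set through the num_train argument, e.g. `pods.datasets.mauna_loa(num_train=545)`.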


In [5]:
plt.plot(data['X'], data['Y'], 'rx')
plt.plot(data['Xtest'], data['Ytest'], 'go')
plt.xlabel('year')
plt.ylabel('CO$_2$ concentration in ppm')


[Figure: training data (red crosses) and test data (green circles), CO$_2$ concentration in ppm against year]

Of course, the citation information for the data is included too, under the key 'citation'.


In [6]:
print(data['citation'])


Mauna Loa Data. Dr. Pieter Tans, NOAA/ESRL (www.esrl.noaa.gov/gmd/ccgg/trends/) and Dr. Ralph Keeling, Scripps Institution of Oceanography (scrippsco2.ucsd.edu/).

Extra information about the data is also included, as standard, under the keys 'info' and 'details'.


In [7]:
print(data['info'])
print()
print(data['details'])


Mauna Loa data with 545 values used as training points.

The 'average' column contains the monthly mean CO2 mole fraction determined from daily averages.  The mole fraction of CO2, expressed as parts per million (ppm) is the number of molecules of CO2 in every one million molecules of dried air (water vapor removed).  If there are missing days concentrated either early or late in the month, the monthly mean is corrected to the middle of the month using the average seasonal cycle.  Missing months are denoted by -99.99. The 'interpolated' column includes average values from the preceding column and interpolated values where data are missing.  Interpolated values are computed in two steps.  First, we compute for each month the average seasonal cycle in a 7-year window around each monthly value.  In this way the seasonal cycle is allowed to change slowly over time.  We then determine the 'trend' value for each month by removing the seasonal cycle; this result is shown in the 'trend' column.  Trend values are linearly interpolated for missing months. The interpolated monthly mean is then the sum of the average seasonal cycle value and the trend value for the missing month.

NOTE: In general, the data presented for the last year are subject to change, depending on recalibration of the reference gas mixtures used, and other quality control procedures. Occasionally, earlier years may also be changed for the same reasons.  Usually these changes are minor.

CO2 expressed as a mole fraction in dry air, micromol/mol, abbreviated as ppm 

 (-99.99 missing data;  -1 no data for daily means in month)
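The details above describe how missing months are marked (-99.99) and filled: a seasonal cycle is estimated, the trend is linearly interpolated across gaps, and the filled value is the sum of the two. If you work with the raw co2_mm_mlo.txt file rather than the arrays pods returns, a simplified version of that procedure can be sketched with NumPy. Treat it as an approximation, not NOAA's method: fill_missing_months is a hypothetical helper, the seasonal cycle here is one per-calendar-month average over the whole record (NOAA uses a moving 7-year window), and it assumes decimal years that mark the middle of each month and at least one valid value for every calendar month. Only the -99.99 sentinel in the monthly means is handled.

```python
import numpy as np

MISSING = -99.99  # sentinel for missing monthly means in the 'average' column


def fill_missing_months(decimal_year, average):
    """Fill missing monthly CO2 means, loosely following the NOAA description.

    Simplified sketch: the seasonal cycle is a single per-calendar-month
    average of residuals about a straight-line fit, rather than a moving
    7-year window. Returns (interpolated, trend).
    """
    t = np.asarray(decimal_year, dtype=float)
    avg = np.asarray(average, dtype=float)
    valid = ~np.isclose(avg, MISSING)

    # Remove a straight-line trend from the valid months to expose the seasonal cycle.
    slope, intercept = np.polyfit(t[valid], avg[valid], 1)
    resid = avg - (slope * t + intercept)

    # Calendar month index (0 = January ... 11 = December) from mid-month decimal years.
    month = np.clip(np.floor((t % 1.0) * 12).astype(int), 0, 11)
    seasonal_by_month = np.array(
        [resid[valid & (month == m)].mean() for m in range(12)]
    )
    seasonal_by_month -= seasonal_by_month.mean()  # the cycle averages to zero
    seasonal = seasonal_by_month[month]

    # Trend = observed value minus seasonal cycle, linearly interpolated over gaps.
    trend = np.where(valid, avg - seasonal, np.nan)
    trend[~valid] = np.interp(t[~valid], t[valid], trend[valid])

    # Keep observed values; elsewhere use seasonal cycle plus interpolated trend.
    interpolated = np.where(valid, avg, seasonal + trend)
    return interpolated, trend
```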

Importantly, you can also check the license for the data, stored under the key 'license':


In [8]:
print(data['license'])


-------------------------------------------------------------------- USE OF NOAA ESRL DATA

  These data are made freely available to the public and the scientific community in the belief that their wide dissemination will lead to greater understanding and new scientific insights. The availability of these data does not constitute publication of the data.  NOAA relies on the ethics and integrity of the user to insure that ESRL receives fair credit for their work.  If the data  are obtained for potential use in a publication or presentation,  ESRL should be informed at the outset of the nature of this work.   If the ESRL data are essential to the work, or if an important  result or conclusion depends on the ESRL data, co-authorship may be appropriate.  This should be discussed at an early stage in the work.  Manuscripts using the ESRL data should be sent to ESRL for review before they are submitted for publication so we can insure that the quality and limitations of the data are accurately represented.

  Contact:   Pieter Tans (303 497 6678; pieter.tans@noaa.gov)

  RECIPROCITY  Use of these data implies an agreement to reciprocate. Laboratories making similar measurements agree to make their own data available to the general public and to the scientific community in an equally complete and easily accessible form. Modelers are encouraged to make available to the community, upon request, their own tools used in the interpretation of the ESRL data, namely well documented model code, transport fields, and additional information necessary for other scientists to repeat the work and to run modified versions. Model availability includes collaborative support for new users of the models.
 --------------------------------------------------------------------

     See www.esrl.noaa.gov/gmd/ccgg/trends/ for additional details.