The goal is to see how we can read the data contained in a netCDF file. Several possibilities will be examined.

Reading a local file

Let's assume we have downloaded a file from CMEMS. We define the directory and the file name; datafile has to be adapted to your own case.


In [5]:
datafile = "~/CMEMS_INSTAC/INSITU_MED_NRT_OBSERVATIONS_013_035/history/mooring/IR_TS_MO_61198.nc"

In [6]:
import os
datafile = os.path.expanduser(datafile)
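Since the path has to be adapted to each user, it can help to check that the file is really there before trying to open it. The following is only a minimal sketch using the standard library; the variable name file_found is an assumption made for illustration.

```python
import os

# Same path as above; adapt it to your own directory layout.
datafile = "~/CMEMS_INSTAC/INSITU_MED_NRT_OBSERVATIONS_013_035/history/mooring/IR_TS_MO_61198.nc"
datafile = os.path.expanduser(datafile)  # replace "~" by the home directory

# file_found is a hypothetical helper variable, not part of the original notebook.
file_found = os.path.isfile(datafile)
if file_found:
    print("File found: " + datafile)
else:
    print("File not found, please adapt the path: " + datafile)
```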

To read the file we need the netCDF4 interface for Python.


In [7]:
import netCDF4
ds = netCDF4.Dataset(datafile, 'r')

where the first argument is the path of the file and 'r' indicates that it is opened for reading ('w' would be used for writing).
ds contains all the information about the dataset:

  • Metadata (global attributes)
  • Dimensions
  • Variables

Metadata


In [8]:
ds


Out[8]:
<type 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format UNDEFINED):
    data_type: OceanSITES time-series data
    format_version: 1.2
    platform_code: 61198
    date_update: 2015-08-02T11:20:44Z
    institution: Puertos del Estado (Spain)
    institution_edmo_code: 2751
    site_code:  
    wmo_platform_code: 61198
    source: Mooring observation
    history: 2015-08-02T11:20:44Z: Creation
    data_mode: R
    quality_control_indicator: 6
    quality_index: A
    references: http://www.oceansites.org, http://www.myocean.eu.org/, http://www.puertos.es, http://www.puertos.es
    comment:  
    conventions: OceanSITES Manual 1.2, InSituTac-Specification-Document
    netcdf_version: 3.5
    title:  
    summary:  
    naming_authority: OceanSITES
    id: IR_TS_MO_61198
    cdm_data_type: Time-series
    area: North Atlantic Ocean
    geospatial_lat_min: 25.0
    geospatial_lat_max: 37.43
    geospatial_lon_min: -20.0
    geospatial_lon_max: -0.75
    geospatial_vertical_min:  
    geospatial_vertical_max:  
    time_coverage_start: 1998-03-27T21:00:00Z
    time_coverage_end: 2015-08-02T09:00:00Z
    institution_references: http://www.puertos.es
    contact: mar@puertos.es, mar@puertos.es
    author: Marta de Alfonso
    data_assembly_center: Puertos del Estado
    pi_name: PdE
    distribution_statement: These data follow MyOcean standards; they are public and free of charge. User assumes all risk for use of data. User must display citation in any publication or product using data. User must contact PI prior to any commercial use of data. More on: http://www.myocean.eu/data_policy
    citation: These data were collected and made freely available by the MyOcean project and the programs that contribute to it
    update_interval: yearly
    qc_manual: OceanSITES User's Manual v1.1
    dimensions(sizes): TIME(119916), DEPTH(1), LATITUDE(119916), LONGITUDE(119916), POSITION(119916)
    variables(dimensions): float64 TIME(TIME), int8 TIME_QC(TIME), float32 LATITUDE(LATITUDE), float32 LONGITUDE(LONGITUDE), int8 POSITION_QC(POSITION), float32 DEPH(TIME,DEPTH), int8 DEPH_QC(TIME,DEPTH), |S1 DEPH_DM(TIME,DEPTH), float32 VTDH(TIME,DEPTH), int8 VTDH_QC(TIME,DEPTH), |S1 VTDH_DM(TIME,DEPTH), float32 VTZA(TIME,DEPTH), int8 VTZA_QC(TIME,DEPTH), |S1 VTZA_DM(TIME,DEPTH), float32 VDIR(TIME,DEPTH), int8 VDIR_QC(TIME,DEPTH), |S1 VDIR_DM(TIME,DEPTH), float32 ATMS(TIME,DEPTH), int8 ATMS_QC(TIME,DEPTH), |S1 ATMS_DM(TIME,DEPTH), float32 DRYT(TIME,DEPTH), int8 DRYT_QC(TIME,DEPTH), |S1 DRYT_DM(TIME,DEPTH), float32 WSPD(TIME,DEPTH), int8 WSPD_QC(TIME,DEPTH), |S1 WSPD_DM(TIME,DEPTH), float32 WDIR(TIME,DEPTH), int8 WDIR_QC(TIME,DEPTH), |S1 WDIR_DM(TIME,DEPTH), float32 HCSP(TIME,DEPTH), int8 HCSP_QC(TIME,DEPTH), |S1 HCSP_DM(TIME,DEPTH), float32 HCDT(TIME,DEPTH), int8 HCDT_QC(TIME,DEPTH), |S1 HCDT_DM(TIME,DEPTH), float32 TEMP(TIME,DEPTH), int8 TEMP_QC(TIME,DEPTH), |S1 TEMP_DM(TIME,DEPTH), float32 PSAL(TIME,DEPTH), int8 PSAL_QC(TIME,DEPTH), |S1 PSAL_DM(TIME,DEPTH)
    groups: 

We can access the global attributes individually:


In [9]:
print('Institution: ' + ds.institution)
print('Reference: ' + ds.institution_references)


Institution: Puertos del Estado (Spain)
Reference: http://www.puertos.es

Data

Now we want to load some of the variables: we use the ds.variables dictionary, whose keys are the variable names.


In [10]:
time = ds.variables['TIME']
temperature = ds.variables['TEMP']

Let's examine the variable temperature


In [11]:
temperature


Out[11]:
<type 'netCDF4._netCDF4.Variable'>
float32 TEMP(TIME, DEPTH)
    long_name: Sea temperature
    standard_name: sea_water_temperature
    units: degree_Celsius
    _FillValue: 99999.0
    QC_procedure: 1
    valid_min: 0.0
    valid_max: 31.0
    comment:  
    sensor_depth: 3.0
    ancillary_variables: TEMP_QC
    sensor_mount:  
    sensor_orientation:  
    DM_indicator: D
unlimited dimensions: TIME
current shape = (119916, 1)
filling off

This means that the variable depends on two dimensions: time and depth. We also know the long_name, standard_name, units, and other useful pieces of information concerning the temperature.

To get the values corresponding to the variables, the syntax is:


In [12]:
temperature_values = temperature[:]
time_values = time[:]

To get the variable attributes:


In [13]:
print('Time units: ' + time.units)
print('Temperature units: ' + temperature.units)


Time units: days since 1950-01-01T00:00:00Z
Temperature units: degree_Celsius
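The time units tell us that the time values are numbers of days counted from 1 January 1950. As a minimal sketch, such a value can be turned into a calendar date with the standard library only. The function name days_to_datetime and the sample value 17617.875 are assumptions made for illustration; the sample was chosen by hand so that it matches the time_coverage_start attribute and was not read from the file. The netCDF4 module also provides num2date for this kind of conversion.

```python
from datetime import datetime, timedelta

# Reference date taken from the units "days since 1950-01-01T00:00:00Z".
REFERENCE_DATE = datetime(1950, 1, 1)


def days_to_datetime(days):
    """Convert a number of days since the reference date into a datetime."""
    return REFERENCE_DATE + timedelta(days=days)


# Illustrative value (an assumption, not read from the file).
print(days_to_datetime(17617.875))  # prints 1998-03-27 21:00:00
```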

Quality flags

Just a quick plot to see everything is fine. More details about the plots will be given later.


In [15]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(temperature[:])
plt.show()


It seems that we have not taken into account the quality flags of the data. We can load the corresponding variable TEMP_QC.


In [17]:
temperatureQC = ds.variables['TEMP_QC']
plt.plot(temperatureQC[:])
plt.show()


The meaning of the quality flags is also stored in the file.


In [18]:
print('Flag values: ' + str(temperatureQC.flag_values))
print('Flag meanings: ' + temperatureQC.flag_meanings)


Flag values: [0 1 2 3 4 5 6 7 8 9]
Flag meanings: no_qc_performed good_data probably_good_data bad_data_that_are_potentially_correctable bad_data value_changed not_used nominal_value interpolated_value missing_value
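Because the flag meanings come as a single space-separated string, it can be convenient to pair each value with its meaning. The sketch below does this with a plain dictionary; the values are copied from the output above, and the name flag_table is an assumption made for illustration.

```python
# Values and meanings copied from the output above.
flag_values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
flag_meanings = ("no_qc_performed good_data probably_good_data "
                 "bad_data_that_are_potentially_correctable bad_data "
                 "value_changed not_used nominal_value interpolated_value "
                 "missing_value")

# flag_table is a hypothetical name: it maps each flag value to its meaning.
flag_table = dict(zip(flag_values, flag_meanings.split()))

print(flag_table[1])  # prints good_data
print(flag_table[9])  # prints missing_value
```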

Now we will generate a new plot of the time series using only the data whose quality flag is equal to 1 (good_data).


In [19]:
plt.plot(temperature[(temperatureQC[:, 0] == 1), 0])
plt.show()


The resulting plot now seems correct, with values ranging roughly between 10 and 28 °C.
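The selection by quality flag used for this plot is a boolean mask. The sketch below reproduces the idea on small synthetic arrays (the numbers are invented for illustration and do not come from the file), and also shows one possible alternative, a NumPy masked array, which keeps the original shape while hiding the flagged values.

```python
import numpy as np

# Synthetic stand-ins for temperature[:] and temperatureQC[:], shape (TIME, DEPTH).
temp = np.array([[15.2], [99999.0], [16.1], [40.0], [15.8]], dtype=np.float32)
qc = np.array([[1], [9], [1], [4], [2]], dtype=np.int8)

# Boolean mask: True where the quality flag equals 1 (good_data).
good = qc[:, 0] == 1
good_temp = temp[good, 0]           # keeps only the good values
n_good = int(good.sum())

# Alternative: hide everything that is not flagged as good, keeping the shape.
masked_temp = np.ma.masked_where(qc != 1, temp)

print("Number of good values: " + str(n_good))
print("Mean of good values: " + str(masked_temp.mean()))
```

The first approach shortens the series, so the matching time values have to be selected with the same mask; the masked array avoids that at the cost of carrying the mask along.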

Last thing to remember: close the netCDF file!


In [43]:
ds.close()