ABSTRACT

All CMEMS in situ data products can be found and downloaded, after registration, via the CMEMS catalogue.

That channel is advisable only for sporadic netCDF downloading: in an operational context, interacting with the web user interface is not practical. There, scripted FTP file transfer is a much better approach.

Since every line of the index files describes one of the netCDFs contained within the different directories (see the tips for why), users can loop over those lines and download only the files that match a number of specifications: spatial coverage, time coverage, provider, data_mode, parameters, or file_name-related attributes (region, data type, TS or PF, platform code and/or platform category, timestamp).
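For reference, each index line is a comma-separated record like the one below, reassembled here from the example inspected in QUICK VIEW further down:

COP-BO-01,ftp://cmems.smhi.se/Core/INSITU_BAL_NRT_OBSERVATIONS_013_032/latest/20170920/BO_LATEST_TS_MO_Raahe_20170920.nc,64.6659,64.6659,24.4071,24.4071,2017-09-20T00:00:00Z,2017-09-20T23:00:00Z,FMI,2017-09-21T07:01:51Z,R,SLEV DEPH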

PREREQUISITES


In [3]:
user = ''  #type CMEMS user name within the quotes
password = ''  #type CMEMS password within the quotes
product_name = 'INSITU_BAL_NRT_OBSERVATIONS_013_032'  #type the targeted CMEMS in situ product
distribution_unit = 'cmems.smhi.se'  #type the targeted hosting institution
index_file = 'index_latest.txt'  #type the targeted index file name

DOWNLOAD

  1. Index file download

In [4]:
import ftplib

In [5]:
ftp = ftplib.FTP(distribution_unit, user, password)
ftp.cwd('Core')
ftp.cwd(product_name)
remote_filename = index_file
local_filename = remote_filename
local_file = open(local_filename, 'wb')
ftp.retrbinary('RETR ' + remote_filename, local_file.write)
local_file.close()
ftp.quit()
#ready when '221 Goodbye.' is returned


Out[5]:
'221 Goodbye.'
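As a side note, the same download can be written with context managers so that the connection and the local file are closed automatically even if an error occurs. A minimal sketch, equivalent to the cell above (ftplib.FTP supports the with statement since Python 3.3):

#equivalent download with automatic cleanup of connection and file
with ftplib.FTP(distribution_unit, user, password) as ftp:
    ftp.cwd('Core')
    ftp.cwd(product_name)
    with open(index_file, 'wb') as local_file:
        ftp.retrbinary('RETR ' + index_file, local_file.write)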

QUICK VIEW

Reading a random line of the index file to see what information it contains.


In [6]:
import numpy as np
import pandas as pd
from random import randint

In [7]:
index = np.genfromtxt(index_file, skip_header=6, unpack=False, delimiter=',', dtype=None,
           names=['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
                     'geospatial_lon_min', 'geospatial_lon_max',
                     'time_coverage_start', 'time_coverage_end', 
                     'provider', 'date_update', 'data_mode', 'parameters'])
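Alternatively, the same table can be loaded with pandas, which keeps the string columns as str and spares the .decode('utf-8') calls used below. A sketch, assuming the same six commented header lines:

#alternative load with pandas (strings come back as str, not bytes)
index_df = pd.read_csv(index_file, skiprows=6, header=None,
                       names=['catalog_id', 'file_name',
                              'geospatial_lat_min', 'geospatial_lat_max',
                              'geospatial_lon_min', 'geospatial_lon_max',
                              'time_coverage_start', 'time_coverage_end',
                              'provider', 'date_update', 'data_mode', 'parameters'])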

In [8]:
dataset = randint(0, len(index) - 1) #random line of the index file

In [9]:
headers = ['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
           'geospatial_lon_min', 'geospatial_lon_max',
           'time_coverage_start', 'time_coverage_end',
           'provider', 'date_update', 'data_mode', 'parameters']
values = [index[dataset][header] for header in headers]
file_name = index[dataset]['file_name'].decode('utf-8')
values[1] = '<a href=' + file_name + '>' + file_name + '</a>' #render the file name as a link
df = pd.DataFrame(values, index=headers, columns=[dataset])
df.style


Out[9]:
2175
catalog_id COP-BO-01
file_name ftp://cmems.smhi.se/Core/INSITU_BAL_NRT_OBSERVATIONS_013_032/latest/20170920/BO_LATEST_TS_MO_Raahe_20170920.nc
geospatial_lat_min 64.6659
geospatial_lat_max 64.6659
geospatial_lon_min 24.4071
geospatial_lon_max 24.4071
time_coverage_start 2017-09-20T00:00:00Z
time_coverage_end 2017-09-20T23:00:00Z
provider FMI
date_update 2017-09-21T07:01:51Z
data_mode R
parameters SLEV DEPH

FILTERING CRITERIA

Given the glimpse above, it is possible to filter by 12 criteria. As an example, we will next set up a filter to download only those files that contain a given data_type.

1. Aimed data_type 

In [11]:
aimed_data_type = 'MO' #choose a data_type: i.e. mooring
2. netCDF filtering/selection

Remember that, depending on the directory we are in, the data_type tag sits at a specific position within the file_name: for the history directory it is at position 2, while for the monthly and latest directories it is at position 3. See more at the naming convention or the PUM.
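For instance, with the 'latest' file name inspected in QUICK VIEW above:

#'latest' file name: the data_type tag sits at position 3 after splitting
'BO_LATEST_TS_MO_Raahe_20170920.nc'.split('_')[3]  #returns 'MO'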


In [12]:
#read the index lines (iterate over them)
selected_netCDFs = []
for netCDF in index:
    file_name = netCDF['file_name'].decode('utf-8')
    last_idx_slash = file_name.rfind('/')
    ncdf_file_name = file_name[last_idx_slash+1:]
    
    #position of the data_type tag in the file name:
    #history: position 2
    #monthly and latest: position 3
    position = 3 #index_latest (above)
    data_type = ncdf_file_name.split('_')[position]
    
    if data_type == aimed_data_type:
        selected_netCDFs.append(file_name)
print("total: " + str(len(selected_netCDFs)))


total: 4523
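The same loop pattern works for any of the other index columns. For example, a spatial filter could look like this (a sketch; the bounding box below is illustrative, not part of the original selection):

#sketch: keep only files whose bounding box lies inside an area of interest
#(the lon/lat limits are illustrative values, not from this notebook)
lat_min, lat_max = 60.0, 66.0
lon_min, lon_max = 17.0, 26.0
in_box = [netCDF['file_name'].decode('utf-8') for netCDF in index
          if lat_min <= netCDF['geospatial_lat_min']
          and netCDF['geospatial_lat_max'] <= lat_max
          and lon_min <= netCDF['geospatial_lon_min']
          and netCDF['geospatial_lon_max'] <= lon_max]
print("total: " + str(len(in_box)))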

SELECTION DOWNLOAD


In [ ]:
for nc in selected_netCDFs:
    last_idx_slash = nc.rfind('/')
    ncdf_file_name = nc[last_idx_slash+1:]
    folders = nc.split('/')[3:-1] #directories between host and file name
    host = nc.split('/')[2] #distribution unit
    
    ftp = ftplib.FTP(host, user, password)
    for folder in folders:
        ftp.cwd(folder)
    
    local_file = open(ncdf_file_name, 'wb')
    ftp.retrbinary('RETR ' + ncdf_file_name, local_file.write)
    local_file.close()
    ftp.quit()
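
For large selections it may be worth skipping files that are already on disk and, since all files of one product live on the same host, reusing a single connection. A sketch under those assumptions:

import os

#sketch: one connection, skip files already downloaded on a previous run
#assumes every selected file lives on the same host (true for one product)
ftp = ftplib.FTP(distribution_unit, user, password)
for nc in selected_netCDFs:
    ncdf_file_name = nc.rsplit('/', 1)[1]
    if os.path.exists(ncdf_file_name):
        continue #already downloaded
    remote_path = '/' + '/'.join(nc.split('/')[3:-1]) #absolute remote directory
    ftp.cwd(remote_path)
    with open(ncdf_file_name, 'wb') as local_file:
        ftp.retrbinary('RETR ' + ncdf_file_name, local_file.write)
ftp.quit()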