ABSTRACT

All CMEMS in situ data products can be found and downloaded, after registration, via the CMEMS catalogue.

This channel is advisable only for sporadic netCDF downloading; in an operational context, interacting with the web user interface is not practical. For such use, scripted FTP file transfer is a much more suitable approach.

Since every line of the index files describes one of the netCDFs contained within the different directories (see the tips to learn why), users can loop over those lines and download only the files matching a number of specifications such as spatial coverage, time coverage, provider, data_mode, parameters, or attributes encoded in the file_name (region, data type, TS or PF, platform code and/or platform category, timestamp).
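
For instance, the file_name alone encodes several of these specifications. Below is a minimal sketch of how such a name could be split into its tokens, assuming the underscore-separated naming convention visible in the sample output further down; the token names are our reading of that convention, not official field names, and platform codes containing underscores would need extra care:

In [ ]:
# Sketch: tokenize a netCDF file name (example name taken from the
# sample output later in this notebook).
name = 'BO_LATEST_TS_FB_StenaHollandica_20170915.nc'
tokens = name[:-3].split('_')  # drop the '.nc' extension, then split
region, dataset, data_type, platform_category, platform_code, timestamp = tokens
print(region, data_type, platform_code, timestamp)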

PREREQUISITES


In [1]:
user = ''      # type CMEMS user name within quotes
password = ''  # type CMEMS password within quotes
product_name = 'INSITU_BAL_NRT_OBSERVATIONS_013_032'  # type target CMEMS in situ product
distribution_unit = 'cmems.smhi.se'  # type target hosting institution
index_file = 'index_latest.txt'      # type target index file name

DOWNLOAD

  1. Index file download

In [2]:
import ftplib

In [3]:
ftp = ftplib.FTP(distribution_unit, user, password)  # open FTP session
ftp.cwd("Core")
ftp.cwd(product_name)  # enter the product directory
remote_filename = index_file
local_filename = remote_filename
local_file = open(local_filename, 'wb')
ftp.retrbinary('RETR ' + remote_filename, local_file.write)  # download the index file
local_file.close()
ftp.quit()
# ready when you see '221 Goodbye.'


Out[3]:
'221 Goodbye.'
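
For operational scripts, it can be safer to let Python close the connection and the local file even if the transfer fails. A minimal alternative sketch of the same download, assuming Python 3.3+ (where ftplib.FTP doubles as a context manager):

In [ ]:
# Sketch (not part of the original notebook): same download with
# context managers, so file and session are closed even on failure.
with ftplib.FTP(distribution_unit, user, password) as ftp:
    ftp.cwd('Core')
    ftp.cwd(product_name)
    with open(index_file, 'wb') as local_file:
        ftp.retrbinary('RETR ' + index_file, local_file.write)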

QUICK VIEW

Read a random line of the index file to see what information it contains.


In [4]:
import numpy as np
import pandas as pd
from random import randint

In [5]:
index = np.genfromtxt(index_file, skip_header=6, unpack=False, delimiter=',', dtype=None,
           names=['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
                     'geospatial_lon_min', 'geospatial_lon_max',
                     'time_coverage_start', 'time_coverage_end', 
                     'provider', 'date_update', 'data_mode', 'parameters'])
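
As an aside (a sketch, not part of the original notebook), the same index can also be loaded with pandas, which yields plain strings and so avoids the .decode('utf-8') calls used further below:

In [ ]:
# Sketch: read the index with pandas; skiprows=6 mirrors skip_header=6.
index_df = pd.read_csv(index_file, skiprows=6, header=None,
                       names=['catalog_id', 'file_name',
                              'geospatial_lat_min', 'geospatial_lat_max',
                              'geospatial_lon_min', 'geospatial_lon_max',
                              'time_coverage_start', 'time_coverage_end',
                              'provider', 'date_update', 'data_mode',
                              'parameters'])
index_df.head()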

In [6]:
dataset = randint(0, len(index)-1)  # random line of the index file

In [7]:
headers = ['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
           'geospatial_lon_min', 'geospatial_lon_max',
           'time_coverage_start', 'time_coverage_end',
           'provider', 'date_update', 'data_mode', 'parameters']
values = [index[dataset][header] for header in headers]
# genfromtxt returns strings as bytes; decode them for display
values = [v.decode('utf-8') if isinstance(v, bytes) else v for v in values]
values[1] = '<a href=' + values[1] + '>' + values[1] + '</a>'  # clickable link
df = pd.DataFrame(values, index=headers, columns=[dataset])
df.style


Out[7]:
939
catalog_id COP-BO-01
file_name ftp://cmems.smhi.se/Core/INSITU_BAL_NRT_OBSERVATIONS_013_032/latest/20170915/BO_LATEST_TS_FB_StenaHollandica_20170915.nc
geospatial_lat_min 51.8977
geospatial_lat_max 52.0519
geospatial_lon_min 1.2520
geospatial_lon_max 4.1337
time_coverage_start 2017-09-15T02:00:00Z
time_coverage_end 2017-09-15T23:50:00Z
provider STE
date_update 2017-09-17T13:01:08Z
data_mode R
parameters TEMP WDIR HCSP HCDT WSPD DEPH

FILTERING CRITERIA

Given the glimpse above, it is possible to filter by any of these 12 fields. As an example, we will next set up a filter to download only those files that have been updated in the last X hours.


In [110]:
# packages
import datetime

  1. Number of hours

In [111]:
time_lapse = 10  # all files updated within the last 10 hours will be downloaded
end_date = datetime.datetime.today()
ini_date = end_date - datetime.timedelta(hours=time_lapse)

  2. netCDF filtering/selection

In [113]:
# read the index lines (iterate over them) and keep the matching files
selected_netCDFs = []
for netCDF in index:
    file_name = netCDF['file_name'].decode('utf-8')
    date_update = netCDF['date_update'].decode('utf-8')
    date_format = "%Y-%m-%dT%H:%M:%SZ"
    file_date = datetime.datetime.strptime(date_update, date_format)
    # selection criterion: date_update within the chosen time window
    if ini_date < file_date < end_date:
        selected_netCDFs.append(file_name)
print("total: " + str(len(selected_netCDFs)))


total: 333
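
The same loop pattern applies to any of the other fields. As a further sketch, files could instead be selected by spatial coverage; the bounding box below is illustrative only, not taken from the notebook:

In [ ]:
# Sketch: keep files whose bounding box falls inside an area of interest.
# The limits below are illustrative values only.
lat_min, lat_max = 50.0, 60.0
lon_min, lon_max = 0.0, 10.0
selected_by_area = []
for netCDF in index:
    if (lat_min <= netCDF['geospatial_lat_min'] and
            netCDF['geospatial_lat_max'] <= lat_max and
            lon_min <= netCDF['geospatial_lon_min'] and
            netCDF['geospatial_lon_max'] <= lon_max):
        selected_by_area.append(netCDF['file_name'].decode('utf-8'))
print("total: " + str(len(selected_by_area)))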

SELECTION DOWNLOAD


In [ ]:
for nc in selected_netCDFs:
    last_idx_slash = nc.rfind('/')
    ncdf_file_name = nc[last_idx_slash+1:]  # netCDF file name
    folders = nc.split('/')[3:-1]           # directories to traverse
    host = nc.split('/')[2]                 # distribution unit

    ftp = ftplib.FTP(host, user, password)
    for folder in folders:
        ftp.cwd(folder)

    local_file = open(ncdf_file_name, 'wb')
    ftp.retrbinary('RETR ' + ncdf_file_name, local_file.write)
    local_file.close()
    ftp.quit()
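
Opening a new FTP session per file is simple but slow for large selections. One possible optimization, sketched below under the assumption that all selected files sit on the same host (true for a single product) and that the server accepts full paths in RETR, is to reuse a single connection:

In [ ]:
# Sketch (not in the original notebook): one FTP session for all files,
# addressing each netCDF by its full remote path.
if selected_netCDFs:
    host = selected_netCDFs[0].split('/')[2]
    with ftplib.FTP(host, user, password) as ftp:
        for nc in selected_netCDFs:
            remote_path = '/' + '/'.join(nc.split('/')[3:])
            with open(nc.split('/')[-1], 'wb') as local_file:
                ftp.retrbinary('RETR ' + remote_path, local_file.write)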