pySeabird is a package to parse and load CTD data files. It should be an easy task, but the format has been changing over time. Working with data from multiple ships and cruises requires first understanding each file and normalizing it into a common format, and only then starting the analysis. That could still be done with a few general regular-expression rules, but I would rather use strict rules. If I'm loading hundreds or thousands of profiles, I want to be sure that no mistake slips through. I would rather skip a doubtful file and issue a warning than believe it was loaded correctly and let it become part of my analysis.
With that in mind, I wrote this package with the ability to load multiple rules, so new rules can be added without changing the main engine.
For more information, check the documentation.
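To make the idea of strict, pluggable rules concrete, here is a minimal sketch using only the standard library. It is not pySeabird's actual engine or rule format: the rule names, the patterns, and the `match_rule` helper are all hypothetical, chosen only to show how a line can be accepted when exactly one rule matches it completely and rejected otherwise.

```python
import re

# Hypothetical rules for illustration only; these are NOT pySeabird's rules.
# Each rule must match the whole line, so a partial match is never accepted.
RULES = {
    "format_a": re.compile(r"# start_time = (?P<date>\w{3} \d{2} \d{4})"),
    "format_b": re.compile(r"# start_time = (?P<date>\d{4}-\d{2}-\d{2})"),
}


def match_rule(line, rules=RULES):
    """Return (rule_name, groups) for the single rule that fully matches line.

    Raises ValueError when no rule, or more than one rule, matches, so a
    doubtful line is reported instead of being silently accepted.
    """
    hits = []
    for name, pattern in rules.items():
        m = pattern.fullmatch(line)
        if m is not None:
            hits.append((name, m.groupdict()))
    if len(hits) != 1:
        raise ValueError("no unambiguous rule for line: %r" % line)
    return hits[0]


print(match_rule("# start_time = Mar 05 2008"))
```

Supporting a new format in this sketch only requires adding an entry to `RULES`; the matching function itself stays unchanged.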
In [1]:
%matplotlib inline
from seabird.cnv import fCNV
Let's first download an example file with some CTD data
In [2]:
!wget https://raw.githubusercontent.com/castelao/seabird/master/sampledata/CTD/dPIRX003.cnv
In [3]:
profile = fCNV('dPIRX003.cnv')
The profile dPIRX003.cnv.OK was loaded with the default rule cnv.yaml
The header is loaded into the .attributes as a dictionary. Note that the date was already converted into a datetime object.
There is a new attribute, not found in the file: 'md5'. This is the MD5 hash of the original file, which might be useful to double-check the inputs when reproducing an analysis.
Since it's a dictionary, the geographical coordinates, for example, can be extracted with:
In [4]:
print("The profile coordinates are latitude: %.4f, and longitude: %.4f" %
      (profile.attributes['LATITUDE'], profile.attributes['LONGITUDE']))
Or for an overview of all the attributes and data:
In [5]:
print("Header: %s" % profile.attributes.keys())
print(profile.attributes)
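The 'md5' attribute mentioned above can be compared against a hash computed independently. The following is a minimal sketch with the standard library; `file_md5` is a hypothetical helper, not part of pySeabird. It assumes the stored value is the hash of the file's raw bytes; if the package hashes the decoded text instead, the two values may differ for files that are not plain ASCII/UTF-8.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the raw bytes of a file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage sketch (assumes `profile` was loaded as above):
# if file_md5('dPIRX003.cnv') != profile.attributes['md5']:
#     print("Warning: input file differs from the one originally loaded")
```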
The object profile behaves like a dictionary with the data, so to check the available variables one can just list its keys:
In [6]:
print(profile.keys())
Each variable is returned as a masked array, so all values equal to profile.attributes['bad_flag'] are masked:
In [7]:
profile['TEMP2'][:25]
Out[7]:
Since these are regular masked arrays, let's compare the mean and standard deviation of the two temperature sensors:
In [8]:
print(profile['TEMP'].mean(), profile['TEMP'].std())
print(profile['TEMP2'].mean(), profile['TEMP2'].std())
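To see why masking matters for statistics like these, here is a small self-contained sketch using NumPy alone. The temperature values and the `bad_flag` value are made up for illustration; the real flag is whatever profile.attributes['bad_flag'] holds for a given file.

```python
import numpy as np

# Hypothetical bad-value flag and made-up temperatures, for illustration only.
bad_flag = -9999.0
raw = np.array([12.5, 12.4, bad_flag, 12.1, bad_flag, 11.8])

# Mask every entry equal to the flag, analogous to what is described above.
temp = np.ma.masked_values(raw, bad_flag)

print(temp.count())  # number of valid (unmasked) measurements
print(temp.mean())   # mean over valid measurements only
print(raw.mean())    # unmasked mean, contaminated by the flag values
```

Masked entries are skipped by `mean()` and `std()`, so flagged measurements do not distort the result.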
In [9]:
from matplotlib import pyplot as plt
plt.plot(profile['TEMP'], profile['PRES'],'b')
plt.plot(profile['TEMP2'], profile['PRES'],'g')
plt.gca().invert_yaxis()
plt.xlabel('temperature')
plt.ylabel('pressure [dbar]')
plt.title(profile.attributes['filename'])
Out[9]:
We can also export the data into a pandas DataFrame for easier data manipulation later on:
In [10]:
df = profile.as_DataFrame()
df.head()
Out[10]: