In [ ]:
!jupyter nbconvert --to='slides' asdf\ tutorial.ipynb --post serve
Paper - ASDF: A new data format for astronomy http://dx.doi.org/10.1016/j.ascom.2015.06.004
Shortcomings of FITS are becoming burdensome:
Strengths: XML-based, so it can leverage corresponding reader and validation libraries.
Weaknesses
Strengths
Weaknesses (relative to FITS)
Structure
Qualities for numerical Data
One-line header indicating asdf and version
YAML "tree": YAML Ain't Markup Language (YAML)
Binary sections: defined by supporting YAML schemas
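For reference, a minimal ASDF file (like the hello-world example later in this tutorial) is laid out as a version header followed by a tagged YAML tree; when arrays are present, binary blocks follow the YAML end marker. Schematically (exact tag names and versions vary with the asdf release):

```yaml
#ASDF 1.0.0
#ASDF_STANDARD 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
hello: world
...
```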
Kameleon is known as a data standard for the space weather community. Operationally, that means we
When the CCMC started, we chose to provide a common interface to the model results and to add our own metadata to the hosted results. There are a few problems with this approach:
For years, Lutz has been developing IDL readers and interpolators for all models currently in the Runs-on-Request system. Lutz's scripts convert the model results into a general N-D data structure, which his interpolators understand...
Idea: use Lutz's format, but store results in ASDF.
Our goal is to make it easy to bring space weather models into kameleon. We do this by either:
Since Lutz has already created a format for reading the models into IDL, we are targeting a file format for him to write to.
https://asdf.readthedocs.io/en/latest/
In [2]:
mkdir -p hello_world
In [3]:
from asdf import AsdfFile
# Make the tree structure, and create an AsdfFile from it.
tree = {'hello': 'world'}
ff = AsdfFile(tree)
ff.write_to("hello_world/test.asdf")
# You can also make the AsdfFile first, and modify its tree directly:
ff = AsdfFile()
ff.tree['hello'] = 'world'
ff.write_to("hello_world/test_hello_world.asdf")
In [4]:
cat hello_world/test.asdf
In [17]:
cat hello_world/test_hello_world.asdf
In [ ]:
mkdir -p array_storage
In [ ]:
from asdf import AsdfFile
import numpy as np
tree = {'my_array': np.random.rand(8, 8)}
ff = AsdfFile(tree)
ff.write_to("array_storage/test_arrays.asdf")
In [18]:
cat array_storage/test_arrays.asdf
In [ ]:
from asdf import ValidationError
In [ ]:
from asdf import AsdfFile
tree = {'data': 'Not an array'}
try:
    AsdfFile(tree)
except ValidationError:
    print('data needs an array!')
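Note that without a schema requiring `'data'` to be an ndarray, the plain tree above will not actually fail validation. As a sketch of the kind of check such a schema encodes (the `require_array` helper below is hypothetical, not part of asdf):

```python
import numpy as np

def require_array(tree, key):
    """Raise if tree[key] is not a NumPy array (hypothetical check,
    standing in for what a schema's ndarray requirement enforces)."""
    if not isinstance(tree.get(key), np.ndarray):
        raise ValueError('%r needs an array!' % key)

require_array({'data': np.zeros(3)}, 'data')   # passes silently
try:
    require_array({'data': 'Not an array'}, 'data')
except ValueError as err:
    print(err)
```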
In [ ]:
mkdir -p data_sharing
In [ ]:
from asdf import AsdfFile
import numpy as np
my_array = np.random.rand(8, 8)
subset = my_array[2:4,3:6]
tree = {
    'my_array': my_array,
    'subset': subset
}
ff = AsdfFile(tree)
ff.write_to("data_sharing/test_overlap.asdf")
In [20]:
cat data_sharing/test_overlap.asdf
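The point of this example is that `subset` is a view into `my_array`, so ASDF can describe it as an offset and strides into the same binary block instead of copying the data. The view relationship can be checked directly in NumPy:

```python
import numpy as np

my_array = np.random.rand(8, 8)
subset = my_array[2:4, 3:6]

# Basic slicing returns a view: it shares the parent's buffer,
# which is what lets ASDF store both without duplicating bytes.
print(np.shares_memory(my_array, subset))   # True
print(subset.base is my_array)              # True
```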
In [ ]:
mkdir -p streaming_data
In [31]:
from asdf import AsdfFile, Stream
import numpy as np
tree = {
    # Each "row" of data will have 128 entries.
    'my_stream': Stream([128], np.float64)
}
ff = AsdfFile(tree)
with open('streaming_data/stream_test.asdf', 'wb') as fd:
    ff.write_to(fd)
    # Write 10 rows of data, one row at a time. ``write``
    # expects the raw binary bytes, not an array, so we use
    # ``tobytes()``.
    for i in range(10):
        fd.write(np.array([i] * 128, np.float64).tobytes())
In [32]:
cat streaming_data/stream_test.asdf
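Each streamed row is written as raw bytes, so the binary section grows by a fixed amount per row. For the loop above, the arithmetic is:

```python
import numpy as np

rows, row_len = 10, 128
# One row = 128 float64 values, 8 bytes each.
row_bytes = np.dtype(np.float64).itemsize * row_len   # 8 * 128 = 1024
print(rows * row_bytes)   # 10240 bytes of streamed data
```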
In [ ]:
mkdir -p exploded_data
In [ ]:
from asdf import AsdfFile
import numpy as np
my_array = np.random.rand(3, 4)
tree = {'my_array': my_array}
my_big_array = np.random.rand(8, 8)
tree['my_big_array'] = my_big_array
ff = AsdfFile(tree)
ff.set_array_storage(my_array, 'inline')
ff.set_array_storage(my_big_array, 'external')
ff.write_to("exploded_data/test_exploded.asdf")
# Or for every block:
# ff.write_to("test.asdf", all_array_storage='external')
In [34]:
ls exploded_data/
In [35]:
cat exploded_data/test_exploded.asdf
In [36]:
cat exploded_data/test_exploded0000.asdf
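In the exploded file, the inline array appears as literal YAML data while the external array's entry points at the companion file. Schematically (the exact fields depend on the asdf version):

```yaml
my_array: !core/ndarray-1.0.0
  data: [[0.12, 0.34, ...], ...]
my_big_array: !core/ndarray-1.0.0
  source: test_exploded0000.asdf
  datatype: float64
  shape: [8, 8]
```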
In [ ]:
mkdir -p provenance
In [37]:
from asdf import AsdfFile
import numpy as np
tree = {
    'some_random_data': np.random.rand(5, 5)
}
ff = AsdfFile(tree)
ff.add_history_entry(
    u"Initial random numbers",
    {u'name': u'asdf examples',
     u'author': u'John Q. Public',
     u'homepage': u'http://github.com/spacetelescope/asdf',
     u'version': u'0.1',
     u'spase_dict': {u'resource_id': 5}})
ff.write_to('provenance/provenance.asdf')
In [38]:
cat provenance/provenance.asdf
In [ ]:
mkdir -p compression
In [ ]:
from asdf import AsdfFile
import numpy as np
x = np.linspace(-20, 20, 30)
y = np.linspace(-30, 30, 50)
xx, yy = np.meshgrid(x, y)
tree = dict(variables=dict(x=xx, y=yy))
ff = AsdfFile(tree)
ff.write_to("compression/uncompressed_data.asdf", all_array_compression=None)
ff.write_to("compression/compressed_data.asdf", all_array_compression='bzp2')
In [ ]:
import os
print('uncompressed:', os.path.getsize("compression/uncompressed_data.asdf"), 'bytes')
print('compressed (bz2):', os.path.getsize("compression/compressed_data.asdf"), 'bytes')
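The savings depend on how compressible the data are: meshgrid output is highly redundant (every row of `xx` repeats the same 30 values), so bzip2 does well here. A standalone comparison using the standard-library `bz2` module on the same arrays:

```python
import bz2
import numpy as np

x = np.linspace(-20, 20, 30)
y = np.linspace(-30, 30, 50)
xx, yy = np.meshgrid(x, y)

# Raw bytes of both arrays vs. their bzip2-compressed size.
raw = xx.tobytes() + yy.tobytes()
packed = bz2.compress(raw)
print('raw:', len(raw), 'bytes')
print('bz2:', len(packed), 'bytes')
```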
In [ ]:
mkdir -p time
In [48]:
from asdf import AsdfFile
from astropy.time import Time
astrot = Time('2016-10-3')
from asdf.tags.time import TimeType  # asdf serializes Time objects via TimeType
tree = {'my_time': astrot}
ff = AsdfFile(tree)
ff.write_to("time/test_time.asdf")
ff.close()
In [49]:
cat time/test_time.asdf
Verify that the loaded time matches the astropy type:
In [53]:
sample_time = AsdfFile.open('time/test_time.asdf')
my_time = sample_time.tree['my_time']
type(my_time) == type(astrot)
Out[53]:
In [ ]:
mkdir -p units
In [65]:
from astropy import units as u
rho_unit = u.kg*u.cm**-3
density = np.linspace(0, 11, 5)*rho_unit
density.unit
Out[65]:
In [71]:
from asdf import AsdfFile
data = density.value
tree = dict(variables=dict(density=dict(data=data, unit=str(density.unit))))
ff = AsdfFile(tree)
ff.set_array_storage(data, 'inline')
ff.write_to("units/units_test.asdf", all_array_compression=None)
ff.close()
In [72]:
cat units/units_test.asdf
Verify the variables load; density comes back as a dictionary:
In [75]:
units_file = AsdfFile.open('units/units_test.asdf')
rho = units_file.tree['variables']['density']
rho
Out[75]:
It would be nice to define a schema for quantities, so that they would load as astropy Quantity arrays with units!
ASDF is designed to be extensible so outside teams can add their own types and structures while retaining compatibility with tools that don’t understand those conventions.
https://github.com/STScI-JWST/jwst/tree/master/jwst/datamodels/schemas