A module to extend the functionality of the Python json package:
Treat a directory structure like a nested dictionary:
  lightweight plugin system: define bespoke classes for parsing different file extensions and for encoding/decoding objects
  lazy loading: read files only when they are indexed into
  tab completion: index by attribute with tab completion, for quick exploration of a directory
Manipulation of nested dictionaries:
  enhanced pretty printer
  JavaScript-rendered, expandable tree in the Jupyter Notebook
  functions including: filter, merge, flatten, unflatten and diff
  output to a directory structure (of n folder levels)
On-disk indexing option for large JSON files (using the ijson package)
Units schema concept to apply and convert physical units (using the pint package)
In [1]:
from jsonextended import edict, plugins, example_mockpaths
Take a directory structure, potentially containing multiple file types:
In [2]:
datadir = example_mockpaths.directory1
print(datadir.to_string(indentlvl=3,file_content=True,color=True))
Plugins can be defined for parsing each file type (see Creating Plugins section):
In [3]:
plugins.load_builtin_plugins('parsers')
plugins.view_plugins('parsers')
Out[3]:
LazyLoad then takes a path name, path-like object or dict-like object, and lazily loads each file that has a compatible parser plugin.
In [4]:
lazy = edict.LazyLoad(datadir)
lazy
Out[4]:
LazyLoad can then be treated like a dictionary, or indexed by attribute using tab completion:
In [5]:
list(lazy.keys())
Out[5]:
In [6]:
lazy[['file1.json','key1']]
Out[6]:
In [7]:
lazy.subdir1.file1_literal_csv.header2
Out[7]:
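The lazy-loading idea can be sketched with the standard library alone. The following is only an illustration of the concept, not the implementation of LazyLoad: the class name LazyDir is hypothetical, it handles only *.json files and sub-directories, and it has no plugin system.

```python
import json
import tempfile
from collections.abc import Mapping
from pathlib import Path


class LazyDir(Mapping):
    """Hypothetical read-only mapping over a directory (not jsonextended's LazyLoad)."""

    def __init__(self, path):
        self._path = Path(path)
        self._cache = {}  # parsed content, filled on first access only

    def __iter__(self):
        # listing names does not open any file
        return iter(sorted(p.name for p in self._path.iterdir()))

    def __len__(self):
        return sum(1 for _ in self._path.iterdir())

    def __getitem__(self, key):
        if key not in self._cache:
            target = self._path / key
            if not target.exists():
                raise KeyError(key)
            if target.is_dir():
                self._cache[key] = LazyDir(target)
            elif target.suffix == ".json":
                # the file is opened and parsed here, on first access
                self._cache[key] = json.loads(target.read_text())
            else:
                raise KeyError(key)
        return self._cache[key]


# Build a small throwaway directory to demonstrate.
tmp = tempfile.TemporaryDirectory()
root = Path(tmp.name)
(root / "subdir1").mkdir()
(root / "file1.json").write_text(json.dumps({"key1": [1, 2, 3]}))
(root / "subdir1" / "file2.json").write_text(json.dumps({"a": 1}))

lazy_dir = LazyDir(root)
print(list(lazy_dir.keys()))           # no file content has been read yet
print(lazy_dir["file1.json"]["key1"])  # parsing happens on this access
```

Caching the parsed result means each file is read at most once, which is the main trade-off of this sketch: later changes on disk are not picked up.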
For pretty printing of the dictionary:
In [8]:
edict.pprint(lazy,depth=2,keycolor='green')
Numerous functions exist to manipulate the nested dictionary:
In [9]:
edict.flatten(lazy.subdir1)
Out[9]:
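To make the idea of flattening concrete, here is a minimal standalone sketch of flatten and its inverse. It is not the code of edict.flatten; in particular, the choice of key tuples as flattened keys is an assumption made for this illustration.

```python
def flatten(d, parent=()):
    """Map every leaf value to the tuple of keys that leads to it."""
    flat = {}
    for key, value in d.items():
        path = parent + (key,)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild the nested dictionary from tuple keys."""
    nested = {}
    for path, value in flat.items():
        node = nested
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested


nested_example = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
flat_example = flatten(nested_example)
print(flat_example)
print(unflatten(flat_example) == nested_example)
```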
LazyLoad passes the plugins.decode function to the parser plugin's read_file method (as the keyword argument 'object_hook'). Bespoke decoder plugins can therefore be set up for specific dictionary key signatures:
In [10]:
print(example_mockpaths.jsonfile2.to_string())
In [11]:
edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()
Out[11]:
In [12]:
plugins.load_builtin_plugins('decoders')
plugins.view_plugins('decoders')
Out[12]:
In [13]:
dct = edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()
dct
Out[13]:
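The mechanism behind decoding by key signature can be illustrated with the standard json module, whose object_hook argument is called on every decoded JSON object. This is a sketch only: the "_set_" signature and the decode_hook function are hypothetical and are not the built-in jsonextended decoders.

```python
import json

SET_KEY = "_set_"  # hypothetical key signature, chosen for this example


def decode_hook(dct):
    """Turn {"_set_": [...]} into a Python set; leave other dicts untouched."""
    if set(dct.keys()) == {SET_KEY}:
        return set(dct[SET_KEY])
    return dct


text = '{"tags": {"_set_": [1, 2, 2, 3]}, "name": "x"}'
decoded = json.loads(text, object_hook=decode_hook)
print(decoded)
```

Because the hook is applied to the innermost objects first, signatures nested at any depth are converted before their parent dictionary is seen.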
This process can be reversed, using encoder plugins:
In [14]:
plugins.load_builtin_plugins('encoders')
plugins.view_plugins('encoders')
Out[14]:
In [15]:
import json
json.dumps(dct,default=plugins.encode)
Out[15]:
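The reverse direction relies on the default argument of json.dumps, which is called for any object the json module cannot serialise itself. The sketch below shows that mechanism with a hand-written function; encode_default and the "_set_" / "_decimal_" signatures are hypothetical stand-ins, not what plugins.encode produces.

```python
import json
from decimal import Decimal


def encode_default(obj):
    """Hypothetical encoder: wrap unsupported types in a one-key signature dict."""
    if isinstance(obj, set):
        return {"_set_": sorted(obj)}
    if isinstance(obj, Decimal):
        return {"_decimal_": str(obj)}
    raise TypeError("cannot encode object of type {}".format(type(obj).__name__))


encoded = json.dumps(
    {"tags": {3, 1, 2}, "price": Decimal("1.50")},
    default=encode_default,
    sort_keys=True,
)
print(encoded)
```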
pip install jsonextended
jsonextended has no import dependencies on Python 3.x, and only pathlib2 on Python 2.7. For full functionality, however, it is advised to install the following packages:
conda install -c conda-forge ijson numpy pint
In [16]:
from jsonextended import plugins, utils
Plugins are recognised as classes with a minimal set of attributes matching the plugin category interface:
In [17]:
plugins.view_interfaces()
Out[17]:
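Recognising a plugin by its attributes amounts to a duck-typing check, which might look like the sketch below. The attribute names in PARSER_INTERFACE are an assumption inferred from the example parser class in this document; the authoritative list is whatever plugins.view_interfaces() reports, and matches_interface is a hypothetical helper.

```python
# Assumed attribute names, inferred from the example parser in this README;
# the real interface is whatever plugins.view_interfaces() reports.
PARSER_INTERFACE = ("plugin_name", "plugin_descript", "file_regex", "read_file")


def matches_interface(cls, interface=PARSER_INTERFACE):
    """Return True if cls is a class exposing every attribute in interface."""
    return isinstance(cls, type) and all(hasattr(cls, name) for name in interface)


class GoodParser(object):
    plugin_name = "good"
    plugin_descript = "has every required attribute"
    file_regex = "*.good"

    def read_file(self, file_obj, **kwargs):
        return {}


class IncompleteParser(object):
    plugin_name = "incomplete"  # no file_regex or read_file


print(matches_interface(GoodParser))
print(matches_interface(IncompleteParser))
```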
In [18]:
plugins.unload_all_plugins()
plugins.view_plugins()
Out[18]:
For example, a simple parser plugin would be:
In [19]:
class ParserPlugin(object):
    plugin_name = 'example'
    plugin_descript = 'a parser for *.example files, that outputs (line_number:line)'
    file_regex = '*.example'

    def read_file(self, file_obj, **kwargs):
        out_dict = {}
        for i, line in enumerate(file_obj):
            out_dict[i] = line.strip()
        return out_dict
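A parser of this shape can be exercised without the plugin machinery. The sketch below assumes that file_regex is a glob-style pattern matched against the file name (the '*.example' form suggests this, but it is an assumption); ExampleParser is a compact copy of the class above and parse_if_compatible is a hypothetical helper.

```python
import io
from fnmatch import fnmatchcase


class ExampleParser(object):
    """Compact copy of the example parser, for a self-contained demonstration."""
    plugin_name = "example"
    plugin_descript = "a parser for *.example files, that outputs (line_number:line)"
    file_regex = "*.example"

    def read_file(self, file_obj, **kwargs):
        return {i: line.strip() for i, line in enumerate(file_obj)}


def parse_if_compatible(parser, file_name, file_obj):
    """Hypothetical helper: parse only if the name matches the parser's pattern."""
    if not fnmatchcase(file_name, parser.file_regex):
        return None
    return parser.read_file(file_obj)


parsed = parse_if_compatible(
    ExampleParser(), "data.example", io.StringIO("first\n second \n"))
skipped = parse_if_compatible(
    ExampleParser(), "data.txt", io.StringIO("ignored"))
print(parsed)
print(skipped)
```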
Plugins can be loaded as a class:
In [20]:
plugins.load_plugin_classes([ParserPlugin],'parsers')
plugins.view_plugins()
Out[20]:
Or by directory (loading all .py files):
In [21]:
fobj = utils.MockPath('example.py',is_file=True,content="""
class ParserPlugin(object):
    plugin_name = 'example.other'
    plugin_descript = 'a parser for *.example.other files, that outputs (line_number:line)'
    file_regex = '*.example.other'

    def read_file(self, file_obj, **kwargs):
        out_dict = {}
        for i, line in enumerate(file_obj):
            out_dict[i] = line.strip()
        return out_dict
""")
dobj = utils.MockPath(structure=[fobj])
plugins.load_plugins_dir(dobj,'parsers')
plugins.view_plugins()
Out[21]:
For a more complex example of a parser, see jsonextended.complex_parsers
Parsers:
Decoders:
The plugins.decode function will use the method denoted by the intype parameter, e.g. if intype='json', then from_json will be called.
Encoders:
The plugins.encode function will use the method denoted by the outtype parameter, e.g. if outtype='json', then to_json will be called.
For more information, all functions contain docstrings with tested examples.
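The naming convention for decoder and encoder methods (from_json for intype='json', to_json for outtype='json') is a dispatch by method name. A minimal sketch of that dispatch follows; the SetCodec class, its "_set_" signature and the decode/encode helpers are hypothetical and only mirror the convention, not the library's code.

```python
class SetCodec(object):
    """Hypothetical plugin that can both decode and encode Python sets."""
    plugin_name = "set"

    def from_json(self, obj):
        return set(obj["_set_"])

    def to_json(self, obj):
        return {"_set_": sorted(obj)}


def decode(obj, plugin, intype="json"):
    """Call plugin.from_<intype> if it exists, otherwise return obj unchanged."""
    method = getattr(plugin, "from_" + intype, None)
    return obj if method is None else method(obj)


def encode(obj, plugin, outtype="json"):
    """Call plugin.to_<outtype> if it exists, otherwise return obj unchanged."""
    method = getattr(plugin, "to_" + outtype, None)
    return obj if method is None else method(obj)


codec = SetCodec()
as_set = decode({"_set_": [1, 2]}, codec, intype="json")
as_json = encode({2, 1}, codec, outtype="json")
untouched = decode({"_set_": [1, 2]}, codec, intype="yaml")  # no from_yaml method
print(as_set, as_json, untouched)
```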
In [22]:
from jsonextended import ejson, edict, utils
In [23]:
path = utils.get_test_path()
ejson.jkeys(path)
Out[23]:
In [24]:
jdict1 = ejson.to_dict(path)
edict.pprint(jdict1,depth=2)
In [ ]:
edict.to_html(jdict1,depth=2)
To try the rendered JSON tree, as output in the Jupyter Notebook, go to: https://chrisjsewell.github.io/
In [26]:
jdict2 = ejson.to_dict(path,['dir1','file1'])
edict.pprint(jdict2,depth=1)
In [27]:
filtered = edict.filter_keys(jdict2,['vol*'],use_wildcards=True)
edict.pprint(filtered)
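Wildcard filtering of keys can be pictured as a recursive walk that keeps any branch leading to a matching key. The sketch below uses the standard fnmatch module and is an assumption about the general approach, not the implementation of edict.filter_keys; filter_keys_sketch and the sample data are made up for illustration.

```python
from fnmatch import fnmatchcase


def filter_keys_sketch(d, patterns):
    """Keep matching keys, plus the parent branches that lead to them."""
    out = {}
    for key, value in d.items():
        if any(fnmatchcase(str(key), pattern) for pattern in patterns):
            out[key] = value
        elif isinstance(value, dict):
            sub = filter_keys_sketch(value, patterns)
            if sub:  # drop branches with no matching key
                out[key] = sub
    return out


sample = {
    "initial": {"volume": 10.0, "energy": -1.2},
    "final": {"volume": 12.5, "energy": -1.5},
    "voltage": 3,
}
filtered_sketch = filter_keys_sketch(sample, ["vol*"])
print(filtered_sketch)
```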
In [28]:
edict.pprint(edict.flatten(filtered))
In [29]:
from jsonextended.units import apply_unitschema, split_quantities
withunits = apply_unitschema(filtered,{'volume':'angstrom^3'})
edict.pprint(withunits)
In [30]:
newunits = apply_unitschema(withunits,{'volume':'nm^3'})
edict.pprint(newunits)
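The conversion applied here is plain arithmetic: 1 angstrom is 0.1 nm, so 1 angstrom^3 is 0.001 nm^3. The sketch below reproduces that arithmetic with a hand-written factor table instead of pint, so the TO_NM3 table and convert_volume function are hypothetical and cover only these two units.

```python
import math

# Conversion factors to nm^3 (1 angstrom = 0.1 nm, so 1 angstrom^3 = 1e-3 nm^3).
# Hand-written for this sketch; jsonextended delegates this to pint.
TO_NM3 = {"angstrom^3": 1e-3, "nm^3": 1.0}


def convert_volume(value, from_unit, to_unit):
    """Convert a volume between the units listed in TO_NM3."""
    return value * TO_NM3[from_unit] / TO_NM3[to_unit]


in_nm3 = convert_volume(1000.0, "angstrom^3", "nm^3")
round_trip = convert_volume(in_nm3, "nm^3", "angstrom^3")
print(math.isclose(in_nm3, 1.0), math.isclose(round_trip, 1000.0))
```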
In [31]:
edict.pprint(split_quantities(newunits),depth=4)