JSON Extended¶
A module to extend the python json package functionality:
- Treat a directory structure like a nested dictionary
- Lightweight plugin system: define bespoke classes for parsing different file extensions (in-the-box: .json, .csv, .hdf5) and encoding/decoding objects.
- Lazy Loading: read files only when they are indexed into
- Tab Completion: index as tabs for quick exploration of data
- Manipulation of nested structures, including; filter, merge, diff, flatten, unflatten
- Enhanced pretty printer
- Output to directory structure (of n folder levels)
- On-disk indexing option for large json files (using the ijson package)
- Units schema concept to apply and convert physical quantities (using the pint package)
- All functions are thoroughly documented with tested examples.
Installation¶
From Conda (recommended):
conda install -c conda-forge jsonextended
From PyPi:
pip install jsonextended
jsonextended has no import dependancies, on Python 3.x and only
pathlib2
on 2.7 but, for full functionality, it is advised to
install the following packages:
conda install -c conda-forge ijson numpy pint h5py pandas
Basic Example¶
from jsonextended import edict, plugins, example_mockpaths
Take a directory structure, potentially containing multiple file types:
datadir = example_mockpaths.directory1
print(datadir.to_string(indentlvl=3,file_content=True))
Folder("dir1")
File("file1.json") Contents:
{"key2": {"key3": 4, "key4": 5}, "key1": [1, 2, 3]}
Folder("subdir1")
File("file1.csv") Contents:
# a csv file
header1,header2,header3
val1,val2,val3
val4,val5,val6
val7,val8,val9
File("file1.literal.csv") Contents:
# a csv file with numbers
header1,header2,header3
1,1.1,string1
2,2.2,string2
3,3.3,string3
Folder("subdir2")
Folder("subsubdir21")
File("file1.keypair") Contents:
# a key-pair file
key1 val1
key2 val2
key3 val3
key4 val4
Plugins can be defined for parsing each file type (see Creating Plugins section):
plugins.load_builtin_plugins('parsers')
plugins.view_plugins('parsers')
{'csv.basic': 'read *.csv delimited file with headers to {header:[column_values]}',
'csv.literal': 'read *.literal.csv delimited files with headers to {header:column_values}, with number strings converted to int/float',
'hdf5.read': 'read *.hdf5 (in read mode) files using h5py',
'json.basic': 'read *.json files using json.load',
'keypair': "read *.keypair, where each line should be; '<key> <pair>'"}
LazyLoad then takes a path name, path-like object or dict-like object, which will lazily load each file with a compatible plugin.
lazy = edict.LazyLoad(datadir)
lazy
{file1.json:..,subdir1:..,subdir2:..}
Lazyload can then be treated like a dictionary, or indexed by tab completion:
list(lazy.keys())
['subdir1', 'subdir2', 'file1.json']
lazy[['file1.json','key1']]
[1, 2, 3]
lazy.subdir1.file1_literal_csv.header2
[1.1, 2.2, 3.3]
For pretty printing of the dictionary:
edict.pprint(lazy,depth=2)
file1.json:
key1: [1, 2, 3]
key2: {...}
subdir1:
file1.csv: {...}
file1.literal.csv: {...}
subdir2:
subsubdir21: {...}
Numerous functions exist to manipulate the nested dictionary:
edict.flatten(lazy.subdir1)
{('file1.csv', 'header1'): ['val1', 'val4', 'val7'],
('file1.csv', 'header2'): ['val2', 'val5', 'val8'],
('file1.csv', 'header3'): ['val3', 'val6', 'val9'],
('file1.literal.csv', 'header1'): [1, 2, 3],
('file1.literal.csv', 'header2'): [1.1, 2.2, 3.3],
('file1.literal.csv', 'header3'): ['string1', 'string2', 'string3']}
LazyLoad parses the plugins.decode
function to parser plugin’s
read_file
method (keyword ‘object_hook’). Therefore, bespoke
decoder plugins can be set up for specific dictionary key signatures:
print(example_mockpaths.jsonfile2.to_string())
File("file2.json") Contents:
{"key1":{"_python_set_": [1, 2, 3]},"key2":{"_numpy_ndarray_": {"dtype": "int64", "value": [1, 2, 3]}}}
edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()
{u'key1': {u'_python_set_': [1, 2, 3]},
u'key2': {u'_numpy_ndarray_': {u'dtype': u'int64', u'value': [1, 2, 3]}}}
plugins.load_builtin_plugins('decoders')
plugins.view_plugins('decoders')
{'decimal.Decimal': 'encode/decode Decimal type',
'numpy.ndarray': 'encode/decode numpy.ndarray',
'pint.Quantity': 'encode/decode pint.Quantity object',
'python.set': 'decode/encode python set'}
dct = edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()
dct
{u'key1': {1, 2, 3}, u'key2': array([1, 2, 3])}
This process can be reversed, using encoder plugins:
plugins.load_builtin_plugins('encoders')
plugins.view_plugins('encoders')
{'decimal.Decimal': 'encode/decode Decimal type',
'numpy.ndarray': 'encode/decode numpy.ndarray',
'pint.Quantity': 'encode/decode pint.Quantity object',
'python.set': 'decode/encode python set'}
import json
json.dumps(dct,default=plugins.encode)
'{"key2": {"_numpy_ndarray_": {"dtype": "int64", "value": [1, 2, 3]}}, "key1": {"_python_set_": [1, 2, 3]}}'
Creating and Loading Plugins¶
from jsonextended import plugins, utils
Plugins are recognised as classes with a minimal set of attributes matching the plugin category interface:
plugins.view_interfaces()
{'decoders': ['plugin_name', 'plugin_descript', 'dict_signature'],
'encoders': ['plugin_name', 'plugin_descript', 'objclass'],
'parsers': ['plugin_name', 'plugin_descript', 'file_regex', 'read_file']}
plugins.unload_all_plugins()
plugins.view_plugins()
{'decoders': {}, 'encoders': {}, 'parsers': {}}
For example, a simple parser plugin would be:
class ParserPlugin(object):
plugin_name = 'example'
plugin_descript = 'a parser for *.example files, that outputs (line_number:line)'
file_regex = '*.example'
def read_file(self, file_obj, **kwargs):
out_dict = {}
for i, line in enumerate(file_obj):
out_dict[i] = line.strip()
return out_dict
Plugins can be loaded as a class:
plugins.load_plugin_classes([ParserPlugin],'parsers')
plugins.view_plugins()
{'decoders': {},
'encoders': {},
'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)'}}
Or by directory (loading all .py files):
fobj = utils.MockPath('example.py',is_file=True,content="""
class ParserPlugin(object):
plugin_name = 'example.other'
plugin_descript = 'a parser for *.example.other files, that outputs (line_number:line)'
file_regex = '*.example.other'
def read_file(self, file_obj, **kwargs):
out_dict = {}
for i, line in enumerate(file_obj):
out_dict[i] = line.strip()
return out_dict
""")
dobj = utils.MockPath(structure=[fobj])
plugins.load_plugins_dir(dobj,'parsers')
plugins.view_plugins()
{'decoders': {},
'encoders': {},
'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)',
'example.other': 'a parser for *.example.other files, that outputs (line_number:line)'}}
For a more complex example of a parser, see
jsonextended.complex_parsers
Interface specifications¶
- Parsers:
- file_regex attribute, a str denoting what files to apply it to. A file will be parsed by the longest regex it matches.
- read_file method, which takes an (open) file object and kwargs as parameters
- Decoders:
- dict_signature attribute, a tuple denoting the keys which the dictionary must have, e.g. dict_signature=(‘a’,’b’) decodes {‘a’:1,’b’:2}
- optionally, the attribute allow_other_keys = True can be set, to allow the dictionary to have more keys than the dict_signature and still be decoded, e.g. dict_signature=(‘a’,’b’) would also decode {‘a’:1,’b’:2,’c’: 3}
- from_… method(s), which takes a dict object as parameter. The
plugins.decode
function will use the method denoted by the intype parameter, e.g. if intype=’json’, then from_json will be called.
- Encoders:
- objclass attribute, the object class to apply the encoding to, e.g. objclass=decimal.Decimal encodes objects of that type
- to_… method(s), which takes a dict object as parameter. The
plugins.encode
function will use the method denoted by the outtype parameter, e.g. if outtype=’json’, then to_json will be called.
Extended Examples¶
For more information, all functions contain docstrings with tested examples.
Data Folders JSONisation¶
from jsonextended import ejson, edict, utils
path = utils.get_test_path()
ejson.jkeys(path)
['dir1', 'dir2', 'dir3']
jdict1 = ejson.to_dict(path)
edict.pprint(jdict1,depth=2)
dir1:
dir1_1: {...}
file1: {...}
file2: {...}
dir2:
file1: {...}
dir3:
edict.to_html(jdict1,depth=2)
To try the rendered JSON tree, output in the Jupyter Notebook, go to : https://chrisjsewell.github.io/
Nested Dictionary Manipulation¶
jdict2 = ejson.to_dict(path,['dir1','file1'])
edict.pprint(jdict2,depth=1)
initial: {...}
meta: {...}
optimised: {...}
units: {...}
filtered = edict.filter_keys(jdict2,['vol*'],use_wildcards=True)
edict.pprint(filtered)
initial:
crystallographic:
volume: 924.62752781
primitive:
volume: 462.313764
optimised:
crystallographic:
volume: 1063.98960509
primitive:
volume: 531.994803
edict.pprint(edict.flatten(filtered))
(initial, crystallographic, volume): 924.62752781
(initial, primitive, volume): 462.313764
(optimised, crystallographic, volume): 1063.98960509
(optimised, primitive, volume): 531.994803
Units Schema¶
from jsonextended.units import apply_unitschema, split_quantities
withunits = apply_unitschema(filtered,{'volume':'angstrom^3'})
edict.pprint(withunits)
initial:
crystallographic:
volume: 924.62752781 angstrom ** 3
primitive:
volume: 462.313764 angstrom ** 3
optimised:
crystallographic:
volume: 1063.98960509 angstrom ** 3
primitive:
volume: 531.994803 angstrom ** 3
newunits = apply_unitschema(withunits,{'volume':'nm^3'})
edict.pprint(newunits)
initial:
crystallographic:
volume: 0.92462752781 nanometer ** 3
primitive:
volume: 0.462313764 nanometer ** 3
optimised:
crystallographic:
volume: 1.06398960509 nanometer ** 3
primitive:
volume: 0.531994803 nanometer ** 3
edict.pprint(split_quantities(newunits),depth=4)
initial:
crystallographic:
volume:
magnitude: 0.92462752781
units: nanometer ** 3
primitive:
volume:
magnitude: 0.462313764
units: nanometer ** 3
optimised:
crystallographic:
volume:
magnitude: 1.06398960509
units: nanometer ** 3
primitive:
volume:
magnitude: 0.531994803
units: nanometer ** 3
Summary of Functions¶
JSON Parsing¶
jkeys |
get keys for initial json level, or at level after following key_path |
to_dict |
input json to dict |
JSON/Dict Manipulation¶
LazyLoad |
lazy load a dict_like object or file structure as a pseudo dictionary (works with all edict functions) supplies tab completion of keys |
to_html |
Pretty display dictionary in collapsible format with indents |
apply |
apply a function to all values with a certain leaf (terminal) key |
combine_apply |
combine values with certain leaf (terminal) keys by a function |
combine_lists |
combine lists of dicts |
convert_type |
convert all values of one type to another |
diff |
return the difference between two dict_like objects |
dump |
output dict to json |
extract |
extract section of dictionary |
filter_keyfuncs |
filters leaf nodes key:func(val) pairs of nested dictionary, where func(val) -> True/False |
filter_keys |
filter dict by certain keys |
filter_keyvals |
filters leaf nodes key:value pairs of nested dictionary |
filter_paths |
filter dict by certain paths containing key sets |
filter_values |
filters leaf nodes of nested dictionary |
flatten |
get nested dict as flat {key:val,…}, where key is tuple/string of all nested keys |
flatten2d |
get nested dict as {key:dict,…}, where key is tuple/string of all-1 nested keys |
flattennd |
get nested dict as {key:dict,…}, where key is tuple/string of all-n levels of nested keys |
indexes |
index dictionary by multiple keys |
is_dict_like |
test if object is dict like |
is_iter_non_string |
test if object is a list or tuple |
is_list_of_dict_like |
test if object is a list only containing dict like items |
is_path_like |
test if object is pathlib.Path like |
list_to_dict |
convert a list of dicts to a dict with root keys |
merge |
merge dicts, starting with dicts[1] into dicts[0] |
pprint |
print a nested dict in readable format |
remove_keys |
remove certain keys from nested dict, retaining preceeding paths |
remove_keyvals |
remove paths with at least one branch leading to certain (key,value) pairs from dict |
remove_paths |
remove paths containing certain keys from dict |
rename_keys |
rename keys in dict |
split_key |
split an existing key(s) into multiple levels |
split_lists |
split_lists key:list pairs into dicts for each item in the lists NB: will only split if all split_keys are present |
to_json |
output dict to json |
unflatten |
unflatten dictionary with keys as tuples or delimited strings |
Physical Units¶
apply_unitschema |
apply the unit schema to the data |
combine_quantities |
combine <unit,magnitude> pairs into pint.Quantity objects |
get_in_units |
get a value in the required units |
split_quantities |
split pint.Quantity objects into <unit,magnitude> pairs |
Package API¶
jsonextended package¶
Subpackages¶
jsonextended.encoders package¶
Submodules¶
https://stackoverflow.com/questions/1960516/python-json-serialize-a-decimal-object
-
class
jsonextended.encoders.decimals.
Encode_Decimal
[source]¶ Bases:
object
Examples
>>> from decimal import Decimal >>> Encode_Decimal().to_str(Decimal('1.2345')) '1.2345' >>> Encode_Decimal().to_json(Decimal('1.2345')) {'_python_Decimal_': '1.2345'} >>> Encode_Decimal().from_json({'_python_Decimal_': '1.2345'}) Decimal('1.2345')
-
dict_signature
= ['_python_Decimal_']¶
-
objclass
¶ alias of
decimal.Decimal
-
plugin_descript
= 'encode/decode Decimal type'¶
-
plugin_name
= 'decimal.Decimal'¶
-
https://stackoverflow.com/questions/1960516/python-json-serialize-a-decimal-object
-
class
jsonextended.encoders.fraction.
Encode_Fraction
[source]¶ Bases:
object
Examples
>>> from decimal import Decimal >>> Encode_Fraction().to_str(Fraction(1, 3)) '1/3' >>> Encode_Fraction().to_json(Fraction('1/3')) {'_python_Fraction_': '1/3'} >>> Encode_Fraction().from_json({'_python_Fraction_': '1/3'}) Fraction(1, 3)
-
dict_signature
= ['_python_Fraction_']¶
-
objclass
¶ alias of
fractions.Fraction
-
plugin_descript
= 'encode/decode Fraction type'¶
-
plugin_name
= 'fractions.Fraction'¶
-
https://stackoverflow.com/questions/27909658/json-encoder-and-decoder-for-complex-numpy-arrays
-
class
jsonextended.encoders.ndarray.
Encode_NDArray
[source]¶ Bases:
object
Examples
>>> from pprint import pprint >>> import numpy as np
>>> Encode_NDArray().to_str(np.asarray([1,2,3])) '[1 2 3]'
>>> pprint(Encode_NDArray().to_json(np.asarray([1,2,3]))) {'_numpy_ndarray_': {'dtype': 'int64', 'value': [1, 2, 3]}}
>>> Encode_NDArray().from_json({'_numpy_ndarray_': {'dtype': 'int64', 'value': [1, 2, 3]}}) array([1, 2, 3])
-
dict_signature
= ['_numpy_ndarray_']¶
-
objclass
¶ alias of
numpy.ndarray
-
plugin_descript
= 'encode/decode numpy.ndarray'¶
-
plugin_name
= 'numpy.ndarray'¶
-
-
class
jsonextended.encoders.pint_quantity.
Encode_Quantity
[source]¶ Bases:
object
Examples
>>> from pprint import pprint >>> from pint import UnitRegistry >>> ureg = UnitRegistry()
>>> print(Encode_Quantity().to_str(ureg.Quantity(1,'nanometre'))) 1 nm
>>> pprint(Encode_Quantity().to_json(ureg.Quantity(1,'nanometre'))) {'_pint_Quantity_': {'Magnitude': 1, 'Units': 'nanometer'}}
>>> Encode_Quantity().from_json({'_pint_Quantity_': {'Magnitude': 1, 'Units': 'nanometer'}}) <Quantity(1, 'nanometer')>
-
dict_signature
= ['_pint_Quantity_']¶
-
objclass
¶ alias of
pint.quantity._Quantity
-
plugin_descript
= 'encode/decode pint.Quantity object'¶
-
plugin_name
= 'pint.Quantity'¶
-
-
class
jsonextended.encoders.set.
Encode_Set
[source]¶ Bases:
object
Examples
>>> Encode_Set().to_str(set([1,2,3])) '{1, 2, 3}'
>>> Encode_Set().to_json(set([1,2,3])) {'_python_set_': [1, 2, 3]}
>>> list(Encode_Set().from_json({'_python_set_': [1, 2, 3]})) [1, 2, 3]
-
dict_signature
= ['_python_set_']¶
-
objclass
¶ alias of
builtins.set
-
plugin_descript
= 'decode/encode python set'¶
-
plugin_name
= 'python.set'¶
-
Module contents¶
jsonextended.parsers package¶
Submodules¶
-
class
jsonextended.parsers.csvs.
CSV_Parser
[source]¶ Bases:
object
Examples
>>> from pprint import pprint
>>> from jsonextended.utils import MockPath >>> fileobj = MockPath(is_file=True, ... content='''# comment line ... head1,head2 ... val1,val2 ... val3,val4''' ... ) >>> with fileobj.open() as f: ... data = CSV_Parser().read_file(f) >>> pprint(data) {'head1': ['val1', 'val3'], 'head2': ['val2', 'val4']}
-
file_regex
= '*.csv'¶
-
plugin_descript
= 'read *.csv delimited file with headers to {header:[column_values]}'¶
-
plugin_name
= 'csv.basic'¶
-
-
class
jsonextended.parsers.csvs_literal.
CSVLiteral_Parser
[source]¶ Bases:
object
Examples
>>> from pprint import pprint
>>> from jsonextended.utils import MockPath >>> fileobj = MockPath(is_file=True, ... content='''# comment line ... head1,head2 ... 1.1,3 ... 2.2,"3.3"''' ... ) >>> with fileobj.open() as f: ... data = CSVLiteral_Parser().read_file(f) >>> pprint(data) {'head1': [1.1, 2.2], 'head2': [3, '3.3']}
-
file_regex
= '*.literal.csv'¶
-
plugin_descript
= 'read *.literal.csv delimited files with headers to {header:column_values}, with number strings converted to int/float'¶ , s.t. values are converted to their python type
-
plugin_name
= 'csv.literal'¶
-
-
class
jsonextended.parsers.hdf5.
HDF5_Parser
[source]¶ Bases:
object
Examples
>>> import h5py >>> indata = h5py.File('test.hdf5') >>> dataset = indata.create_dataset("mydataset", (10,), dtype='i') >>> indata.close()
>>> with open('test.hdf5') as f: ... data = HDF5_Parser().read_file(f) >>> data['mydataset'][:] array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
>>> import os >>> os.remove('test.hdf5')
-
file_regex
= '*.hdf5'¶
-
plugin_descript
= 'read *.hdf5 (in read mode) files using h5py'¶
-
plugin_name
= 'hdf5.read'¶
-
-
class
jsonextended.parsers.ipynb.
NBParser
[source]¶ Bases:
object
Examples
>>> from jsonextended.utils import MockPath >>> from jsonextended.edict import pprint >>> fileobj = MockPath(is_file=True, ... content='''{ ... "cells":[], ... "metadata":{} ... }''' ... ) >>> with fileobj.open() as f: ... data = NBParser().read_file(f) >>> pprint(data) cells: [] metadata:
-
file_regex
= '*.ipynb'¶
-
plugin_descript
= 'read Jupyter Notebooks'¶
-
plugin_name
= 'ipynb'¶
-
-
class
jsonextended.parsers.jsons.
JSON_Parser
[source]¶ Bases:
object
Examples
>>> from jsonextended.utils import MockPath >>> fileobj = MockPath(is_file=True, ... content='{"key1":[1,2,3]}' ... ) >>> with fileobj.open() as f: ... data = JSON_Parser().read_file(f) >>> list(data.values()) [[1, 2, 3]]
-
file_regex
= '*.json'¶
-
plugin_descript
= 'read *.json files using json.load'¶
-
plugin_name
= 'json.basic'¶
-
-
class
jsonextended.parsers.keypairs.
KeyPair_Parser
[source]¶ Bases:
object
Examples
>>> from pprint import pprint >>> from jsonextended.utils import MockPath >>> fileobj = MockPath(is_file=True, ... content='''# comment line ... key1 val1 ... key2 val2 ... key3 val3''' ... ) >>> with fileobj.open() as f: ... data = KeyPair_Parser().read_file(f) >>> pprint(data) {'key1': 'val1', 'key2': 'val2', 'key3': 'val3'}
-
file_regex
= '*.keypair'¶
-
plugin_descript
= "read *.keypair, where each line should be; '<key> <pair>'"¶
-
plugin_name
= 'keypair'¶
-
-
class
jsonextended.parsers.yaml.
YAML_Parser
[source]¶ Bases:
object
Examples
>>> from pprint import pprint >>> from jsonextended.utils import MockPath >>> fileobj = MockPath(is_file=True, ... content='key1:\n subkey1: a\n subkey2: [1,2,3]' ... ) >>> with fileobj.open() as f: ... data = YAML_Parser().read_file(f) >>> pprint(dict(data["key1"])) {'subkey1': 'a', 'subkey2': [1, 2, 3]}
-
file_regex
= '*.yaml'¶
-
plugin_descript
= 'read *.yaml files using ruamel.yaml'¶
-
plugin_name
= 'yaml.ruamel'¶
-
Module contents¶
a module to provide data parsers, from the native format to a json representation
jsonextended.units package¶
Submodules¶
-
jsonextended.units.core.
apply_unitschema
(data, uschema, as_quantity=True, raise_outerr=False, convert_base=False, use_wildcards=False, list_of_dicts=False)[source]¶ apply the unit schema to the data
Parameters: - data (dict) –
- uschema (dict) – units schema to apply
- as_quantity (bool) – if true, return values as pint.Quantity objects
- raise_outerr (bool) – raise error if a unit cannot be found in the outschema
- convert_to_base (bool) – rescale units to base units
- use_wildcards (bool) – if true, can use * (matches everything) and ? (matches any single character)
- list_of_dicts (bool) – treat list of dicts as additional branches
Examples
>>> from pprint import pprint
>>> data = {'energy':1,'x':[1,2],'other':{'y':[4,5]},'y':[4,5],'meta':None} >>> uschema = {'energy':'eV','x':'nm','other':{'y':'m'},'y':'cm'} >>> data_units = apply_unitschema(data,uschema) >>> pprint(data_units) {'energy': <Quantity(1, 'electron_volt')>, 'meta': None, 'other': {'y': <Quantity([4 5], 'meter')>}, 'x': <Quantity([1 2], 'nanometer')>, 'y': <Quantity([4 5], 'centimeter')>}
>>> newschema = {'energy':'kJ','other':{'y':'nm'},'y':'m'} >>> new_data = apply_unitschema(data_units,newschema) >>> str(new_data["energy"]) '1.60217653e-22 kilojoule' >>> new_data["other"]["y"].magnitude.round(3).tolist(), str(new_data["other"]["y"].units) ([4000000000.0, 5000000000.0], 'nanometer')
>>> old_data = apply_unitschema(new_data,uschema,as_quantity=False) >>> old_data["energy"] 1.0 >>> old_data["other"]["y"].round(3).tolist() [4.0, 5.0]
-
jsonextended.units.core.
combine_quantities
(data, units='units', magnitude='magnitude', list_of_dicts=False)[source]¶ combine <unit,magnitude> pairs into pint.Quantity objects
Parameters: Examples
>>> from pprint import pprint
>>> sdata = {'energy': {'magnitude': 1.602e-22, 'units': 'kilojoule'}, ... 'meta': None, ... 'other': {'y': {'magnitude': [4, 5, 6], 'units': 'nanometer'}}, ... 'x': {'magnitude': [1, 2, 3], 'units': 'nanometer'}, ... 'y': {'magnitude': [8,9,10], 'units': 'meter'}} ... >>> combined_data = combine_quantities(sdata) >>> pprint(combined_data) {'energy': <Quantity(1.602e-22, 'kilojoule')>, 'meta': None, 'other': {'y': <Quantity([4 5 6], 'nanometer')>}, 'x': <Quantity([1 2 3], 'nanometer')>, 'y': <Quantity([ 8 9 10], 'meter')>}
-
jsonextended.units.core.
split_quantities
(data, units='units', magnitude='magnitude', list_of_dicts=False)[source]¶ split pint.Quantity objects into <unit,magnitude> pairs
Parameters: Examples
>>> from pprint import pprint
>>> from pint import UnitRegistry >>> ureg = UnitRegistry() >>> Q = ureg.Quantity
>>> qdata = {'energy': Q(1.602e-22, 'kilojoule'), ... 'meta': None, ... 'other': {'y': Q([4,5,6], 'nanometer')}, ... 'x': Q([1,2,3], 'nanometer'), ... 'y': Q([8,9,10], 'meter')} ... >>> split_data = split_quantities(qdata) >>> pprint(split_data) {'energy': {'magnitude': 1.602e-22, 'units': 'kilojoule'}, 'meta': None, 'other': {'y': {'magnitude': array([4, 5, 6]), 'units': 'nanometer'}}, 'x': {'magnitude': array([1, 2, 3]), 'units': 'nanometer'}, 'y': {'magnitude': array([ 8, 9, 10]), 'units': 'meter'}}
Module contents¶
a package to manipulate the physical units of JSON data, via unitschema
jsonextended utilises the pint package to manage physical units. This was chosen due to its large user-base and support for numpy: https://socialcompare.com/en/comparison/python-units-quantities-packages-383avix4
Submodules¶
jsonextended.edict module¶
a module to manipulate python dictionary like objects
-
class
jsonextended.edict.
LazyLoad
(obj, ignore_regexes=('.*', '_*'), recursive=True, parent=None, key_paths=True, list_of_dicts=False, parse_errors=True, **parser_kwargs)[source]¶ Bases:
object
lazy load a dict_like object or file structure as a pseudo dictionary (works with all edict functions) supplies tab completion of keys
Parameters: - obj (dict or str or object) – file like object or path to file
- ignore_regexes (list[str]) – ignore files and folders matching these regexes (can contain *, ? and [] wildcards)
- recursive (bool) – if True, load subdirectories
- parent (object) – the parent object of this instance
- key_paths (bool) – indicates if the keys of the object can be resolved as file/folder paths (to ensure strings do not get unintentionally treated as paths)
- list_of_dicts (bool) – treat list of dicts as additional branches
- parse_errors (bool) – if True, if parsing a file fails then an IOError will be raised if False, if parsing a file fails then only a logging.error will be made and the value will be returned as None
- parser_kwargs (dict) – additional keywords for parser plugins read_file method, (loaded decoder plugins are parsed by default)
Examples
>>> from jsonextended import plugins >>> plugins.load_builtin_plugins() []
>>> l = LazyLoad({'a':{'b':2},3:4}) >>> print(l) {3:..,a:..} >>> l['a'] {b:..} >>> l[['a','b']] 2 >>> l.a.b 2 >>> l.i3 4
>>> from jsonextended.utils import get_test_path >>> from jsonextended.edict import pprint
>>> lazydict = LazyLoad(get_test_path()) >>> pprint(lazydict,depth=2) dir1: dir1_1: {...} file1.json: {...} file2.json: {...} dir2: file1.csv: {...} file1.json: {...} dir3: file1.keypair: key1: val1 key2: val2 key3: val3
>>> 'dir1' in lazydict True
>>> sorted(lazydict.keys()) ['dir1', 'dir2', 'dir3', 'file1.keypair']
>>> sorted(lazydict.values()) [{}, {key1:..,key2:..,key3:..}, {file1.csv:..,file1.json:..}, {dir1_1:..,file1.json:..,file2.json:..}]
>>> lazydict.dir1.file1_json {initial:..,meta:..,optimised:..,units:..}
>>> ldict = lazydict.dir1.file1_json.to_dict() >>> isinstance(ldict,dict) True >>> pprint(ldict,depth=1) initial: {...} meta: {...} optimised: {...} units: {...}
>>> lazydict = LazyLoad(get_test_path(),recursive=False) >>> lazydict {file1.keypair:..}
>>> lazydict = LazyLoad([{'a':{'b':{'c':1}}},{'a':2}], ... list_of_dicts=True) >>> lazydict.i0.a.b.c 1
>>> LazyLoad([1,2,3]) Traceback (most recent call last): ... ValueError: not an expandable object: [1, 2, 3]
>>> plugins.unload_all_plugins()
-
jsonextended.edict.
apply
(d, leaf_key, func, new_name=None, remove_lkey=True, list_of_dicts=False, unflatten_level=0, deepcopy=True, **kwargs)[source]¶ apply a function to all values with a certain leaf (terminal) key
Parameters: - d (dict) –
- leaf_key (str) – name of leaf key
- func (callable) – function to apply
- new_name (str) – if not None, rename leaf_key
- remove_lkey (bool) – whether to remove original leaf_key (if new_name is not None)
- list_of_dicts (bool) – treat list of dicts as additional branches
- unflatten_level (int or None) – the number of levels to leave unflattened before combining, for instance if you need dicts as inputs
- deepcopy (bool) – deepcopy values
- kwargs (dict) – additional keywords to parse to function
Examples
>>> from pprint import pprint >>> d = {'a':1,'b':1} >>> func = lambda x: x+1 >>> pprint(apply(d,'a',func)) {'a': 2, 'b': 1} >>> pprint(apply(d,'a',func,new_name='c')) {'b': 1, 'c': 2} >>> pprint(apply(d,'a',func,new_name='c', remove_lkey=False)) {'a': 1, 'b': 1, 'c': 2}
>>> test_dict = {"a":[{"b":[{"c":1, "d": 2}, {"e":3, "f": 4}]}, {"b":[{"c":5, "d": 6}, {"e":7, "f": 8}]}]} >>> pprint(apply(test_dict, "b", lambda x: x[-1], list_of_dicts=True, unflatten_level=2)) {'a': [{'b': {'e': 3, 'f': 4}}, {'b': {'e': 7, 'f': 8}}]}
-
jsonextended.edict.
combine_apply
(d, leaf_keys, func, new_name, unflatten_level=1, remove_lkeys=True, overwrite=False, list_of_dicts=False, deepcopy=True, **kwargs)[source]¶ combine values with certain leaf (terminal) keys by a function
Parameters: - d (dict) –
- leaf_keys (list) – names of leaf keys
- func (callable) – function to apply, must take at least len(leaf_keys) arguments
- new_name (str) – new key name
- unflatten_level (int or None) – the number of levels to leave unflattened before combining, for instance if you need dicts as inputs (None means all)
- remove_lkeys (bool) – whether to remove original leaf_keys
- overwrite (bool) – whether to overwrite any existing new_name key
- list_of_dicts (bool) – treat list of dicts as additional branches
- deepcopy (bool) – deepcopy values
- kwargs (dict) – additional keywords to parse to function
Examples
>>> from pprint import pprint >>> d = {'a':1,'b':2} >>> func = lambda x,y: x+y >>> pprint(combine_apply(d,['a','b'],func,'c')) {'c': 3} >>> pprint(combine_apply(d,['a','b'],func,'c',remove_lkeys=False)) {'a': 1, 'b': 2, 'c': 3}
>>> d = {1:{'a':1,'b':2},2:{'a':4,'b':5},3:{'a':1}} >>> pprint(combine_apply(d,['a','b'],func,'c')) {1: {'c': 3}, 2: {'c': 9}, 3: {'a': 1}}
>>> func2 = lambda x: sorted(list(x.keys())) >>> d2 = {'d':{'a':{'b':1,'c':2}}} >>> pprint(combine_apply(d2,['a'],func2,'a',unflatten_level=2)) {'d': {'a': ['b', 'c']}}
-
jsonextended.edict.
combine_lists
(d, keys=None, deepcopy=True)[source]¶ combine lists of dicts
Parameters: Example
>>> from pprint import pprint >>> d = {'path_key': {'a': 1, 'split': [{'x': 1, 'y': 3}, {'x': 2, 'y': 4}]}} >>> pprint(combine_lists(d,['split'])) {'path_key': {'a': 1, 'split': {'x': [1, 2], 'y': [3, 4]}}}
>>> combine_lists([{"a":2}, {"a":1}]) {'a': [2, 1]}
-
jsonextended.edict.
convert_type
(d, intype, outtype, convert_list=True, in_place=True)[source]¶ convert all values of one type to another
Parameters: Examples
>>> from pprint import pprint
>>> d = {'a':'1','b':'2'} >>> pprint(convert_type(d,str,float)) {'a': 1.0, 'b': 2.0}
>>> d = {'a':['1','2']} >>> pprint(convert_type(d,str,float)) {'a': [1.0, 2.0]}
>>> d = {'a':[('1','2'),[3,4]]} >>> pprint(convert_type(d,str,float)) {'a': [(1.0, 2.0), [3, 4]]}
-
jsonextended.edict.
diff
(new_dict, old_dict, iter_prefix='__iter__', np_allclose=False, **kwargs)[source]¶ return the difference between two dict_like objects
Parameters: Returns: outcome – Containing none or more of:
- ”insertions” : list of (path, val)
- ”deletions” : list of (path, val)
- ”changes” : list of (path, (val1, val2))
- ”uncomparable” : list of (path, (val1, val2))
Return type: Examples
>>> from pprint import pprint
>>> diff({'a':1},{'a':1}) {}
>>> pprint(diff({'a': 1, 'b': 2, 'c': 5},{'b': 3, 'c': 4, 'd': 6})) {'changes': [(('b',), (2, 3)), (('c',), (5, 4))], 'deletions': [(('d',), 6)], 'insertions': [(('a',), 1)]}
>>> pprint(diff({'a': [{"b":1}, {"c":2}, 1]},{'a': [{"b":1}, {"d":2}, 2]})) {'changes': [(('a', '__iter__2'), (1, 2))], 'deletions': [(('a', '__iter__1', 'd'), 2)], 'insertions': [(('a', '__iter__1', 'c'), 2)]}
>>> diff({'a':1}, {'a':1+1e-10}) {'changes': [(('a',), (1, 1.0000000001))]}
>>> diff({'a':1}, {'a':1+1e-10}, np_allclose=True) {}
-
jsonextended.edict.
dump
(dct, jfile, overwrite=False, dirlevel=0, sort_keys=True, indent=2, default_name='root.json', **kwargs)[source]¶ output dict to json
Parameters: - dct (dict) –
- jfile (str or file_like) – if file_like, must have write method
- overwrite (bool) – whether to overwrite existing files
- dirlevel (int) – if jfile is path to folder, defines how many key levels to set as sub-folders
- sort_keys (bool) – if true then the output of dictionaries will be sorted by key
- indent (int) – if non-negative integer, then JSON array elements and object members will be pretty-printed on new lines with that indent level spacing.
- kwargs (dict) – keywords for json.dump
-
jsonextended.edict.
extract
(d, path=None)[source]¶ extract section of dictionary
Parameters: Returns: - new_dict (dict) – original, without extracted section
- extract_dict (dict) – extracted section
Examples
>>> from pprint import pprint >>> d = {1:{"a":"A"},2:{"b":"B",'c':'C'}} >>> pprint(extract(d,[2,'b'])) ({1: {'a': 'A'}, 2: {'c': 'C'}}, {'b': 'B'})
-
jsonextended.edict.
filter_keyfuncs
(d, keyfuncs, logic='OR', keep_siblings=False, list_of_dicts=False, deepcopy=True)[source]¶ filters leaf nodes key:func(val) pairs of nested dictionary, where func(val) -> True/False
Parameters: Examples
>>> from pprint import pprint
>>> d = {'a':{'b':1,'c':2,'d':3}, 'e':4} >>> func1 = lambda v: v <= 2
>>> pprint(filter_keyfuncs(d, {'b': func1, 'e': func1}, logic="OR", keep_siblings=False)) {'a': {'b': 1}}
>>> pprint(filter_keyfuncs(d,[('b',func1), ('d', func1)], logic="OR", keep_siblings=True)) {'a': {'b': 1, 'c': 2, 'd': 3}}
>>> pprint(filter_keyfuncs(d, {'b': func1, 'e': func1}, logic="AND", keep_siblings=False)) {}
>>> pprint(filter_keyfuncs(d, {'b': func1, 'd': func1}, logic="AND", keep_siblings=False)) {}
>>> pprint(filter_keyfuncs(d, {'b': func1, 'c': func1}, logic="AND", keep_siblings=False)) {'a': {'b': 1, 'c': 2}}
>>> pprint(filter_keyfuncs(d,[('b',func1), ('c',func1)], logic="AND", keep_siblings=True)) {'a': {'b': 1, 'c': 2, 'd': 3}}
-
jsonextended.edict.
filter_keys
(d, keys, use_wildcards=False, list_of_dicts=False, deepcopy=True)[source]¶ filter dict by certain keys
Parameters: Examples
>>> from pprint import pprint
>>> d = {1:{"a":"A"},2:{"b":"B"},4:{5:{6:'a',7:'b'}}} >>> pprint(filter_keys(d,['a',6])) {1: {'a': 'A'}, 4: {5: {6: 'a'}}}
>>> d = {1:{"axxxx":"A"},2:{"b":"B"}} >>> pprint(filter_keys(d,['a*'],use_wildcards=True)) {1: {'axxxx': 'A'}}
-
jsonextended.edict.
filter_keyvals
(d, keyvals, logic='OR', keep_siblings=False, list_of_dicts=False, deepcopy=True)[source]¶ filters leaf nodes key:value pairs of nested dictionary
Parameters: Examples
>>> from pprint import pprint
>>> d = {1:{6:'a'},3:{7:'a'},2:{6:"b"},4:{5:{6:'a'}}} >>> pprint(filter_keyvals(d,[(6,'a')])) {1: {6: 'a'}, 4: {5: {6: 'a'}}}
>>> d2 = {'a':{'b':1,'c':2,'d':3}, 'e':4}
>>> pprint(filter_keyvals(d2, {'b': 1, 'e': 4}, logic="OR", keep_siblings=False)) {'a': {'b': 1}, 'e': 4}
>>> pprint(filter_keyvals(d2,[('b',1)], logic="OR", keep_siblings=True)) {'a': {'b': 1, 'c': 2, 'd': 3}}
>>> pprint(filter_keyvals(d2, {'b': 1, 'e': 4}, logic="AND", keep_siblings=False)) {}
>>> pprint(filter_keyvals(d2, {'b': 1, 'c': 2}, logic="AND", keep_siblings=False)) {'a': {'b': 1, 'c': 2}}
>>> pprint(filter_keyvals(d2,[('b',1), ('c',2)], logic="AND", keep_siblings=True)) {'a': {'b': 1, 'c': 2, 'd': 3}}
>>> d3 = {"a": {"b": 1, "f": {"d": 3}}, "e": {"b": 1, "c": 2, "f": {"d": 3}}, "g": 5} >>> pprint(filter_keyvals(d3,[('b',1), ('c', 2)], logic="OR", keep_siblings=True)) {'a': {'b': 1, 'f': {'d': 3}}, 'e': {'b': 1, 'c': 2, 'f': {'d': 3}}}
>>> pprint(filter_keyvals(d3,[('b',1), ('c', 2)], logic="AND", keep_siblings=True)) {'e': {'b': 1, 'c': 2, 'f': {'d': 3}}}
-
jsonextended.edict.
filter_paths
(d, paths, list_of_dicts=False, deepcopy=True)[source]¶ filter dict by certain paths containing key sets
Parameters: Examples
>>> from pprint import pprint >>> d = {'a':{'b':1,'c':{'d':2}},'e':{'c':3}} >>> filter_paths(d,[('c','d')]) {'a': {'c': {'d': 2}}}
>>> d2 = {'a':[{'b':1,'c':3},{'b':1,'c':2}]} >>> pprint(filter_paths(d2,["b"],list_of_dicts=False)) {}
>>> pprint(filter_paths(d2,["c"],list_of_dicts=True)) {'a': [{'c': 3}, {'c': 2}]}
-
jsonextended.edict.
filter_values
(d, vals=None, list_of_dicts=False, deepcopy=True)[source]¶ filters leaf nodes of nested dictionary
Parameters: Examples
>>> d = {1:{"a":"A"},2:{"b":"B"},4:{5:{6:'a'}}} >>> filter_values(d,['a']) {4: {5: {6: 'a'}}}
-
jsonextended.edict.
flatten
(d, key_as_tuple=True, sep='.', list_of_dicts=None, all_iters=None)[source]¶ get nested dict as flat {key:val,…}, where key is tuple/string of all nested keys
Parameters: - d (object) –
- key_as_tuple (bool) – whether keys are list of nested keys or delimited string of nested keys
- sep (str) – if key_as_tuple=False, delimiter for keys
- list_of_dicts (str or None) – if not None, flatten lists of dicts using this prefix
- all_iters (str or None) – if not None, flatten all lists and tuples using this prefix
Examples
>>> from pprint import pprint
>>> d = {1:{"a":"A"}, 2:{"b":"B"}} >>> pprint(flatten(d)) {(1, 'a'): 'A', (2, 'b'): 'B'}
>>> d = {1:{"a":"A"},2:{"b":"B"}} >>> pprint(flatten(d,key_as_tuple=False)) {'1.a': 'A', '2.b': 'B'}
>>> d = [{'a':1},{'b':[1, 2]}] >>> pprint(flatten(d,list_of_dicts='__list__')) {('__list__0', 'a'): 1, ('__list__1', 'b'): [1, 2]}
>>> d = [{'a':1},{'b':[1, 2]}] >>> pprint(flatten(d,all_iters='__iter__')) {('__iter__0', 'a'): 1, ('__iter__1', 'b', '__iter__0'): 1, ('__iter__1', 'b', '__iter__1'): 2}
-
jsonextended.edict.
flatten2d
(d, key_as_tuple=True, delim='.', list_of_dicts=None)[source]¶ get nested dict as {key:dict,…}, where key is tuple/string of all-1 nested keys
NB: is same as flattennd(d,1,key_as_tuple,delim)
Parameters: Examples
>>> from pprint import pprint
>>> d = {1:{2:{3:{'b':'B','c':'C'},4:'D'}}} >>> pprint(flatten2d(d)) {(1, 2): {4: 'D'}, (1, 2, 3): {'b': 'B', 'c': 'C'}}
>>> pprint(flatten2d(d,key_as_tuple=False,delim=',')) {'1,2': {4: 'D'}, '1,2,3': {'b': 'B', 'c': 'C'}}
-
jsonextended.edict.
flattennd
(d, levels=0, key_as_tuple=True, delim='.', list_of_dicts=None)[source]¶ get nested dict as {key:dict,…}, where key is tuple/string of all-n levels of nested keys
Parameters: - d (dict) –
- levels (int) – the number of levels to leave unflattened
- key_as_tuple (bool) – whether keys are list of nested keys or delimited string of nested keys
- delim (str) – if key_as_tuple=False, delimiter for keys
- list_of_dicts (str or None) – if not None, flatten lists of dicts using this prefix
Examples
>>> from pprint import pprint
>>> d = {1:{2:{3:{'b':'B','c':'C'},4:'D'}}} >>> pprint(flattennd(d,0)) {(1, 2, 3, 'b'): 'B', (1, 2, 3, 'c'): 'C', (1, 2, 4): 'D'}
>>> pprint(flattennd(d,1)) {(1, 2): {4: 'D'}, (1, 2, 3): {'b': 'B', 'c': 'C'}}
>>> pprint(flattennd(d,2)) {(1,): {2: {4: 'D'}}, (1, 2): {3: {'b': 'B', 'c': 'C'}}}
>>> pprint(flattennd(d,3)) {(): {1: {2: {4: 'D'}}}, (1,): {2: {3: {'b': 'B', 'c': 'C'}}}}
>>> pprint(flattennd(d,4)) {(): {1: {2: {3: {'b': 'B', 'c': 'C'}, 4: 'D'}}}}
>>> pprint(flattennd(d,5)) {(): {1: {2: {3: {'b': 'B', 'c': 'C'}, 4: 'D'}}}}
>>> pprint(flattennd(d,1,key_as_tuple=False,delim='.')) {'1.2': {4: 'D'}, '1.2.3': {'b': 'B', 'c': 'C'}}
>>> test_dict = {"a":[{"b":[{"c":1, "d": 2}, {"e":3, "f": 4}]}, {"b":[{"c":5, "d": 6}, {"e":7, "f": 8}]}]} >>> pprint(flattennd(test_dict, list_of_dicts="__list__", levels=2)) {('a', '__list__0', 'b'): [{'c': 1, 'd': 2}, {'e': 3, 'f': 4}], ('a', '__list__1', 'b'): [{'c': 5, 'd': 6}, {'e': 7, 'f': 8}]}
>>> pprint(flattennd(test_dict, list_of_dicts="__list__", levels=3)) {('a', '__list__0'): {'b': [{'c': 1, 'd': 2}, {'e': 3, 'f': 4}]}, ('a', '__list__1'): {'b': [{'c': 5, 'd': 6}, {'e': 7, 'f': 8}]}}
-
jsonextended.edict.
indexes
(dic, keys=None)[source]¶ index dictionary by multiple keys
Parameters: Examples
>>> d = {1:{"a":"A"},2:{"b":"B"}} >>> indexes(d,[1,'a']) 'A'
-
jsonextended.edict.
is_list_of_dict_like
(obj, attr=('keys', 'items'))[source]¶ test if object is a list only containing dict like items
-
jsonextended.edict.
is_path_like
(obj, attr=('name', 'is_file', 'is_dir', 'iterdir'))[source]¶ test if object is pathlib.Path like
-
jsonextended.edict.
list_to_dict
(lst, key=None, remove_key=True)[source]¶ convert a list of dicts to a dict with root keys
Parameters: Examples
>>> from pprint import pprint >>> lst = [{'name':'f','b':1},{'name':'g','c':2}] >>> pprint(list_to_dict(lst)) {'0': {'b': 1, 'name': 'f'}, '1': {'c': 2, 'name': 'g'}}
>>> pprint(list_to_dict(lst,'name')) {'f': {'b': 1}, 'g': {'c': 2}}
-
jsonextended.edict.
merge
(dicts, overwrite=False, append=False, list_of_dicts=False)[source]¶ merge dicts, starting with dicts[1] into dicts[0]
Parameters: Examples
>>> from pprint import pprint
>>> d1 = {1:{"a":"A"},2:{"b":"B"}} >>> d2 = {1:{"a":"A"},2:{"c":"C"}} >>> pprint(merge([d1,d2])) {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}}
>>> d1 = {1:{"a":["A"]}} >>> d2 = {1:{"a":["D"]}} >>> pprint(merge([d1,d2],append=True)) {1: {'a': ['A', 'D']}}
>>> d1 = {1:{"a":"A"},2:{"b":"B"}} >>> d2 = {1:{"a":"X"},2:{"c":"C"}} >>> merge([d1,d2],overwrite=False) Traceback (most recent call last): ... ValueError: different data already exists at "1.a": old: A, new: X
>>> merge([{},{}],overwrite=False) {} >>> merge([{},{'a':1}],overwrite=False) {'a': 1} >>> pprint(merge([{},{'a':1},{'a':1},{'b':2}])) {'a': 1, 'b': 2} >>> pprint(merge([{'a':[{"b": 1}, {"c": 2}]}, {'a':[{"d": 3}]}])) Traceback (most recent call last): ... ValueError: different data already exists at "a": old: [{'b': 1}, {'c': 2}], new: [{'d': 3}] >>> pprint(merge([{'a':[{"b": 1}, {"c": 2}]}, {'a':[{"d": 3}]}], list_of_dicts=True)) Traceback (most recent call last): ... ValueError: list of dicts are of different lengths at "a": old: [{'b': 1}, {'c': 2}], new: [{'d': 3}] >>> pprint(merge([{'a':[{"b": 1}, {"c": 2}]}, {'a':[{"d": 3}, {"e": 4}]}], list_of_dicts=True)) {'a': [{'b': 1, 'd': 3}, {'c': 2, 'e': 4}]}
-
jsonextended.edict.
pprint
(d, lvlindent=2, initindent=0, delim=':', max_width=80, depth=3, no_values=False, align_vals=True, print_func=None, keycolor=None, compress_lists=None, round_floats=None, _dlist=False)[source]¶ - print a nested dict in readable format
- (- denotes an element in a list of dictionaries)
Parameters: - d (object) –
- lvlindent (int) – additional indentation spaces for each level
- initindent (int) – initial indentation spaces
- delim (str) – delimiter between key and value nodes
- max_width (int) – max character width of each line
- depth (int or None) – maximum levels to display
- no_values (bool) – whether to print values
- align_vals (bool) – whether to align values for each level
- print_func (callable or None) – function to print strings (print if None)
- keycolor (None or str) – if str, color keys by this color, allowed: red, green, yellow, blue, magenta, cyan, white
- compress_lists (int) –
- compress lists/tuples longer than this,
- e.g. [1,1,1,1,1,1] -> [1, 1,…, 1]
- round_floats (int) – significant figures for floats
Examples
>>> d = {'a':{'b':{'c':'Å','de':[4,5,[7,'x'],9]}}} >>> pprint(d,depth=None) a: b: c: Å de: [4, 5, [7, x], 9] >>> pprint(d,max_width=17,depth=None) a: b: c: Å de: [4, 5, [7, x], 9] >>> pprint(d,no_values=True,depth=None) a: b: c: de: >>> pprint(d,depth=2) a: b: {...} >>> pprint({'a':[1,1,1,1,1,1,1,1]}, ... compress_lists=3) a: [1, 1, 1, ...(x5)]
-
jsonextended.edict.
remove_keys
(d, keys=None, use_wildcards=True, list_of_dicts=False, deepcopy=True)[source]¶ remove certain keys from nested dict, retaining preceeding paths
Parameters: Examples
>>> from pprint import pprint >>> d = {1:{"a":"A"},"a":{"b":"B"}} >>> pprint(remove_keys(d,['a'])) {1: 'A', 'b': 'B'}
>>> pprint(remove_keys({'abc':1},['a*'],use_wildcards=False)) {'abc': 1} >>> pprint(remove_keys({'abc':1},['a*'],use_wildcards=True)) {}
-
jsonextended.edict.
remove_keyvals
(d, keyvals=None, list_of_dicts=False, deepcopy=True)[source]¶ remove paths with at least one branch leading to certain (key,value) pairs from dict
Parameters: Examples
>>> from pprint import pprint >>> d = {1:{"b":"A"},"a":{"b":"B","c":"D"},"b":{"a":"B"}} >>> pprint(remove_keyvals(d,[("b","B")])) {1: {'b': 'A'}, 'b': {'a': 'B'}}
>>> d2 = {'a':[{'b':1,'c':1},{'b':1,'c':2}]} >>> pprint(remove_keyvals(d2,[("b",1)])) {'a': [{'b': 1, 'c': 1}, {'b': 1, 'c': 2}]}
>>> pprint(remove_keyvals(d2,[("b",1)],list_of_dicts=True)) {}
-
jsonextended.edict.
remove_paths
(d, keys, list_of_dicts=False, deepcopy=True)[source]¶ remove paths containing certain keys from dict
Parameters: Examples
>>> from pprint import pprint >>> d = {1:{"a":"A"},2:{"b":"B"},4:{5:{6:'a',7:'b'}}} >>> pprint(remove_paths(d,[6,'a'])) {2: {'b': 'B'}, 4: {5: {7: 'b'}}}
>>> d = {1:{2: 3}, 1:{4: 5}} >>> pprint(remove_paths(d,[(1, 2)])) {1: {4: 5}}
>>> d2 = {'a':[{'b':1,'c':{'b':3}},{'b':1,'c':2}]} >>> pprint(remove_paths(d2,["b"],list_of_dicts=False)) {'a': [{'b': 1, 'c': {'b': 3}}, {'b': 1, 'c': 2}]}
>>> pprint(remove_paths(d2,["b"],list_of_dicts=True)) {'a': [{'c': 2}]}
-
jsonextended.edict.
rename_keys
(d, keymap=None, list_of_dicts=False, deepcopy=True)[source]¶ rename keys in dict
Parameters: Examples
>>> from pprint import pprint >>> d = {'a':{'old_name':1}} >>> pprint(rename_keys(d,{'old_name':'new_name'})) {'a': {'new_name': 1}}
-
jsonextended.edict.
split_key
(d, key, new_keys, before=True, list_of_dicts=False, deepcopy=True)[source]¶ split an existing key(s) into multiple levels
Parameters: Examples
>>> from pprint import pprint >>> d = {'a':1,'b':2} >>> pprint(split_key(d,'a',['c','d'])) {'b': 2, 'c': {'d': {'a': 1}}}
>>> pprint(split_key(d,'a',['c','d'],before=False)) {'a': {'c': {'d': 1}}, 'b': 2}
>>> d2 = [{'a':1},{'a':2},{'a':3}] >>> pprint(split_key(d2,'a',['b'],list_of_dicts=True)) [{'b': {'a': 1}}, {'b': {'a': 2}}, {'b': {'a': 3}}]
-
jsonextended.edict.
split_lists
(d, split_keys, new_name='split', check_length=True, deepcopy=True)[source]¶ split_lists key:list pairs into dicts for each item in the lists NB: will only split if all split_keys are present
Parameters: Examples
>>> from pprint import pprint
>>> d = {'path_key':{'x':[1,2],'y':[3,4],'a':1}} >>> new_d = split_lists(d,['x','y']) >>> pprint(new_d) {'path_key': {'a': 1, 'split': [{'x': 1, 'y': 3}, {'x': 2, 'y': 4}]}}
>>> split_lists(d,['x','a']) Traceback (most recent call last): ... ValueError: "a" data at the following path is not a list ('path_key',)
>>> d2 = {'path_key':{'x':[1,7],'y':[3,4,5]}} >>> split_lists(d2,['x','y']) Traceback (most recent call last): ... ValueError: lists at the following path do not have the same size ('path_key',)
-
class
jsonextended.edict.
to_html
(obj, depth=2, max_length=20, max_height=600, sort=True, local=True, uniqueid=None)[source]¶ Bases:
object
Pretty display dictionary in collapsible format with indents
Parameters: - obj (str or dict) – dict or json
- depth (int) – Depth of the json tree structure displayed, the rest is collapsed.
- max_length (int) – Maximum number of characters of a string displayed as preview, longer string appear collapsed.
- max_height (int) – Maxium height in pixels of containing box.
- sort (bool) – Whether the json keys are sorted alphabetically.
- local (bool) – use local version of javascript file
- uniqueid (str) – unique identifier (if None, auto-created)
Examples
>>> dic = {'sape': {'value': 22}, 'jack': 4098, 'guido': 4127} >>> obj = to_html(dic, depth=1, max_length=10, sort=False, local=True, uniqueid='123') >>> print(obj._repr_html_()) <style> .renderjson a { text-decoration: none; } .renderjson .disclosure { color: red; font-size: 125%; } .renderjson .syntax { color: darkgrey; } .renderjson .string { color: black; } .renderjson .number { color: black; } .renderjson .boolean { color: purple; } .renderjson .key { color: royalblue; } .renderjson .keyword { color: orange; } .renderjson .object.syntax { color: lightseagreen; } .renderjson .array.syntax { color: lightseagreen; } </style><div id="123" style="max-height: 600px; width:100%%;"></div> <script> require(["jsonextended/renderjson.js"], function() { document.getElementById("123").appendChild( renderjson.set_max_string_length(10) //.set_icons(circled plus, circled minus) .set_icons(String.fromCharCode(8853), String.fromCharCode(8854)) .set_sort_objects(false) .set_show_to_level(1)({"guido": 4127, "jack": 4098, "sape": {"value": 22}})) });</script>
-
jsonextended.edict.
to_json
(dct, jfile, overwrite=False, dirlevel=0, sort_keys=True, indent=2, default_name='root.json', **kwargs)[source]¶ output dict to json
Parameters: - dct (dict) –
- jfile (str or file_like) – if file_like, must have write method
- overwrite (bool) – whether to overwrite existing files
- dirlevel (int) – if jfile is path to folder, defines how many key levels to set as sub-folders
- sort_keys (bool) – if true then the output of dictionaries will be sorted by key
- indent (int) – if non-negative integer, then JSON array elements and object members will be pretty-printed on new lines with that indent level spacing.
- kwargs (dict) – keywords for json.dump
Examples
>>> from jsonextended.utils import MockPath >>> file_obj = MockPath('test.json',is_file=True,exists=False) >>> dct = {'a':{'b':1}} >>> to_json(dct, file_obj) >>> print(file_obj.to_string()) File("test.json") Contents: { "a": { "b": 1 } }
>>> from jsonextended.utils import MockPath >>> folder_obj = MockPath() >>> dct = {'x':{'a':{'b':1},'c':{'d':3}}} >>> to_json(dct, folder_obj, dirlevel=0,indent=None) >>> print(folder_obj.to_string(file_content=True)) Folder("root") File("x.json") Contents: {"a": {"b": 1}, "c": {"d": 3}}
>>> folder_obj = MockPath() >>> to_json(dct, folder_obj, dirlevel=1,indent=None) >>> print(folder_obj.to_string(file_content=True)) Folder("root") Folder("x") File("a.json") Contents: {"b": 1} File("c.json") Contents: {"d": 3}
-
jsonextended.edict.
unflatten
(d, key_as_tuple=True, delim='.', list_of_dicts=None, deepcopy=True)[source]¶ unflatten dictionary with keys as tuples or delimited strings
Parameters: Examples
>>> from pprint import pprint
>>> d = {('a','b'):1,('a','c'):2} >>> pprint(unflatten(d)) {'a': {'b': 1, 'c': 2}}
>>> d2 = {'a.b':1,'a.c':2} >>> pprint(unflatten(d2,key_as_tuple=False)) {'a': {'b': 1, 'c': 2}}
>>> d3 = {('a','__list__1', 'a'): 1, ('a','__list__0', 'b'): 2} >>> pprint(unflatten(d3,list_of_dicts='__list__')) {'a': [{'b': 2}, {'a': 1}]}
>>> unflatten({('a','b','c'):1,('a','b'):2}) Traceback (most recent call last): ... KeyError: "child conflict for path: ('a', 'b'); 2 and {'c': 1}"
jsonextended.ejson module¶
-
jsonextended.ejson.
jkeys
(jfile, key_path=None, in_memory=True, ignore_prefix=('.', '_'))[source]¶ get keys for initial json level, or at level after following key_path
Parameters: - jfile (str, file_like or path_like) – if str, must be existing file or folder, if file_like, must have ‘read’ method if path_like, must have ‘iterdir’ method (see pathlib.Path)
- key_path (list[str]) – a list of keys to index into the json before returning keys
- in_memory (bool) – if true reads json into memory before finding keys (this is faster but uses more memory)
- ignore_prefix (list[str]) – ignore folders beginning with these prefixes
Examples
>>> from jsonextended.utils import MockPath >>> file_obj = MockPath('test.json',is_file=True, ... content=''' ... { ... "a": 1, ... "b": [1.1,2.1], ... "c": {"d":"e","f":"g"} ... } ... ''') ... >>> jkeys(file_obj) ['a', 'b', 'c']
>>> jkeys(file_obj,["c"]) ['d', 'f']
>>> from jsonextended.utils import get_test_path >>> path = get_test_path() >>> jkeys(path) ['dir1', 'dir2', 'dir3']
>>> path = get_test_path() >>> jkeys(path, ['dir1','file1'], in_memory=True) ['initial', 'meta', 'optimised', 'units']
-
jsonextended.ejson.
to_dict
(jfile, key_path=None, in_memory=True, ignore_prefix=('.', '_'), parse_decimal=False)[source]¶ input json to dict
Parameters: - jfile (str, file_like or path_like) – if str, must be existing file or folder, if file_like, must have ‘read’ method if path_like, must have ‘iterdir’ method (see pathlib.Path)
- key_path (list[str]) – a list of keys to index into the json before parsing it
- in_memory (bool) – if true reads full json into memory before filtering keys (this is faster but uses more memory)
- ignore_prefix (list[str]) – ignore folders beginning with these prefixes
- parse_decimal (bool) – whether to parse numbers as Decimal instances (retains exact precision)
Examples
>>> from pprint import pformat
>>> from jsonextended.utils import MockPath >>> file_obj = MockPath('test.json',is_file=True, ... content=''' ... { ... "a": 1, ... "b": [1.1,2.1], ... "c": {"d":"e"} ... } ... ''') ...
>>> dstr = pformat(to_dict(file_obj)) >>> print(dstr.replace("u'","'")) {'a': 1, 'b': [1.1, 2.1], 'c': {'d': 'e'}}
>>> dstr = pformat(to_dict(file_obj,parse_decimal=True)) >>> print(dstr.replace("u'","'")) {'a': 1, 'b': [Decimal('1.1'), Decimal('2.1')], 'c': {'d': 'e'}}
>>> str(to_dict(file_obj,["c","d"])) 'e'
>>> from jsonextended.utils import get_test_path >>> path = get_test_path() >>> jdict1 = to_dict(path) >>> pprint(jdict1,depth=2) dir1: dir1_1: {...} file1: {...} file2: {...} dir2: file1: {...} dir3:
>>> jdict2 = to_dict(path,['dir1','file1','initial'],in_memory=False) >>> pprint(jdict2,depth=1) crystallographic: {...} primitive: {...}
jsonextended.example_mockpaths module¶
mock files and folder structure for testing
Examples
>>> jsonfile1
MockFile("dir1/file1.json")
>>> jsonfile2
MockFile("file2.json")
>>> csvfile1
MockFile("dir1/subdir1/file1.csv")
>>> csvfile2
MockFile("dir1/subdir1/file1.literal.csv")
>>> kpfile1
MockFile("dir1/subdir2/subsubdir21/file1.keypair")
>>> print(directory1.to_string(indentlvl=3,file_content=False))
Folder("dir1")
File("file1.json")
Folder("subdir1")
File("file1.csv")
File("file1.literal.csv")
Folder("subdir2")
Folder("subsubdir21")
File("file1.keypair")
>>> print(directory1.to_string(indentlvl=3,file_content=True))
Folder("dir1")
File("file1.json") Contents:
{"key2": {"key3": 4, "key4": 5}, "key1": [1, 2, 3]}
Folder("subdir1")
File("file1.csv") Contents:
# a csv file
header1,header2,header3
val1,val2,val3
val4,val5,val6
val7,val8,val9
File("file1.literal.csv") Contents:
# a csv file with numbers
header1,header2,header3
1,1.1,string1
2,2.2,string2
3,3.3,string3
Folder("subdir2")
Folder("subsubdir21")
File("file1.keypair") Contents:
# a key-pair file
key1 val1
key2 val2
key3 val3
key4 val4
jsonextended.mockpath module¶
-
class
jsonextended.mockpath.
MockPath
(path='root', is_file=False, exists=True, structure=(), content='', parent=None)[source]¶ Bases:
object
a mock path, mimicking pathlib.Path, supporting context open method for read/write
Parameters: Examples
>>> file_obj = MockPath("path/to/test.txt",is_file=True, ... content="line1\nline2\nline3") ... >>> file_obj MockFile("path/to/test.txt") >>> file_obj.name 'test.txt' >>> file_obj.parent MockFolder("path/to") >>> print(str(file_obj)) path/to/test.txt >>> print(file_obj.to_string()) File("test.txt") Contents: line1 line2 line3 >>> file_obj.is_file() True >>> file_obj.is_dir() False >>> with file_obj.open('r') as f: ... print(f.readline().strip()) line1 >>> with file_obj.open('w') as f: ... f.write('newline1\nnewline2') >>> print(file_obj.to_string()) File("test.txt") Contents: newline1 newline2
>>> with file_obj.maketemp() as temp: ... with temp.open() as f: ... print(f.readline().strip()) newline1
>>> dir_obj = MockPath( ... structure=[{'dir1':[{'subdir':[file_obj.copy_path_obj()]},file_obj.copy_path_obj()]}, ... {'dir2':[file_obj.copy_path_obj()]},file_obj.copy_path_obj()] ... ) >>> dir_obj MockFolder("root") >>> dir_obj.name 'root' >>> dir_obj.is_file() False >>> dir_obj.is_dir() True >>> print(dir_obj.to_string()) Folder("root") Folder("dir1") Folder("subdir") File("test.txt") File("test.txt") Folder("dir2") File("test.txt") File("test.txt")
>>> "dir1/test.txt" in dir_obj True
>>> dir_obj["dir1/test.txt"] MockFile("root/dir1/test.txt")
>>> list(dir_obj.iterdir()) [MockFolder("root/dir1"), MockFolder("root/dir2"), MockFile("root/test.txt")]
>>> list(dir_obj.glob("*")) [MockFolder("root/dir1"), MockFolder("root/dir2"), MockFile("root/test.txt")]
>>> list(dir_obj.glob("dir1/*")) [MockFolder("root/dir1/subdir"), MockFile("root/dir1/test.txt")]
>>> list(dir_obj.glob("**")) [MockFolder("root/dir1"), MockFolder("root/dir1/subdir"), MockFolder("root/dir2")]
>>> list(dir_obj.glob("**/*")) [MockFolder("root/dir1"), MockFolder("root/dir1/subdir"), MockFile("root/dir1/subdir/test.txt"), MockFile("root/dir1/test.txt"), MockFolder("root/dir2"), MockFile("root/dir2/test.txt"), MockFile("root/test.txt")]
# >>> list(dir_obj.glob(“**/dir1”)) # [MockFolder(“root/dir1”)]
>>> new = dir_obj.joinpath('dir3') >>> new.mkdir() >>> list(dir_obj.iterdir()) [MockFolder("root/dir1"), MockFolder("root/dir2"), MockFolder("root/dir3"), MockFile("root/test.txt")]
>>> dir_obj.joinpath("test.txt").unlink() >>> list(dir_obj.iterdir()) [MockFolder("root/dir1"), MockFolder("root/dir2"), MockFolder("root/dir3")]
>>> dir_obj.joinpath("dir3").rmdir() >>> list(dir_obj.iterdir()) [MockFolder("root/dir1"), MockFolder("root/dir2")]
>>> print(dir_obj.to_string()) Folder("root") Folder("dir1") Folder("subdir") File("test.txt") File("test.txt") Folder("dir2") File("test.txt")
>>> dir_obj.joinpath("dir1/subdir") MockFolder("root/dir1/subdir")
>>> dir_obj.joinpath("dir1", "subdir") MockFolder("root/dir1/subdir")
>>> new = dir_obj.joinpath("dir1/subdir/other") >>> new MockVirtualPath("root/dir1/subdir/other")
>>> new.touch() >>> new MockFile("root/dir1/subdir/other")
>>> new.unlink() >>> new MockVirtualPath("root/dir1/subdir/other")
>>> new.mkdir() >>> new MockFolder("root/dir1/subdir/other")
>>> newfile = MockPath('newfile.txt', is_file=True) >>> new.copy_from(newfile) >>> print(new.to_string()) Folder("other") File("newfile.txt")
>>> file_obj = MockPath("newfile2.txt",is_file=True, content='test') >>> file_obj.copy_to(new) >>> print(new.to_string()) Folder("other") File("newfile.txt") File("newfile2.txt")
>>> file_obj.name = "newfile3.txt" >>> with file_obj.maketemp() as temp: ... new.copy_from(temp) >>> print(new.to_string()) Folder("other") File("newfile.txt") File("newfile2.txt") File("newfile3.txt")
>>> print(new.copy_path_obj().to_string()) Folder("other") File("newfile.txt") File("newfile2.txt") File("newfile3.txt")
>>> with new.maketemp(getoutput=True) as tempdir: ... tempdir.joinpath("new").mkdir() ... tempdir.joinpath("new/file.txt").touch() >>> print(new.to_string()) Folder("other") Folder("new") File("file.txt") File("newfile.txt") File("newfile2.txt") File("newfile3.txt")
-
children
¶
-
chmod
(mode)[source]¶ Change the mode (permissions) of a file
Parameters: Examples
To make a file executable cur_mode = folder.stat(“exec.sh”).st_mode folder.chmod(“exec.sh”, cur_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH )
-
copy_from
(source)[source]¶ copy from a source to a mock directory
Parameters: source (str or pathlib.Path or MockPath) – any file like object or path to a file
-
copy_to
(target)[source]¶ copy from a mock path to a target
Parameters: target (str, pathlib.Path or MockPath) –
-
file_content
¶
-
maketemp
(getoutput=False, dir=None)[source]¶ make a named temporary file or folder containing the path contents
Parameters: Yields: temppath (path.Path) – path to temporary
-
mkdir
(parents=False)[source]¶ Parameters: - mode –
- parents (bool) – If True, any missing parents of this path are created as needed If False, a missing parent raises FileNotFoundError.
-
name
¶
-
parent
¶
-
path
¶
-
-
jsonextended.mockpath.
colortxt
(text, color=None, on_color=None, attrs=None)[source]¶ Colorize text.
- Available text colors:
- red, green, yellow, blue, magenta, cyan, white.
- Available text highlights:
- on_red, on_green, on_yellow, on_blue, on_magenta, on_cyan, on_white.
- Available attributes:
- bold, dark, underline, blink, reverse, concealed.
Examples
>>> txt = colortxt('Hello, World!', 'red', 'on_grey', ['bold']) >>> print(txt) [1m[40m[31mHello, World![0m
jsonextended.plugins module¶
-
jsonextended.plugins.
decode
(dct, intype='json', raise_error=False)[source]¶ decode dict objects, via decoder plugins, to new type
Parameters: Examples
>>> load_builtin_plugins('decoders') []
>>> from decimal import Decimal >>> decode({'_python_Decimal_':'1.3425345'}) Decimal('1.3425345')
>>> unload_all_plugins()
-
jsonextended.plugins.
encode
(obj, outtype='json', raise_error=False)[source]¶ encode objects, via encoder plugins, to new types
Parameters: Examples
>>> load_builtin_plugins('encoders') []
>>> from decimal import Decimal >>> encode(Decimal('1.3425345')) {'_python_Decimal_': '1.3425345'} >>> encode(Decimal('1.3425345'),outtype='str') '1.3425345'
>>> encode(set([1,2,3,4,4])) {'_python_set_': [1, 2, 3, 4]} >>> encode(set([1,2,3,4,4]),outtype='str') '{1, 2, 3, 4}'
>>> unload_all_plugins()
-
jsonextended.plugins.
load_builtin_plugins
(category=None, overwrite=False)[source]¶ load plugins from builtin directories
Parameters: Examples
>>> from pprint import pprint >>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
>>> errors = load_builtin_plugins() >>> errors []
>>> pprint(view_plugins(),width=200) {'decoders': {'decimal.Decimal': 'encode/decode Decimal type', 'fractions.Fraction': 'encode/decode Fraction type', 'numpy.ndarray': 'encode/decode numpy.ndarray', 'pint.Quantity': 'encode/decode pint.Quantity object', 'python.set': 'decode/encode python set'}, 'encoders': {'decimal.Decimal': 'encode/decode Decimal type', 'fractions.Fraction': 'encode/decode Fraction type', 'numpy.ndarray': 'encode/decode numpy.ndarray', 'pint.Quantity': 'encode/decode pint.Quantity object', 'python.set': 'decode/encode python set'}, 'parsers': {'csv.basic': 'read *.csv delimited file with headers to {header:[column_values]}', 'csv.literal': 'read *.literal.csv delimited files with headers to {header:column_values}, with number strings converted to int/float', 'hdf5.read': 'read *.hdf5 (in read mode) files using h5py', 'ipynb': 'read Jupyter Notebooks', 'json.basic': 'read *.json files using json.load', 'keypair': "read *.keypair, where each line should be; '<key> <pair>'", 'yaml.ruamel': 'read *.yaml files using ruamel.yaml'}}
>>> unload_all_plugins()
-
jsonextended.plugins.
load_plugin_classes
(classes, category=None, overwrite=False)[source]¶ load plugins from class objects
Parameters: Examples
>>> from pprint import pprint >>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
>>> class DecoderPlugin(object): ... plugin_name = 'example' ... plugin_descript = 'a decoder for dicts containing _example_ key' ... dict_signature = ('_example_',) ... >>> errors = load_plugin_classes([DecoderPlugin])
>>> pprint(view_plugins()) {'decoders': {'example': 'a decoder for dicts containing _example_ key'}, 'encoders': {}, 'parsers': {}}
>>> unload_all_plugins()
-
jsonextended.plugins.
load_plugins_dir
(path, category=None, overwrite=False)[source]¶ load plugins from a directory
Parameters:
-
jsonextended.plugins.
parse
(fpath, **kwargs)[source]¶ parse file contents, via parser plugins, to dict like object NB: the longest file regex will be used from plugins
Parameters: - fpath (file_like) – string, object with ‘open’ and ‘name’ attributes, or object with ‘readline’ and ‘name’ attributes
- kwargs – to pass to parser plugin
Examples
>>> load_builtin_plugins('parsers') []
>>> from pprint import pformat
>>> json_file = StringIO('{"a":[1,2,3.4]}') >>> json_file.name = 'test.json'
>>> dct = parse(json_file) >>> print(pformat(dct).replace("u'","'")) {'a': [1, 2, 3.4]}
>>> reset = json_file.seek(0) >>> from decimal import Decimal >>> dct = parse(json_file, parse_float=Decimal,other=1) >>> print(pformat(dct).replace("u'","'")) {'a': [1, 2, Decimal('3.4')]}
>>> class NewParser(object): ... plugin_name = 'example' ... plugin_descript = 'loads test.json files' ... file_regex = 'test.json' ... def read_file(self, file_obj, **kwargs): ... return {'example':1} >>> load_plugin_classes([NewParser],'parsers') [] >>> reset = json_file.seek(0) >>> parse(json_file) {'example': 1}
>>> unload_all_plugins()
-
jsonextended.plugins.
parser_available
(fpath)[source]¶ test if parser plugin available for fpath
Examples
>>> load_builtin_plugins('parsers') [] >>> test_file = StringIO('{"a":[1,2,3.4]}') >>> test_file.name = 'test.json' >>> parser_available(test_file) True >>> test_file.name = 'test.other' >>> parser_available(test_file) False
>>> unload_all_plugins()
-
jsonextended.plugins.
plugins_context
(classes, category=None)[source]¶ context manager to load plugin class(es) then unload on exit
Parameters: Examples
>>> from pprint import pprint
>>> class DecoderPlugin(object): ... plugin_name = 'example' ... plugin_descript = 'a decoder for dicts containing _example_ key' ... dict_signature = ('_example_',) ...
>>> with plugins_context([DecoderPlugin]): ... pprint(view_plugins()) {'decoders': {'example': 'a decoder for dicts containing _example_ key'}, 'encoders': {}, 'parsers': {}}
>>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
-
jsonextended.plugins.
unload_all_plugins
(category=None)[source]¶ clear all plugins
Parameters: category (None or str) – if str, apply for single plugin category Examples
>>> from pprint import pprint >>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
>>> class DecoderPlugin(object): ... plugin_name = 'example' ... plugin_descript = 'a decoder for dicts containing _example_ key' ... dict_signature = ('_example_',) ... >>> errors = load_plugin_classes([DecoderPlugin])
>>> pprint(view_plugins()) {'decoders': {'example': 'a decoder for dicts containing _example_ key'}, 'encoders': {}, 'parsers': {}}
>>> unload_all_plugins() >>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
-
jsonextended.plugins.
unload_plugin
(name, category=None)[source]¶ remove single plugin
Parameters: Examples
>>> from pprint import pprint >>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
>>> class DecoderPlugin(object): ... plugin_name = 'example' ... plugin_descript = 'a decoder for dicts containing _example_ key' ... dict_signature = ('_example_',) ... >>> errors = load_plugin_classes([DecoderPlugin],category='decoders')
>>> pprint(view_plugins()) {'decoders': {'example': 'a decoder for dicts containing _example_ key'}, 'encoders': {}, 'parsers': {}}
>>> unload_plugin('example','decoders') >>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
-
jsonextended.plugins.
view_interfaces
(category=None)[source]¶ return a view of the plugin minimal class attribute interface(s)
Parameters: category (None or str) – if str, apply for single plugin category Examples
>>> from pprint import pprint >>> pprint(view_interfaces()) {'decoders': ['plugin_name', 'plugin_descript', 'dict_signature'], 'encoders': ['plugin_name', 'plugin_descript', 'objclass'], 'parsers': ['plugin_name', 'plugin_descript', 'file_regex', 'read_file']}
-
jsonextended.plugins.
view_plugins
(category=None)[source]¶ return a view of the loaded plugin names and descriptions
Parameters: category (None or str) – if str, apply for single plugin category Examples
>>> from pprint import pprint >>> pprint(view_plugins()) {'decoders': {}, 'encoders': {}, 'parsers': {}}
>>> class DecoderPlugin(object): ... plugin_name = 'example' ... plugin_descript = 'a decoder for dicts containing _example_ key' ... dict_signature = ('_example_',) ... >>> errors = load_plugin_classes([DecoderPlugin])
>>> pprint(view_plugins()) {'decoders': {'example': 'a decoder for dicts containing _example_ key'}, 'encoders': {}, 'parsers': {}}
>>> view_plugins('decoders') {'example': 'a decoder for dicts containing _example_ key'}
>>> unload_all_plugins()
jsonextended.utils module¶
-
jsonextended.utils.
class_to_str
(obj)[source]¶ get class string from object
Examples
>>> class_to_str(list).split('.')[1] 'list'
-
jsonextended.utils.
get_data_path
(data, module, check_exists=True)[source]¶ return a directory path to data within a module
Parameters: data (str or list[str]) – file name or list of sub-directories and file name (e.g. [‘lammps’,’data.txt’])
-
jsonextended.utils.
get_test_path
()[source]¶ returns test path object
Examples
>>> path = get_test_path() >>> path.name '_example_data_folder'
-
jsonextended.utils.
load_memit
()[source]¶ load memory usage ipython magic, require memory_profiler package to be installed
to get usage: %memit?
Author: Vlad Niculae <vlad@vene.ro> Makes use of memory_profiler from Fabian Pedregosa available at https://github.com/fabianp/memory_profiler
-
jsonextended.utils.
memory_usage
()[source]¶ return memory usage of python process in MB
from http://fa.bianp.net/blog/2013/different-ways-to-get-memory-consumption-or-lessons-learned-from-memory_profiler/ psutil is quicker
>>> isinstance(memory_usage(),float) True
Module contents¶
a module to extend the python json package functionality;
- decoding/encoding between the on-disk JSON structure
and in-memory nested dictionary structure, including:
- treating path structures, with nested directories and multiple .json files, as a single json.
- on-disk indexing of the json structure (using the ijson package)
- extended data type serialisation (numpy.ndarray, Decimals, pint.Quantities,…)
- viewing and manipulating the nested dictionaries:
- enhanced pretty printer
- Javascript rendered, expandable tree in the Jupyter Notebook
- filter, merge, flatten, unflatten functions
- Units schema concept to apply and convert physical units (using the pint package)
- Parser abstract class for dealing with converting other file formats to JSON
Notes
On-disk indexing of the json structure, before reading into memory, to reduce memory overhead when dealing with large json structures/files (using the ijson package), e.g.
path = get_test_path() %memit jdict1 = to_dict(path,[‘dir1’,’file2’,’meta’],in_memory=True) maximum of 3: 12.242188 MB per loop
%memit jdict1 = to_dict(path,[‘dir1’,’file2’,’meta’],in_memory=False) maximum of 3: 6.996094 MB per loop
Examples
>>> from jsonextended import ejson, edict, utils
>>> path = utils.get_test_path()
>>> path.is_dir()
True
>>> ejson.jkeys(path)
['dir1', 'dir2', 'dir3']
>>> jdict1 = ejson.to_dict(path)
>>> edict.pprint(jdict1,depth=2)
dir1:
dir1_1: {...}
file1: {...}
file2: {...}
dir2:
file1: {...}
dir3:
>>> jdict2 = ejson.to_dict(path,['dir1','file1'])
>>> edict.pprint(jdict2,depth=1)
initial: {...}
meta: {...}
optimised: {...}
units: {...}
>>> filtered = edict.filter_keys(jdict2,['vol*'],use_wildcards=True)
>>> edict.pprint(filtered)
initial:
crystallographic:
volume: 924.62752781
primitive:
volume: 462.313764
optimised:
crystallographic:
volume: 1063.98960509
primitive:
volume: 531.994803
>>> edict.pprint(edict.flatten(filtered))
(initial, crystallographic, volume): 924.62752781
(initial, primitive, volume): 462.313764
(optimised, crystallographic, volume): 1063.98960509
(optimised, primitive, volume): 531.994803
Releases¶
v0.7.0 - Improved edict.filter_keyvals and added edict.filter_keyfuncs¶
- added “OR”/”AND” logic parameter
- removed errors parameter from edict.filter_keyvals (instead use edict.filter_keyfuncs)
v0.7.1 - added deep_copy option to edict functions¶
v0.7.2 - corrected filter_keyvals for dict_like siblings¶
- added additional tests
v0.7.3 - added plugin class contextmanager¶
v0.7.4 - added fraction.Fraction encoder plugin¶
v0.7.6 - bug fix for mock path read context¶
v0.7.7 - added allow_other_keys for plugin.decoders¶
0.7.8 - add sdist to pypi (required for conda-forge)¶
v0.7.9 - add files required by setup.py to manifest (for sdist)¶
v0.7.10 - Improve testing¶
- change from nose to pytest
- add testing for flake8 linting on travis
- add testing for document creation on travis
v0.6.0 - Improvements to LazyLoad¶
in edict.LazyLoad
- changed ignore_prefixes -> ignore_regexes for greater flexibility
- added logging (at debug level) for each file parsed (helpful for longer loading times)
- added better exception handling for file parsing (to help with debugging)
added edict.dump in order to better mirror standard json module (its exactly the same as edict.to_json)
v0.6.1 - added remove_lkey to edict.apply and parse_errors to LazyLoad¶
v0.6.2 - added .yaml parser plugin¶
v0.6.3 - edict.remove_paths; allow list of path keys¶
v0.6.4 - edict.merge added list_of_dicts option¶
v0.5.0 - Major Improvements to MockPath¶
split off into separate package
paths relative to base
index
handle maketemp of folder
v0.5.1 - API Documentation update¶
v0.5.2 - Minor improvement¶
v0.5.3 - Minor Bug Fix¶
v0.5.4 - Minor improvement¶
- added byte decoding to mock path write class
v0.5.5 - Minor improvements of MockPath¶
added stat and chmod (dummy) methods
v0.5.6 - Minor improvements of MockPath¶
v0.5.7 - Reorder Documentation of Versions¶
v0.4.0 - Apply functions¶
- added apply and combine_apply functions
- refactored edict to avoid deepcopy recursion (flatten, unflatten)
v0.4.1 - General functionality improvement¶
- added more support for list of dict structures
- option to keep siblings when filtering by keyval
- added wildcard option to remove_keys and value plus/minus error to filter_keyvals
v0.4.2 - Added ReadTheDoc Site¶
v0.4.3 - minor bug fixes and improvements¶
v0.4.4 - Addition of Diff Evaluator¶
edict.diff, which can optionally use numpy.allclose to assess arrays of floating point numbers