In [2]:
// Node core modules (fs/path are used by the benchmark loops below)
var fs = require('fs')
var path = require('path')
// Third-party
var Immutable = require('immutable')
var _ = require('lodash')
var commutable = require('commutable')
JSON.parse takes an extra argument called a reviver:
JSON.parse(text[, reviver])
The reviver accepts two parameters, key and value, and returns the intended value. The key will either be a string key on Objects or a numeric index when the value is in an Array.
Let's walk through some sample code to check this out.
In [3]:
// Classic JSON.parse
// With no reviver, every value comes back as a plain mutable Object/Array.
JSON.parse('{"a": 2, "b": { "name": "dave" }}')
Out[3]:
In [4]:
// Reviver that suffixes every value stored under a "name" key with
// " senior", and passes all other values through untouched.
function reviver(key, value) {
  return key === 'name' ? value + " senior" : value;
}

JSON.parse('{"a": 2, "b": { "name": "dave" }}', reviver)
Out[4]:
This means you can use this to change values based on a key, though you won't know the nested path of the overall JSON object.
Since the string is (expected to be) JSON, there are only two types which are not immutable: Array and Object. You can use this to your advantage to create frozen or Immutable.js objects while parsing.
In [5]:
JSON.parse('{"a": 2, "b": { "name": "dave" }}', (k, v) => Object.freeze(v))
Out[5]:
In [6]:
// Reviver that builds an Immutable.js structure directly during parsing:
// Arrays become Immutable.List, plain objects become Immutable.Map, and
// primitives (including null) pass through unchanged.
function immutableReviver(key, value) {
  if (Array.isArray(value)) {
    return Immutable.List(value);
  }
  // typeof null === 'object', so guard against null explicitly —
  // otherwise JSON nulls would silently turn into empty Maps.
  if (value !== null && typeof value === 'object') {
    return Immutable.Map(value);
  }
  return value;
}
Since it seemed handy enough, I put immutable-reviver on npm. We'll just use the version written here for now though.
In [7]:
revived = JSON.parse('{"a": 2, "b": { "name": "dave" }}', immutableReviver)
Out[7]:
In [8]:
revived.getIn(['b', 'name'])
Out[8]:
The reason I started looking into this was because I was trying to see if I could optimize loading of notebooks in nteract. We currently rely on a strategy that goes like:
notebook = JSON.parse(rawNotebook)
immutableNotebook = Immutable.fromJS(notebook)
ourNotebook = immutableNotebook.map(...).map(...)... // A series of transformations to create our in-memory representation
These transformations are mostly to turn notebook cells from this:
{
"metadata": {
"collapsed": false,
"outputExpanded": false
},
"cell_type": "markdown",
"source": [
"# Outputs you can update by name\n",
"\n",
"This notebook demonstrates the new name-based display functionality in the notebook. Previously, notebooks could only attach output to the cell that was currently being executed:\n",
"\n"
]
}
into:
{
"metadata": {
"collapsed": false,
"outputExpanded": false
},
"cell_type": "markdown",
"source": "# Outputs you can update by name\n\nThis notebook demonstrates the new name-based display functionality in the notebook. Previously, notebooks could only attach output to the cell that was currently being executed:\n\n"
}
This multi-line string format, introduced by Jupyter, is to accommodate diffing of notebooks in tools like git and GitHub. It's applied to source on cells as well as some output types.
We can set up a reviver that handles all the keys that are most likely to have multi-line strings. We'll start with those that are media types that we know end up being encoded as an array of strings.
In [9]:
// Media types whose values the notebook format encodes as arrays of
// strings ("multi-line strings") that should be joined back into one string.
var multilineStringMimetypes = new Set([
  'application/javascript',
  'text/html',
  'text/markdown',
  'text/latex',
  'image/svg+xml',
  'image/gif',
  'image/png',
  'image/jpeg',
  'application/pdf',
  'text/plain',
]);

// Notebook-aware reviver: joins multi-line-string mimetype arrays into a
// single string; otherwise Arrays become Immutable.List and plain objects
// become Immutable.Map. Primitives (including null) pass through unchanged.
function immutableNBReviver(key, value) {
  if (Array.isArray(value)) {
    if (multilineStringMimetypes.has(key)) {
      return value.join('');
    }
    return Immutable.List(value);
  }
  // typeof null === 'object' — without this guard JSON nulls become empty Maps.
  if (value !== null && typeof value === 'object') {
    return Immutable.Map(value);
  }
  return value;
}
We can also set up a "greedy" reviver that converts source and text fields too. The primary problem, though, is that because of how JSON.parse works we have no idea whether a matching key is in a cell where we expect it, part of someone else's JSON payload, or in metadata.
In [10]:
// Superset of the multi-line-string mimetypes plus the notebook's
// 'source' and 'text' fields — any of these keys with an array value
// gets joined into one string.
var specialKeys = new Set([
  'application/javascript',
  'text/html',
  'text/markdown',
  'text/latex',
  'image/svg+xml',
  'image/gif',
  'image/png',
  'image/jpeg',
  'application/pdf',
  'text/plain',
  'source',
  'text',
]);

// "Greedy" reviver: like immutableNBReviver but also joins 'source' and
// 'text' arrays, even though we can't tell from the key alone whether
// the value really is notebook cell content.
function immutableGreedyReviver(key, value) {
  if (Array.isArray(value)) {
    if (specialKeys.has(key)) {
      return value.join('');
    }
    return Immutable.List(value);
  }
  // typeof null === 'object' — without this guard JSON nulls become empty Maps.
  if (value !== null && typeof value === 'object') {
    return Immutable.Map(value);
  }
  return value;
}
In [11]:
// Some logger that uses process.hrtime that I ripped off Stack Overflow, since we want to use timing in a way that we can't with console.time
// Destructures five values out of a one-shot generator into implicit globals:
//   a   - hrtime tuple captured once here, used as the baseline for all timings
//   o   - process.hrtime itself; o(a) yields elapsed [seconds, nanoseconds] since a
//   ms  - converts an hrtime [seconds, nanoseconds] tuple to milliseconds
//   s   - converts milliseconds to seconds
//   log - snapshots elapsed time since `a` as { a, o, ms, s }
// NOTE(review): the bare destructuring assignment creates implicit globals,
// which would throw in strict mode — kept byte-identical here.
[ a, o, ms, s, log ] = ( function * () {
yield * [
( process.hrtime )(),
process.hrtime,
// [seconds, nanoseconds] -> milliseconds
ms => ( ( ms[ 0 ] * 1e9 + ms[ 1 ] ) / 1000000 ),
s => s / 1000,
() => {
const f = o( a ), msf = ms( f ), sf = s( msf );
return { a, o: f, ms: msf, s: sf };
}
];
} )();
Out[11]:
In [12]:
// Calculate the milliseconds it takes to run f once, using the global
// `log` timer defined above.
function measure(f) {
  // `const` replaces the original implicit globals `start`/`end`, which
  // leak state between calls and throw outright in strict mode.
  const start = log();
  f();
  const end = log();
  return end.ms - start.ms;
}

// Measure the function run n times; return the mean duration in ms.
function runTrials(f, n = 1000) {
  const values = [];
  for (let ii = 0; ii < n; ii++) {
    values.push(measure(f));
  }
  return values.reduce((a, b) => a + b, 0) / n;
}
With our harness all set up, we can run through all the notebooks we have locally to see how they perform with different revivers.
In [13]:
notebooks = require('glob').sync('./*.ipynb')
Out[13]:
In [14]:
// Benchmark each reviver strategy against every local notebook,
// printing the mean of 100 trials per strategy.
for (var notebookPath of notebooks) {
  console.log("\n ----- ", path.basename(notebookPath))
  // `var` on raw/mean fixes the original implicit globals.
  var raw = fs.readFileSync(notebookPath)
  var tests = [
    { name: 'straight JSON.parse', f: () => { JSON.parse(raw) } },
    { name: 'Object.freeze', f: () => { JSON.parse(raw, (k, v) => Object.freeze(v)) } },
    { name: 'basic Immutable', f: () => { JSON.parse(raw, immutableReviver) } },
    { name: 'immutable notebook', f: () => { JSON.parse(raw, immutableNBReviver) } },
    { name: 'immutable greedy nb', f: () => { JSON.parse(raw, immutableGreedyReviver) } },
    // { name: 'fromJS', f: () => { JSON.parse(raw, (k, v) => Immutable.fromJS(v)) } },
    // { name: 'current commutable way', f: () => { commutable.fromJS(JSON.parse(raw)) } },
  ]
  for (var test of tests) {
    var mean = runTrials(test.f, 100)
    console.log(_.padEnd(test.name, 30), mean)
  }
}
Within nteract we are inevitably going to end up creating an immutable structure. These measurements only make sense in the context of running both the initial JSON.parse followed by the transformations. To give it a rough guess, I'll only compare a few I can evaluate.
In [15]:
// Compare the parse-only baselines against the full parse+convert
// pipelines, printing mean time (truncated to 10 chars) per notebook.
for (var notebookPath of notebooks) {
  console.log("\n ----- ", path.basename(notebookPath))
  // `var` on raw/mean fixes the original implicit globals.
  var raw = fs.readFileSync(notebookPath)
  var tests = [
    { name: 'straight JSON.parse baseline', f: () => { JSON.parse(raw) } },
    { name: 'Object.freeze baseline', f: () => { JSON.parse(raw, (k,v) => Object.freeze(v)) } },
    { name: 'straight JSON.parse then commutable conversion', f: () => { commutable.fromJS(JSON.parse(raw)) } },
    { name: 'immutable greedy nb', f: () => { JSON.parse(raw, immutableGreedyReviver) } },
  ]
  for (var test of tests) {
    var mean = runTrials(test.f, 100)
    console.log(_.padEnd(test.name, 50), mean.toString().slice(0,10), 'ms')
  }
}
Since these are in milliseconds and the difference is not much, it seems like maybe this doesn't need to be optimized. In the case of the altair notebook, which has a pretty big JSON structure inside (and only one!), perhaps it would make sense if some of our structure is frozen objects (don't force vega payloads to be Immutable Maps).
----- altair.ipynb
straight JSON.parse baseline 1.10996391 ms
Object.freeze baseline 2.29745900 ms
straight JSON.parse then commutable conversion 6.84918417 ms
immutable greedy nb 5.85418076 ms
In [ ]:
In [ ]: