Exploring Custom Revival with JSON.parse


In [2]:
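var Immutable = require('immutable')
var _ = require('lodash')
// Node core modules used by the benchmark loops further down
var fs = require('fs')
var path = require('path')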

Revival on Parse

JSON.parse takes an extra argument called a reviver:

JSON.parse(text[, reviver])

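The reviver accepts two parameters, key and value, and returns the value to use in its place. The key is always a string: the property name for values inside an Object, or the index (as a string such as "0") for values inside an Array. Nested values are revived before their containers, and the final call receives the empty string as the key for the top-level value.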
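A quick way to see what the reviver actually receives is to record each call. This is a small sketch (the helper name `traceReviver` is made up for illustration) that logs every key along with its type:

```javascript
// Hypothetical helper: record the key (and its type) for every reviver call.
function traceReviver(text) {
    const calls = [];
    JSON.parse(text, (key, value) => {
        calls.push([key, typeof key]);
        return value; // leave the parsed data unchanged
    });
    return calls;
}

const calls = traceReviver('{"a": [10, 20]}');
console.log(calls);
// [ [ '0', 'string' ], [ '1', 'string' ], [ 'a', 'string' ], [ '', 'string' ] ]
```

The array elements come first, then the array itself under its key, and finally the root object under the empty-string key.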

Let's walk through some sample code to check this out.


In [3]:
// Classic JSON.parse
JSON.parse('{"a": 2, "b": { "name": "dave" }}')


Out[3]:
{ a: 2, b: { name: 'dave' } }

In [4]:
function reviver(key, value) {
    if(key === 'name') {
        return value + " senior";
    }
    return value
}

JSON.parse('{"a": 2, "b": { "name": "dave" }}', reviver)


Out[4]:
{ a: 2, b: { name: 'dave senior' } }

This lets you change values based on their key, though the reviver only sees the immediate key, not the nested path to it within the overall JSON object.
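To make the missing-path limitation concrete, here is a sketch that runs the same kind of reviver over a made-up document where name appears at several depths. Every occurrence is rewritten, since the reviver cannot tell them apart:

```javascript
// Same logic as the reviver above: it only ever sees the immediate key.
function seniorReviver(key, value) {
    if (key === 'name') {
        return value + ' senior';
    }
    return value;
}

// Made-up input with "name" at three different places.
const text = '{"name": "top", "b": {"name": "dave"}, "metadata": {"name": "meta"}}';
const result = JSON.parse(text, seniorReviver);

// All three values now end in " senior", including the one under metadata.
console.log(result.name, '|', result.b.name, '|', result.metadata.name);
```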

Since the string is (expected to be) JSON, there are only two types which are not immutable: Array and Object. You can use this to your advantage to create frozen or Immutable.js objects while parsing.


In [5]:
JSON.parse('{"a": 2, "b": { "name": "dave" }}', (k, v) => Object.freeze(v))


Out[5]:
{ a: 2, b: { name: 'dave' } }
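The printed output looks the same as a plain parse, so it is worth checking that the freeze really applies at every level. A minimal sketch, where `tryRename` is a hypothetical helper that uses strict mode so the failed write throws instead of being silently ignored:

```javascript
const frozen = JSON.parse(
    '{"a": 2, "b": {"name": "dave"}, "c": [1, 2]}',
    (k, v) => Object.freeze(v)
);

// Every container is frozen, because the reviver runs on each one in turn.
const allFrozen = [frozen, frozen.b, frozen.c].every(Object.isFrozen);
console.log(allFrozen); // true

// Hypothetical helper: attempt a write and return the error, if any.
function tryRename(obj) {
    'use strict'; // in strict mode, writing to a frozen object throws
    try {
        obj.name = 'someone else';
        return null;
    } catch (err) {
        return err;
    }
}

const error = tryRename(frozen.b);
console.log(error instanceof TypeError, frozen.b.name); // true dave
```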

In [6]:
function immutableReviver(key, value) {
    if (Array.isArray(value)) {
        return Immutable.List(value);
    }

    // typeof null is 'object' too, so leave JSON null values alone
    if (value !== null && typeof value === 'object') {
        return Immutable.Map(value)
    }
    return value;
}

Since it seemed handy enough, I put immutable-reviver on npm. We'll just use the version written here for now though.


In [7]:
revived = JSON.parse('{"a": 2, "b": { "name": "dave" }}', immutableReviver)


Out[7]:
Map { "a": 2, "b": Map { "name": "dave" } }

In [8]:
revived.getIn(['b', 'name'])


Out[8]:
'dave'

I started looking into this because I wanted to see whether I could optimize notebook loading in nteract. We currently rely on a strategy that goes like:

notebook = JSON.parse(rawNotebook)
immutableNotebook = Immutable.fromJS(notebook)

ourNotebook = immutableNotebook.map(...).map(...)... // A series of transformations to create our in-memory representation

These transformations are mostly to turn notebook cells from this:

{
  "metadata": {
    "collapsed": false,
    "outputExpanded": false
  },
  "cell_type": "markdown",
  "source": [
    "# Outputs you can update by name\n",
    "\n",
    "This notebook demonstrates the new name-based display functionality in the notebook. Previously, notebooks could only attach output to the cell that was currently being executed:\n",
    "\n"
  ]
}

into:

{
  "metadata": {
    "collapsed": false,
    "outputExpanded": false
  },
  "cell_type": "markdown",
  "source": "# Outputs you can update by name\n\nThis notebook demonstrates the new name-based display functionality in the notebook. Previously, notebooks could only attach output to the cell that was currently being executed:\n\n"
}

This multi-line string format, introduced by Jupyter, is meant to accommodate diffing of notebooks in tools like git and GitHub. It's applied to source on cells as well as some output types.
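The conversion between the two forms can be sketched with two small helpers. The names `joinMultiline` and `splitMultiline` are hypothetical and not part of nteract; the split keeps each trailing newline so that joining the pieces restores the original text:

```javascript
// Hypothetical helpers for the multi-line string convention.

// Accepts either a plain string or an array of strings and returns one string.
function joinMultiline(source) {
    return Array.isArray(source) ? source.join('') : source;
}

// Splits text into lines, keeping each trailing "\n" so that joining is lossless.
function splitMultiline(text) {
    return text.match(/[^\n]*\n|[^\n]+/g) || [];
}

const source = ['# Outputs you can update by name\n', '\n', 'Some text'];
const joined = joinMultiline(source);
const roundTripped = splitMultiline(joined);

console.log(joined === '# Outputs you can update by name\n\nSome text'); // true
console.log(roundTripped.length); // 3, the same pieces as `source`
```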

We can set up a reviver that handles all the keys that are most likely to have multi-line strings. We'll start with those that are media types that we know end up being encoded as an array of strings.


In [9]:
var multilineStringMimetypes = new Set([
    'application/javascript',
    'text/html',
    'text/markdown',
    'text/latex',
    'image/svg+xml',
    'image/gif',
    'image/png',
    'image/jpeg',
    'application/pdf',
    'text/plain',
]);

function immutableNBReviver(key, value) {
    if (Array.isArray(value)) {
        if(multilineStringMimetypes.has(key)) {
            return value.join('')
        }
        return Immutable.List(value);
    }

    // typeof null is 'object' too, so leave JSON null values alone
    if (value !== null && typeof value === 'object') {
        return Immutable.Map(value)
    }
    return value;
}

We can also set up a "greedy" reviver that converts source and text fields as well. The main problem with this approach is that, because of how JSON.parse works, we have no idea whether a given key sits in a cell where we expect it, in someone else's JSON payload, or in metadata.


In [10]:
var specialKeys = new Set([
    'application/javascript',
    'text/html',
    'text/markdown',
    'text/latex',
    'image/svg+xml',
    'image/gif',
    'image/png',
    'image/jpeg',
    'application/pdf',
    'text/plain',
    'source',
    'text',
]);

function immutableGreedyReviver(key, value) {
    if (Array.isArray(value)) {
        if(specialKeys.has(key)) {
            return value.join('')
        }
        return Immutable.List(value);
    }

    // typeof null is 'object' too, so leave JSON null values alone
    if (value !== null && typeof value === 'object') {
        return Immutable.Map(value)
    }
    return value;
}
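The difference between the two revivers can be shown without Immutable.js. The sketch below uses a hypothetical `makeJoinReviver` factory that only does the joining step, plus a made-up cell whose metadata happens to contain an array under text. The greedy key set joins that array too, which is exactly the false positive described above:

```javascript
// Hypothetical plain-JS variant of the revivers above: joins arrays for the
// given keys and leaves everything else untouched (no Immutable.js).
function makeJoinReviver(keys) {
    return (key, value) =>
        Array.isArray(value) && keys.has(key) ? value.join('') : value;
}

const mediaTypeKeys = new Set(['text/plain', 'text/html']);
const greedyKeys = new Set([...mediaTypeKeys, 'source', 'text']);

const cell = JSON.stringify({
    cell_type: 'code',
    source: ['print(1)\n', 'print(2)'],
    // Made-up metadata: an array under "text" that is NOT a multi-line string.
    metadata: { tags: { text: ['alpha', 'beta'] } },
    outputs: [
        { output_type: 'display_data', data: { 'text/plain': ['line 1\n', 'line 2'] } },
    ],
});

const careful = JSON.parse(cell, makeJoinReviver(mediaTypeKeys));
const greedy = JSON.parse(cell, makeJoinReviver(greedyKeys));

console.log(Array.isArray(careful.source));        // true: source is left as an array
console.log(typeof greedy.source);                 // 'string': source was joined
console.log(typeof greedy.metadata.tags.text);     // 'string': the unintended join
```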

Our runtime harnesses

To evaluate the speed at which we can revive our objects, we'll set up a little testing harness.


In [11]:
// A logger built on process.hrtime, adapted from Stack Overflow. We need the
// elapsed time as a value, which console.time can't give us.

[ a, o, ms, s, log ] = ( function * () {
    yield * [
        ( process.hrtime )(),                              // a: reference time, captured once
        process.hrtime,                                    // o: the hrtime function itself
        ms => ( ( ms[ 0 ] * 1e9 + ms[ 1 ] ) / 1000000 ),   // [seconds, nanoseconds] -> milliseconds
        s  => s / 1000,                                    // milliseconds -> seconds
        () => {
            // log(): time elapsed since `a`, in several units
            const f = o( a ), msf = ms( f ), sf = s( msf );
            return { a, o: f, ms: msf, s: sf };
        }
    ];
} )();


Out[11]:
{}

In [12]:
// Calculate the milliseconds it takes to run f
function measure(f) {
  const start = log()
  f()
  const end = log()
  return end.ms - start.ms
}

// Measure the function run n times, return the mean
function runTrials(f, n=1000) {
    const values = []
    for(var ii=0; ii < n; ii++) {
        values.push(measure(f))
    }
    return values.reduce((a, b) => a + b, 0)/n
}
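For comparison, the same kind of harness can be sketched with `performance.now()` from Node's `perf_hooks` module, which already reports milliseconds. This is an alternative sketch, not the harness used for the numbers in this notebook; the names `measureMs` and `summarize` are made up, and the median is included only because a mean over very short runs is sensitive to outliers:

```javascript
const { performance } = require('perf_hooks');

// Milliseconds taken by a single call to f.
function measureMs(f) {
    const start = performance.now();
    f();
    return performance.now() - start;
}

// Run f n times and report the mean and the (upper) median in milliseconds.
function summarize(f, n = 1000) {
    const samples = [];
    for (let i = 0; i < n; i++) {
        samples.push(measureMs(f));
    }
    const mean = samples.reduce((a, b) => a + b, 0) / n;
    const sorted = samples.slice().sort((a, b) => a - b);
    const median = sorted[Math.floor(n / 2)];
    return { mean, median };
}

const stats = summarize(() => JSON.parse('{"a": [1, 2, 3]}'), 200);
console.log(stats);
```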

With our harness all set up, we can run through all the notebooks we have locally to see how they perform with different revivers.


In [13]:
notebooks = require('glob').sync('./*.ipynb')


Out[13]:
[ './altair.ipynb',
  './display-updates.ipynb',
  './download-stats.ipynb',
  './geojson.ipynb',
  './immutable-revival.ipynb',
  './intro.ipynb',
  './model-debug.ipynb',
  './pandas-to-geojson.ipynb',
  './plotly.ipynb',
  './plotlyr.ipynb',
  './table-with-schema.ipynb' ]

In [14]:
for(var notebookPath of notebooks) {
    console.log("\n ----- ", path.basename(notebookPath))
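    // readFileSync without an encoding returns a Buffer, so each JSON.parse call
    // below also pays for converting it to a string
    raw = fs.readFileSync(notebookPath)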
    
    var tests = [
        { name: 'straight JSON.parse', f: () => { JSON.parse(raw) } },
        { name: 'Object.freeze', f: () => { JSON.parse(raw, (k, v) => Object.freeze(v)) } },
        { name: 'basic Immutable', f: () => { JSON.parse(raw, immutableReviver) } },
        { name: 'immutable notebook', f: () => { JSON.parse(raw, immutableNBReviver) } },
        { name: 'immutable greedy nb', f: () => { JSON.parse(raw, immutableGreedyReviver) } },
        // { name: 'fromJS', f: () => { JSON.parse(raw, (k, v) => Immutable.fromJS(v)) } },
        // { name: 'current commutable way', f: () => { commutable.fromJS(JSON.parse(raw)) } },
    ]
    
    for(var test of tests) {
        mean = runTrials(test.f, 100)
        console.log(_.padEnd(test.name, 30), mean)
    }
    

}


 -----  altair.ipynb
straight JSON.parse            0.9920628000000307
Object.freeze                  2.2261327600000187
basic Immutable                6.339475390000043
immutable notebook             5.987929800000066
immutable greedy nb            6.041059049999958

 -----  display-updates.ipynb
straight JSON.parse            0.055342559999971855
Object.freeze                  0.1522439999999642
basic Immutable                0.32696781000005104
immutable notebook             0.2565404299999773
immutable greedy nb            0.21772927000005438

 -----  download-stats.ipynb
straight JSON.parse            0.03000843000002533
Object.freeze                  0.0882049800000368
basic Immutable                0.12239362000002074
immutable notebook             0.120487959999964
immutable greedy nb            0.12786773999993784

 -----  geojson.ipynb
straight JSON.parse            0.06639499000002616
Object.freeze                  0.17166477999984636
basic Immutable                0.21922646999997597
immutable notebook             0.24737535999996907
immutable greedy nb            0.24788046999994548

 -----  immutable-revival.ipynb
straight JSON.parse            0.11106276999998954
Object.freeze                  0.35120148000010887
basic Immutable                0.4511028199999055
immutable notebook             0.4249068999998963
immutable greedy nb            0.43641603000007306

 -----  intro.ipynb
straight JSON.parse            0.04580465999993976
Object.freeze                  0.0828917600000932
basic Immutable                0.11927991999998994
immutable notebook             0.11377050000004602
immutable greedy nb            0.14635414000011224

 -----  model-debug.ipynb
straight JSON.parse            0.015867650000109278
Object.freeze                  0.04522049000008337
basic Immutable                0.07146109000008437
immutable notebook             0.07224731999998767
immutable greedy nb            0.06531165999993391

 -----  pandas-to-geojson.ipynb
straight JSON.parse            0.0914843699999983
Object.freeze                  0.21983236000010947
basic Immutable                0.3106511400001364
immutable notebook             0.30525319999995193
immutable greedy nb            0.30837015999995854

 -----  plotly.ipynb
straight JSON.parse            0.8451852499999404
Object.freeze                  1.9863980899999842
basic Immutable                2.762597209999967
immutable notebook             2.3932162999998763
immutable greedy nb            2.5105548600000476

 -----  plotlyr.ipynb
straight JSON.parse            0.010073599999996076
Object.freeze                  0.034625850000065836
basic Immutable                0.039010990000060704
immutable notebook             0.039971539999951344
immutable greedy nb            0.11599857000006523

 -----  table-with-schema.ipynb
straight JSON.parse            0.4168697699999211
Object.freeze                  0.8328423399998428
basic Immutable                1.1854772000000593
immutable notebook             1.141829869999965
immutable greedy nb            1.2159551399999327

Evaluating revivers for notebook loading

Within nteract we will inevitably end up creating an immutable structure, so these measurements only make sense when they cover both the initial JSON.parse and the transformations that follow. As a rough estimate, I'll compare only the few variants I can evaluate here.


In [17]:
for(var notebookPath of notebooks) {
    console.log("\n ----- ", path.basename(notebookPath))
    raw = fs.readFileSync(notebookPath)
    
    var tests = [
        { name: 'straight JSON.parse baseline', f: () => { JSON.parse(raw) } },
        { name: 'Object.freeze baseline', f: () => { JSON.parse(raw, (k,v) => Object.freeze(v)) } },
        { name: 'immutable greedy nb', f: () => { JSON.parse(raw, immutableGreedyReviver) } },
    ]
    
    for(var test of tests) {
        mean = runTrials(test.f, 100)
        console.log(_.padEnd(test.name, 50), mean.toString().slice(0,10), 'ms')
    }
}


 -----  altair.ipynb
straight JSON.parse baseline                       0.88235720 ms
Object.freeze baseline                             2.25522840 ms
immutable greedy nb                                5.73692367 ms

 -----  display-updates.ipynb
straight JSON.parse baseline                       0.05300443 ms
Object.freeze baseline                             0.14575380 ms
immutable greedy nb                                0.34971011 ms

 -----  download-stats.ipynb
straight JSON.parse baseline                       0.02906631 ms
Object.freeze baseline                             0.08603889 ms
immutable greedy nb                                0.10953930 ms

 -----  geojson.ipynb
straight JSON.parse baseline                       0.09626379 ms
Object.freeze baseline                             0.15678716 ms
immutable greedy nb                                0.22850692 ms

 -----  immutable-revival.ipynb
straight JSON.parse baseline                       0.11066660 ms
Object.freeze baseline                             0.34582383 ms
immutable greedy nb                                0.40816922 ms

 -----  intro.ipynb
straight JSON.parse baseline                       0.03264234 ms
Object.freeze baseline                             0.08193051 ms
immutable greedy nb                                0.13175342 ms

 -----  model-debug.ipynb
straight JSON.parse baseline                       0.01731088 ms
Object.freeze baseline                             0.04996562 ms
immutable greedy nb                                0.06617834 ms

 -----  pandas-to-geojson.ipynb
straight JSON.parse baseline                       0.07676871 ms
Object.freeze baseline                             0.23245053 ms
immutable greedy nb                                0.28704811 ms

 -----  plotly.ipynb
straight JSON.parse baseline                       0.65487637 ms
Object.freeze baseline                             2.10578171 ms
immutable greedy nb                                2.50347865 ms

 -----  plotlyr.ipynb
straight JSON.parse baseline                       0.01031352 ms
Object.freeze baseline                             0.03151152 ms
immutable greedy nb                                0.03889758 ms

 -----  table-with-schema.ipynb
straight JSON.parse baseline                       0.44009007 ms
Object.freeze baseline                             0.82348861 ms
immutable greedy nb                                1.10231335 ms

These timings are all in milliseconds and the differences are small, so this probably doesn't need to be optimized. The altair notebook is the exception: it contains a single, fairly large JSON structure, so it might make sense to keep some of our structure as frozen objects (that is, not force vega payloads to be Immutable Maps).

For reference, here is the altair notebook again with the commutable conversion included:

 -----  altair.ipynb
straight JSON.parse baseline                       1.10996391 ms
Object.freeze baseline                             2.29745900 ms
straight JSON.parse then commutable conversion     6.84918417 ms
immutable greedy nb                                5.85418076 ms
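To put the size of the difference in numbers, the sketch below takes the three slowest notebooks from the second benchmark run above and computes how much time the greedy reviver adds over a plain JSON.parse. The timings are copied from that output; the variable names are made up for illustration:

```javascript
// Mean timings in milliseconds, copied from the second benchmark run above.
const timings = {
    'altair.ipynb': { parse: 0.88235720, greedy: 5.73692367 },
    'plotly.ipynb': { parse: 0.65487637, greedy: 2.50347865 },
    'table-with-schema.ipynb': { parse: 0.44009007, greedy: 1.10231335 },
};

const rows = Object.entries(timings).map(([name, t]) => ({
    name,
    extraMs: t.greedy - t.parse, // absolute cost of reviving into immutable data
    ratio: t.greedy / t.parse,   // relative slowdown
}));

for (const row of rows) {
    console.log(row.name, row.extraMs.toFixed(2) + ' ms extra,', row.ratio.toFixed(1) + 'x');
}
// altair: about 4.85 ms extra (6.5x); plotly: about 1.85 ms (3.8x);
// table-with-schema: about 0.66 ms (2.5x)
```

Even in the worst case here, the extra cost stays under 5 ms per notebook load.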