This tutorial introduces the JSONiq language, which declaratively manipulates JSON data. It is an interactive notebook that works with Rumble running as a server in the background.
If you are already running this notebook in Jupyter, we assume that you have installed Python and Jupyter, and probably Spark as well. If not, the easiest option is to use a distribution such as Anaconda, which installs everything for you on any operating system.
Then, all you need to do is download this notebook locally, then run
jupyter notebook JSONiq-tutorial.ipynb
in the same directory to open it.
One more thing to know: Spark only works with Java 8. If you use Java 11 or 14, you will get an error message. Several versions of Java can be installed on the same machine; in that case, simply point your JAVA_HOME environment variable to the Java 8 home.
For this notebook to work, we need two more very simple steps to prepare Rumble. You can execute them on your command line (note that you need to open a new command line window, so that jupyter can continue to run meanwhile).
Download rumble with:
wget https://github.com/RumbleDB/rumble/releases/download/v1.6.3/spark-rumble-1.6.3.jar
(You can also download it manually by using this link in your browser.)
Then, you can start a rumble server by simply executing:
spark-submit spark-rumble-1.6.3.jar --server yes --port 9090
You should leave it running for the whole duration of this tutorial. Afterwards, you can interrupt it with Ctrl-C.
If the port is already taken, you can change 9090 to another number (e.g., 8001, 8081, 9091, etc). If you do so, make sure to make the same change in the cell assigning to the Python server variable further down.
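If you are unsure whether a port is already taken, a quick check from Python can help before you start the server. The sketch below is not part of Rumble: `port_in_use` and `rumble_url` are hypothetical helper names, and the URL pattern simply mirrors the value assigned to the `server` variable further down.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means another process already owns the port.
        return s.connect_ex((host, port)) == 0

def rumble_url(port):
    """Build the endpoint URL in the same shape as the notebook's server variable."""
    return "http://localhost:%d/jsoniq" % port
```

For example, if `port_in_use(9090)` returns True before you have started Rumble, pick another port such as 9091, pass it to --port, and assign `rumble_url(9091)` to `server`.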
The setup described above runs Spark and Rumble on your laptop. However, if you wish to play with a larger cluster instead (such as Amazon EMR or Azure), it is straightforward: a ready-to-use Spark cluster takes just a few minutes to set up with a few clicks. Once it is up, run the above two commands on the master node of the cluster. When you connect with SSH, add
-L 9090:localhost:9090
to the ssh command to establish an SSH tunnel to your laptop. You can then use this notebook in exactly the same way as if the server were local.
In order for JSONiq to run successfully, you need to execute the following cell (without changing it):
In [46]:
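import requests
import json
import time
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def rumble(line, cell=None):
    # Use the cell body for %%rumble, or the rest of the line for %rumble.
    if cell is None:
        data = line
    else:
        data = cell

    # Send the query to the Rumble server and time the round trip.
    # `server` is the global variable assigned in the next cell.
    start = time.time()
    response = json.loads(requests.post(server, data=data).text)
    end = time.time()
    # time.time() returns seconds, so convert to milliseconds.
    print("Took: %s ms" % ((end - start) * 1000))

    if 'warning' in response:
        print(json.dumps(response['warning']))
    if 'values' in response:
        # Print one result item per line.
        for e in response['values']:
            print(json.dumps(e))
    elif 'error-message' in response:
        return response['error-message']
    else:
        return response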
As well as this one (where you need to change the port from 9090 to another value if you used a different --port parameter):
In [47]:
server='http://localhost:9090/jsoniq'
Now we are all set!
As explained on the official JSON Web site, JSON is a lightweight data-interchange format designed for humans as well as for computers. It supports as values: objects, arrays, strings, numbers, booleans and null.
JSONiq provides declarative querying and updating capabilities on JSON data.
JSONiq is based on XQuery, which is a W3C standard (like XML and HTML). XQuery is a very powerful declarative language that was originally designed to manipulate XML data, but it turns out that it is also a very good fit for manipulating JSON natively. JSONiq, since it extends XQuery, is a very powerful general-purpose declarative programming language. Our experience is that, for the same task, you will probably write about 80% less code compared to imperative languages like JavaScript, Python or Ruby. Additionally, you get the benefits of strong type checking without actually having to write type declarations. Here is an appetizer before we start the tutorial from scratch.
In [48]:
%%rumble
let $stores :=
[
  { "store number" : 1, "state" : "MA" },
  { "store number" : 2, "state" : "MA" },
  { "store number" : 3, "state" : "CA" },
  { "store number" : 4, "state" : "CA" }
]
let $sales := [
  { "product" : "broiler", "store number" : 1, "quantity" : 20 },
  { "product" : "toaster", "store number" : 2, "quantity" : 100 },
  { "product" : "toaster", "store number" : 2, "quantity" : 50 },
  { "product" : "toaster", "store number" : 3, "quantity" : 50 },
  { "product" : "blender", "store number" : 3, "quantity" : 100 },
  { "product" : "blender", "store number" : 3, "quantity" : 150 },
  { "product" : "socks", "store number" : 1, "quantity" : 500 },
  { "product" : "socks", "store number" : 2, "quantity" : 10 },
  { "product" : "shirt", "store number" : 3, "quantity" : 10 }
]
let $join :=
  for $store in $stores[], $sale in $sales[]
  where $store."store number" = $sale."store number"
  return {
    "nb" : $store."store number",
    "state" : $store.state,
    "sold" : $sale.product
  }
return [$join]
The first thing you need to know is that a well-formed JSON document is a JSONiq expression as well. This means that you can copy-and-paste any JSON document into a query. The following are JSONiq queries that are "idempotent" (they just output themselves):
In [49]:
%%rumble
{ "pi" : 3.14, "sq2" : 1.4 }
In [50]:
%%rumble
[ 2, 3, 5, 7, 11, 13 ]
In [51]:
%%rumble
{
"operations" : [
{ "binary" : [ "and", "or"] },
{ "unary" : ["not"] }
],
"bits" : [
0, 1
]
}
In [52]:
%%rumble
[ { "Question" : "Ultimate" }, ["Life", "the universe", "and everything"] ]
This works with objects, arrays (even nested), strings, numbers, booleans, null.
It also works the other way round: if your query outputs an object or an array, you can use it as a JSON document. JSONiq is a declarative language. This means that you only need to say what you want - the compiler will take care of the how.
In the above queries, you are basically saying: I want to output this JSON content, and here it is.
In [53]:
%%rumble
"Hello, World!"
Not surprisingly, it outputs the string "Hello, World!".
Okay, so, now, you might be thinking: "What is the use of this language if it just outputs what I put in?" Of course, JSONiq can do more than that, and still in a declarative way. Here is how it works with numbers:
In [54]:
%%rumble
2 + 2
In [55]:
%%rumble
(38 + 2) div 2 + 11 * 2
(Mind the division operator, which is the "div" keyword. The slash operator has different semantics.)
Like JSON, JSONiq works with decimals and doubles:
In [56]:
%%rumble
6.022e23 * 42
In [57]:
%%rumble
true and false
In [58]:
%%rumble
(true or false) and (false or true)
The unary not is also available:
In [59]:
%%rumble
not true
In [60]:
%%rumble
concat("Hello ", "Captain ", "Kirk")
In [61]:
%%rumble
substring("Mister Spock", 8, 5)
JSONiq comes with a rich string function library out of the box, inherited from its base language. These functions are listed here (actually, you will find many more for numbers, dates, etc).
In [62]:
%%rumble
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
In [63]:
%%rumble
1, true, 4.2e1, "Life"
The "to" operator is very convenient, too:
In [64]:
%%rumble
(1 to 100)
Some functions even work on sequences:
In [65]:
%%rumble
sum(1 to 100)
In [66]:
%%rumble
string-join(("These", "are", "some", "words"), "-")
In [67]:
%%rumble
count(10 to 20)
In [68]:
%%rumble
avg(1 to 100)
Unlike arrays, sequences are flat. The sequence (3) is identical to the integer 3, and (1, (2, 3)) is identical to (1, 2, 3).
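For readers who think in Python, the flattening rule can be mimicked with nested tuples standing in for sequences and lists standing in for arrays. This is only an analogy under that assumption, not how Rumble implements sequences, and `flatten` is a hypothetical helper.

```python
def flatten(items):
    """Flatten nested tuples, the way JSONiq flattens nested sequences."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            # Nested sequence: splice its members into the parent.
            flat.extend(flatten(item))
        else:
            # Any other value (including a list, our stand-in for an array) is kept whole.
            flat.append(item)
    return flat

print(flatten((1, (2, 3))))    # [1, 2, 3]
print(flatten((1, [2, 3])))    # [1, [2, 3]]
```

The second call shows the contrast with arrays: the list keeps its nesting, whereas the nested tuple disappears.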
In [69]:
%%rumble
let $x := "Bearing 3 1 4 Mark 5. "
return concat($x, "Engage!")
In [70]:
%%rumble
let $x := ("Kirk", "Picard", "Sisko")
return string-join($x, " and ")
You can bind as many variables as you want:
In [71]:
%%rumble
let $x := 1
let $y := $x * 2
let $z := $y + $x
return ($x, $y, $z)
and even reuse the same name to hide formerly declared variables:
In [72]:
%%rumble
let $x := 1
let $x := $x + 2
let $x := $x + 3
return $x
In [73]:
%%rumble
for $i in 1 to 10
return $i * 2
More interestingly, you can combine fors and lets like so:
In [74]:
%%rumble
let $sequence := 1 to 10
for $value in $sequence
let $square := $value * $value
return $square
and even filter out some values:
In [75]:
%%rumble
let $sequence := 1 to 10
for $value in $sequence
let $square := $value * $value
where $square < 10
return $square
Note that you can only iterate over sequences, not arrays. To iterate over an array, you can obtain the sequence of its values with the [] operator, like so:
In [76]:
%%rumble
[1, 2, 3][]
In [77]:
%%rumble
for $x in 1 to 10
return if ($x < 5) then $x
else -$x
Note that the else clause is required. However, it can be the empty sequence (), which is often what you need if only the then clause is relevant to you.
In [78]:
%%rumble
[ 1 to 10 ]
Or you can dynamically compute the value of object pairs (or their key):
In [79]:
%%rumble
{
"Greeting" : (let $d := "Mister Spock"
return concat("Hello, ", $d)),
"Farewell" : string-join(("Live", "long", "and", "prosper"),
" ")
}
You can dynamically generate object singletons (with a single pair):
In [80]:
%%rumble
{ concat("Integer ", 2) : 2 * 2 }
and then merge lots of them into a new object with the {| |} notation:
In [81]:
%%rumble
{|
for $i in 1 to 10
return { concat("Square of ", $i) : $i * $i }
|}
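As a rough point of comparison, the same build-then-merge pattern can be written in Python with dictionaries. This is only an analogy for readers coming from Python; the variable names are illustrative and nothing here is sent to Rumble.

```python
# Build ten single-pair dictionaries, one per integer from 1 to 10.
singletons = [{"Square of %d" % i: i * i} for i in range(1, 11)]

# Merge them into a single dictionary, mirroring the {| |} notation.
merged = {}
for obj in singletons:
    merged.update(obj)

print(merged["Square of 3"])   # 9
print(len(merged))             # 10
```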
Up to now, you have learnt how to compose expressions so as to do some computations and to build objects and arrays. It also works the other way round: if you have some JSON data, you can access it and navigate it. All you need to know is that JSONiq views an array as an ordered list of values, and an object as a set of name/value pairs.
In [82]:
%%rumble
let $person := {
"first name" : "Sarah",
"age" : 13,
"gender" : "female",
"friends" : [ "Jim", "Mary", "Jennifer"]
}
return $person."first name"
You can also ask for all keys in an object:
In [83]:
%%rumble
let $person := {
"name" : "Sarah",
"age" : 13,
"gender" : "female",
"friends" : [ "Jim", "Mary", "Jennifer"]
}
return { "keys" : [ keys($person)] }
In [84]:
%%rumble
let $friends := [ "Jim", "Mary", "Jennifer"]
return $friends[[1+1]]
It is also possible to get the size of an array:
In [85]:
%%rumble
let $person := {
"name" : "Sarah",
"age" : 13,
"gender" : "female",
"friends" : [ "Jim", "Mary", "Jennifer"]
}
return { "how many friends" : size($person.friends) }
Finally, the [] operator returns all elements in an array, as a sequence:
In [86]:
%%rumble
let $person := {
"name" : "Sarah",
"age" : 13,
"gender" : "female",
"friends" : [ "Jim", "Mary", "Jennifer"]
}
return $person.friends[]
In [87]:
%%rumble
let $stores :=
[
  { "store number" : 1, "state" : "MA" },
  { "store number" : 2, "state" : "MA" },
  { "store number" : 3, "state" : "CA" },
  { "store number" : 4, "state" : "CA" }
]
let $sales := [
  { "product" : "broiler", "store number" : 1, "quantity" : 20 },
  { "product" : "toaster", "store number" : 2, "quantity" : 100 },
  { "product" : "toaster", "store number" : 2, "quantity" : 50 },
  { "product" : "toaster", "store number" : 3, "quantity" : 50 },
  { "product" : "blender", "store number" : 3, "quantity" : 100 },
  { "product" : "blender", "store number" : 3, "quantity" : 150 },
  { "product" : "socks", "store number" : 1, "quantity" : 500 },
  { "product" : "socks", "store number" : 2, "quantity" : 10 },
  { "product" : "shirt", "store number" : 3, "quantity" : 10 }
]
let $join :=
  for $store in $stores[], $sale in $sales[]
  where $store."store number" = $sale."store number"
  return {
    "nb" : $store."store number",
    "state" : $store.state,
    "sold" : $sale.product
  }
return [$join]
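To see what this join computes without running Rumble, here is a minimal Python sketch of the same logic, assuming the same two data sets. It is only an illustration of the nested for/where/return pattern; the variable names are chosen to match the query.

```python
import json

stores = [
    {"store number": 1, "state": "MA"},
    {"store number": 2, "state": "MA"},
    {"store number": 3, "state": "CA"},
    {"store number": 4, "state": "CA"},
]
sales = [
    {"product": "broiler", "store number": 1, "quantity": 20},
    {"product": "toaster", "store number": 2, "quantity": 100},
    {"product": "toaster", "store number": 2, "quantity": 50},
    {"product": "toaster", "store number": 3, "quantity": 50},
    {"product": "blender", "store number": 3, "quantity": 100},
    {"product": "blender", "store number": 3, "quantity": 150},
    {"product": "socks", "store number": 1, "quantity": 500},
    {"product": "socks", "store number": 2, "quantity": 10},
    {"product": "shirt", "store number": 3, "quantity": 10},
]

# Two nested loops play the role of "for $store in ..., $sale in ...",
# and the condition plays the role of the where clause.
join = [
    {"nb": store["store number"], "state": store["state"], "sold": sale["product"]}
    for store in stores
    for sale in sales
    if store["store number"] == sale["store number"]
]

for row in join:
    print(json.dumps(row))
```

Store 4 has no sales, so it does not appear in the result, just as in the JSONiq query.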
In [44]:
%%rumble
json-file("put the path to a JSON lines file here")
In [45]:
%%rumble
json-doc("put the path to a small JSON file with an object spread over multiple lines here")
It is also possible to get JSON content with an HTTP request, or by parsing it from a string. The EXPath http-client module (described in the Zorba documentation) allows you to make HTTP requests, and the jn:parse-json() function allows you to use the body as an object or an array. Rumble does not support HTTP requests (from JSONiq) yet.
JSONiq supports JSON updates. You can declaratively update your JSON data. JSONiq provides updating expressions. The list of updates that is eventually output by your program is then applied to your JSON data.
copy $people := {
"John" : { "status" : "single" },
"Mary" : { "status" : "single" } }
modify (replace value of json $people.John.status with "married",
replace value of json $people.Mary.status with "married")
return $people
-> { "John" : { "status" : "married" }, "Mary" : { "status" : "married" } }
Other updates are insertion into an object or an array, replacement of a value in an object or an array, deletion in an object or an array, renaming an object pair, appending to an array. This is documented on jsoniq.org.
Updates are not supported by Rumble yet, but Zorba supports them.
Here are a few more highlights:
The complete JSONiq specification is available on jsoniq.org: http://www.jsoniq.org/docs/JSONiq/webhelp/index.html