We start with loading a sample database. Our sample database is derived from the dataset of all employees of the city of Chicago (source).
In [1]:
include("../citydb.jl")
using RBT
setdb(citydb)
Out[1]:
We can execute a query using @query()
command:
In [2]:
@query(6*(3+4))
Out[2]:
In [3]:
@query(department.name)
Out[3]:
Find the names of all employees.
In [4]:
@query(department.employee.name)
Out[4]:
We are not longer restricted by the hierarchical structure of the database, so we can query employees directly.
In [5]:
@query(employee.name)
Out[5]:
We can traverse the schema in any direction, for instance, from employees to their departments.
In [6]:
@query(employee.department.name)
Out[6]:
Show the list of all salaries.
In [7]:
@query(employee.salary)
Out[7]:
If the traversal ends at an entity class, an array of records is generated.
In [8]:
@query(employee)
Out[8]:
Which fields are selected depends on the path to the class.
In [9]:
@query(department.employee)
Out[9]:
In [10]:
@query(count(department))
Out[10]:
Find the number of employees for each department.
In [11]:
@query(department.count(employee))
Out[11]:
Find the total number of employees.
In [12]:
@query(count(department.employee))
Out[12]:
Again, we can query employee
directly.
In [13]:
@query(count(employee))
Out[13]:
Find the top salary among all employees.
In [14]:
@query(max(employee.salary))
Out[14]:
Find the maximum number of employees per department.
In [15]:
@query(max(department.count(employee)))
Out[15]:
In [16]:
@query(department:select(name,count(employee)))
Out[16]:
The :select
notation is a syntax sugar for regular function call where the first argument is placed before the function name (postfix notation).
In [17]:
@query(select(department,name,count(employee)))
Out[17]:
It is easy to add new columns to the output. Let us add the top salary per department.
In [18]:
@query(
department
:select(
name,
count(employee),
max(employee.salary)))
Out[18]:
In [19]:
@query(
employee
:filter(salary>200000)
:select(name,surname,position,salary))
Out[19]:
You can apply :filter()
on any selected column.
In [20]:
@query(
employee
:select(name,surname,position,salary)
:filter(salary>200000))
Out[20]:
Find the number of employees with salary in the range from \$100k to \$200k.
In [21]:
@query(
employee
:filter((salary>100000)&(salary<=200000))
:count)
Out[21]:
Find the departments with more than 1000 employees.
In [22]:
@query(
department
:filter(count(employee)>1000)
.name)
Out[22]:
Find the number of departments with more than 1000 employees.
In [23]:
@query(
count(
department
:filter(count(employee)>1000)))
Out[23]:
For each department, find the number of employees with salary higher than $100k.
In [24]:
@query(
department
:select(
name,
count(employee:filter(salary>100000))))
Out[24]:
For each department with the number of employees less than 1000, find the employees with salary higher than $125k.
In [25]:
@query(
department
:filter(count(employee)<1000)
:select(
name,
employee
:filter(salary>125000)
:select(name,surname,position)))
Out[25]:
In [26]:
@query(department.name:sort)
Out[26]:
We can also specify the attribute by which the elements of the array are to be sorted.
Show the employees sorted by salary.
In [27]:
@query(employee:sort(salary))
Out[27]:
Use :desc
indicator to reverse the order.
In [28]:
@query(employee:sort(salary:desc))
Out[28]:
It is possible to specify several sorting keys.
In [29]:
@query(
employee
:sort(
salary:desc,
surname:asc,
name:asc))
Out[29]:
:sort
can be used together with :select
and :filter
.
In [30]:
@query(
department
:select(name, size => count(employee))
:filter(size>1000)
:sort(size:desc))
Out[30]:
Use :define
to name a commonly used expression.
In [31]:
@query(
department
:define(size => count(employee))
:select(name, size)
:filter(size>1000)
:sort(size:desc))
Out[31]:
In [32]:
@query(first(employee))
Out[32]:
The name of the first employee.
In [33]:
@query(first(employee).name)
Out[33]:
The department with the largest number of employees.
In [34]:
@query(
department
:select(name, size => count(employee))
:sort(size:desc)
:first)
Out[34]:
Same query without :sort
.
The department with the largest number of employees.
In [35]:
@query(
department
:select(name, size => count(employee))
:first(size))
Out[35]:
Last employee.
In [36]:
@query(employee:last)
Out[36]:
The department with the largest number of employees.
In [37]:
@query(
department
:select(name, size => count(employee))
:sort(size)
:last)
Out[37]:
Same query could be written without :sort
.
The department with the largest number of employees.
In [38]:
@query(
department
:select(name, size => count(employee))
:last(size:desc))
Out[38]:
Show first 5 employees.
In [39]:
@query(employee:take(5))
Out[39]:
Skip first 10 employees, show next 5.
In [40]:
@query(employee:skip(10):take(5))
Out[40]:
Show last ten employees.
In [41]:
@query(employee:skip(-10))
Out[41]:
Show approximately half of employees.
In [42]:
@query(employee:take(count(employee)/2))
Out[42]:
Reverse the order of departments.
In [43]:
@query(department:reverse)
Out[43]:
In [44]:
@query(department:select(id,name))
Out[44]:
Use :get
combinator to find the record by its identity.
In [45]:
@query(department:get(5))
Out[45]:
If a record is not found, null
value is returned.
In [46]:
@query(department:get(-1))
Out[46]:
You can use brackets instead of :get
.
In [47]:
@query(
department[5]
:select(id, name, count(employee)))
Out[47]:
Show all employees of a selected department.
In [48]:
@query(department[5].employee)
Out[48]:
In [49]:
@query(
employee
:filter(salary>managed_by.salary)
:select(name, surname, position, managed_by, salary-managed_by.salary))
Out[49]:
For all employees in a certain department, list their seniors and the number of their subordinates.
In [50]:
@query(
department[26].employee
:select(
name,
surname,
position,
connect(managed_by),
count(connect(manages))))
Out[50]:
List employees of a certain department in hierarchical order.
In [51]:
@query(
department[26].employee
:sort_connect(managed_by)
:select(
depth(managed_by),
name,
surname,
position))
Out[51]:
In [52]:
@query(employee.position:unique)
Out[52]:
Find the number of distinct positions for each department.
In [53]:
@query(
department
:select(
name,
count(unique(employee.position)),
count(employee)))
Out[53]:
We can also list distinct positions using :group
combinator. With each position, we get a list of employees having this position.
In [54]:
@query(employee:group(position))
Out[54]:
For each row generated by employee:group(position)
, combinator employee
will give you employees that have this position.
For each position, find the number of employees.
In [55]:
@query(
employee
:group(position)
:select(position, size => count(employee))
:sort(size:desc))
Out[55]:
Find positions provided by no less than 5 departments.
In [56]:
@query(
employee
:group(position)
:define(department => unique(employee.department))
:filter(count(department)>=5)
:select(position, department)
:sort(count(department):desc))
Out[56]:
Find the popular names of Chicago employees.
In [57]:
@query(
employee
:group(name)
:select(name, size => count(employee))
:sort(size:desc))
Out[57]:
Find the top salary by the first name, but only if there are at least 10 employees having this name.
In [58]:
@query(
employee
:group(name)
:filter(count(employee)>=10)
:select(name, max_salary => max(employee.salary))
:sort(max_salary:desc))
Out[58]:
Find the number of employees for each department and salary bracket.
In [59]:
@query(
employee
:group(department, salary_bracket => salary/10000*10000 :desc)
:select(department, salary_bracket, salary_bracket+9999, count(employee)))
Out[59]:
To generate totals on each dimension, use :group_cube
.
Find the number of employees for each department and salary bracket, including totals.
In [60]:
@query(
employee
:group_cube(department, salary_bracket => salary/10000*10000 :desc)
:select(department, salary_bracket, salary_bracket+9999, count(employee)))
Out[60]:
Add :dataframe
to present this data in tabular form.
In [61]:
@query(
employee
:group_cube(
department,
salary_bracket => salary/10000*10000 :desc)
:select(
department,
low => salary_bracket,
high => salary_bracket+9999,
size => count(employee))
:dataframe)
Out[61]:
You can specify dimensions separately using :partition
.
Find the number of positions, the number of employees and the highest salary for the first 3 departments.
In [62]:
@query(
employee
:partition(department:take(3))
:select(department.name, count(unique(employee.position)), count(employee), max(employee.salary)))
Out[62]:
Similar to :group_cube
, :partition_cube
adds totals.
Find the numbers of positions and employees, the highest salary and the most popular position for the first 3 departments, and include the totals.
In [63]:
@query(
employee
:partition_cube(department:take(3))
:select(
department.name,
num_pos => count(unique(employee.position)),
num_empl => count(employee),
max_salary => max(employee.salary),
pop_position => employee:group(position):first(count(employee)).position)
:dataframe)
Out[63]:
You can use an array constructor or range()
combinator to specify the dimensions.
In [64]:
@query(range(0, 60000, max(employee.salary)))
Out[64]:
For the given departments, employee's names and salary brackets, find the number of employees, the number of different positions and the most popular position.
In [65]:
@query(
employee
:define(salary_bracket => salary/60000*60000)
:partition(
department:take(3),
name => ["ANTHONY", "BRIAN"],
salary_bracket => range(0, 60000, max(employee.salary)))
:select(
dept => department.name,
name,
low => salary_bracket,
high => salary_bracket+59999,
pop_position => employee:group(position):first(count(employee)).position,
num_pos => count(unique(employee.position)),
num_empl => count(employee))
:dataframe)
Out[65]:
In [66]:
@query(department:json)
Out[66]:
Selector items become fields of the JSON dictionary.
In [67]:
@query(
department
:select(
name,
size => count(employee),
head => employee:first(salary))
:json)
Out[67]:
You can pass a list of output fields to the json
combinator.
In [68]:
@query(
department
:json(
name,
size => count(employee),
head => employee:first(salary)))
Out[68]:
Use :dataframe
combinator to generate DataFrame
output.
In [69]:
@query(employee:dataframe)
Out[69]:
In [70]:
@query(
department
:select(
name,
size => count(employee),
max_salary => max(employee.salary))
:dataframe)
Out[70]:
You can pass a list of output fields to the dataframe
combinator.
In [71]:
@query(
department
:dataframe(
name,
size => count(employee),
max_salary => max(employee.salary)))
Out[71]:
In [72]:
@query(
mix(a => range(2,1,10),
b => range(2,1,10),
c => range(2,1,10))
:filter((a <= b) & (b <= c))
:select(a, b, c, (a*b)*c))
Out[72]:
All pairs of departments with approximately equal number of employees.
In [73]:
@query(
mix(department, department)
:filter((left.id != right.id) & (left.count(employee)/10 == right.count(employee)/10))
:select(left.name, left.count(employee), right.name, right.count(employee)))
Out[73]:
Use pack()
combinator to generate a tagged union.
In [74]:
@query(
pack(
a => range(1,1,5),
z => range(95,1,99)))
Out[74]:
Values generated by pack()
don't have to have the same type.
In [75]:
@query(pack(employee, department))
Out[75]:
You can extract tagged values using combinators named after the tags.
In [76]:
@query(pack(employee, department).employee)
Out[76]:
In [77]:
@query(pack(employee, department).department)
Out[77]:
In [78]:
@query(pack(employee, department):select(employee.position, department.name))
Out[78]:
Use unlink
to create an unconditional link to an entity class.
Find the employees with salary within 50% of the top salary.
In [79]:
@query(
employee
:take(500)
:filter(salary > max(unlink(employee).salary)/2))
Out[79]:
Use link
to create a link on an arbitrary condition.
For a given employee, find all his namesakes.
In [80]:
@query(
employee[10]
:define(namesake => link((left.id!=right.id)&(left.name==right.name), employee))
:select(name, count(namesake), namesake:take(3)))
Out[80]:
In [81]:
@query(
employee:filter((position==POSITION) & (name==NAME)),
POSITION="POLICE OFFICER",
NAME="CHARLES")
Out[81]:
Find all departments bigger than the given size.
In [82]:
@query(
department
:filter(count(employee)>SIZE)
:select(name, count(employee)-SIZE),
SIZE=1000)
Out[82]:
Find all employees in the given departments.
In [83]:
@query(
employee:filter(department.name in DEPTS),
DEPTS=["POLICE", "FIRE"])
Out[83]:
Parameters could also be calculated dynamically using combinator given
.
Find the highest paid employee.
In [84]:
@query(
employee
:filter(salary==MAX_SALARY)
:given(MAX_SALARY => max(employee.salary)))
Out[84]:
Find the highest paid employee in each department.
In [85]:
@query(
department
:select(
name,
employee
:filter(salary==MAX_SALARY)
:given(MAX_SALARY => max(employee.salary))))
Out[85]:
In [86]:
@query(
department
:dataframe(
name,
past_names => before.name))
Out[86]:
Similarly, combinator after
refers to all the subsequent values.
In [87]:
@query(
department
:dataframe(
name,
next_name => first(after).name))
Out[87]:
You can use the before
combinator to number output records.
In [88]:
@query(
department
:select(1+count(before), name))
Out[88]:
A variant of before
called and_before
includes the current record in the set. You can also use and_before
to calculate running totals.
In [89]:
@query(
department
:define(size => count(employee))
:dataframe(
name,
size,
total => sum(and_before.size)))
Out[89]:
Combinator and_around
let you refer to the full set of the output values.
Find the departments with the largest number of employees.
In [90]:
@query(
department
:define(size => count(employee))
:filter(size == max(and_around.size))
:select(name, size))
Out[90]:
Combinator around
can also give you output values that have the same property as the current value.
For each employee in a certain department, find how much does their salary differ from the top salary for their position.
In [91]:
@query(
employee
:filter(department.name == DEPT)
:dataframe(
name,
surname,
position,
salary,
salary_diff => max(and_around(position).salary)-salary),
DEPT="TREASURER")
Out[91]:
By default, context extends to all values produced by the combinator while executing the query. You can limit the scope of the context using the frame
combinator.
Find the highest paid employee in each department.
In [92]:
@query(
department
:select(
name,
employee
:filter(salary==max(and_around.salary))
:frame))
Out[92]:
In [93]:
q1 = RBT.@prepare(department.employee.name)
Out[93]:
In [94]:
q2 = RBT.@prepare(count(department))
Out[94]:
In [95]:
q3 = RBT.@prepare(department:select(name,count(employee)))
Out[95]:
In [96]:
q4 = RBT.@prepare(count(employee:filter((salary>100000)&(salary<200000))))
Out[96]:
You can also prepare a query with parameters.
In [97]:
q5 = RBT.@prepare(X*(Y*Z), X=Int, Y=Nullable{Int}, Z=Vector{Int})
Out[97]:
In [98]:
q6 = RBT.@prepare(employee:filter((name == NAME) & (salary > MIN_SALARY)), NAME=UTF8String, MIN_SALARY=Int)
Out[98]:
Queries know their parameters.
In [99]:
RBT.params(q1)
Out[99]:
In [100]:
RBT.params(q5)
Out[100]:
In [101]:
RBT.params(q6)
Out[101]:
To execute a query, call the compiled query as a function.
In [102]:
q1()
Out[102]:
In [103]:
q2()
Out[103]:
In [104]:
q3()
Out[104]:
In [105]:
q4()
Out[105]:
In [106]:
q5(X=5, Y=Nullable(4), Z=[3,2,1])
Out[106]:
In [107]:
q6(NAME="CHARLES", MIN_SALARY=100000)
Out[107]: