SQL vs NoSql vs NewSql

This notebook is used to run code and generate graphs

Config


In [1]:
num_of_runs = 10

Imports


In [2]:
import subprocess
import os
import time
import re
import postgres_import as pos_i
import mysql_import as my_i
import memsql_import as mem_i
import random

Helper function to generate a bar plot graph


In [3]:
import numpy as np
from matplotlib import pyplot as plt

def bar_plot(data, items, colours, title):
    fig = plt.figure()
    fig.set_size_inches(15, 7)

    width = .45
    ind = np.arange(len(data))
    
    plt.bar(0.3+ind, data, width=width, color=colours)
    plt.xticks(0.3+ind, items)
    plt.title(title)

    fig.autofmt_xdate()

    plt.show()

Helper function to run commands inside a shell


In [4]:
def run_command(command, verbose=True, save_mem=False):
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
    returncode = process.wait()
    output = ""
    if not(save_mem):
        output = process.stdout.read().decode('utf-8', 'replace').replace("\r", "").split("\n")
        if verbose: 
            for line in output:
                print (line)
    return output

def run_command2(command, verbose=True, save_mem=False):
    process = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
    returncode = process.wait()
    output = ""
    if not(save_mem):
        output = process.stdout.read().decode('utf-8', 'replace').replace("\r", "").split("\n")
        if verbose: 
            for line in output:
                print (line)
    return output

Starting all the docker containers to check if it's all right


In [5]:
_ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose up -d')


Creating network "docker_default" with the default driver
Creating nuodb
Creating mysql
Creating orientdb
Creating postgres
Creating cassandra
Creating redis
Creating voltdb
Creating cockroach
Creating trafodion
Creating mongo
Creating memsql
Creating neo4j

Stopping all the containers

Test functions will start and stop the containers many times in order to accurately measure times.


In [6]:
_ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose down')


Stopping trafodion ... 
Stopping mysql ... 
Stopping neo4j ... 
Stopping memsql ... 
Stopping cockroach ... 
Stopping redis ... 
Stopping mongo ... 
Stopping postgres ... 
Stopping orientdb ... 
Stopping nuodb ... 
Stopping cassandra ... 
Stopping voltdb ... 
Stopping nuodb ... doneStopping redis ... doneStopping cassandra ... doneStopping mongo ... doneStopping cockroach ... doneStopping neo4j ... doneStopping orientdb ... doneStopping memsql ... doneStopping postgres ... doneStopping trafodion ... doneStopping mysql ... doneStopping voltdb ... doneRemoving trafodion ... 
Removing mysql ... 
Removing neo4j ... 
Removing memsql ... 
Removing cockroach ... 
Removing redis ... 
Removing mongo ... 
Removing postgres ... 
Removing orientdb ... 
Removing nuodb ... 
Removing cassandra ... 
Removing voltdb ... 
Removing redis ... doneRemoving orientdb ... doneRemoving postgres ... doneRemoving neo4j ... doneRemoving nuodb ... doneRemoving cockroach ... doneRemoving memsql ... doneRemoving trafodion ... doneRemoving cassandra ... doneRemoving mongo ... doneRemoving mysql ... doneRemoving voltdb ... doneRemoving network docker_default

Definitions


In [7]:
postgres_container = "postgres"
containers = ["mongo", "redis", "neo4j", "orientdb", "cassandra", postgres_container, "mysql", "memsql", "voltdb", "cockroach"]
colours = ["lightgreen", "red", "deepskyblue", "orange", "skyblue", "lightsteelblue", "steelblue", "cornflowerblue", "firebrick", "darkgreen"]

Helper functions to run tests and measure the time taken


In [8]:
def exec_and_time(container, command, verbose=True, save_mem=False):
    start = time.time()
    if container == postgres_container or container == "mysql" or container == "memsql":
        command[0](os.path.join(os.path.abspath('./Docker/shared_data'),command[1]))
    elif container == "cockroach":
        _ = run_command('docker exec --privileged ' + container + ' sh -c "' + command[0] + '"', verbose, save_mem)
        _ = run_command('docker exec --privileged --user postgres '+ postgres_container +' sh -c "' + command[1] + '"', verbose, save_mem)
    elif container == "cassandra":
        _ = run_command2('docker exec --privileged ' + container + ' '+ command[0] +' -c "' + command[1] + '"', verbose, save_mem)
    else: 
        _ = run_command('docker exec --privileged ' + container + ' sh -c "' + command + '"', verbose, save_mem)
    end = time.time()
    return end - start

In [9]:
def average_time(commands, containers, iterations, verbose=True, import_data=False):
    times = [0] * len(containers)
    peak_times = [0] * len(containers)
    for j in range(0, iterations):
        if verbose:
            print("iteration " + str(j+1))
        _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose up -d', verbose=False)
        if verbose:
            print("waiting...")
        time.sleep(25) 
        for i in range(0, len(containers)):
            if verbose:
                print(containers[i])
                
            coackroach_ip = ""
            
            app1 = data_import_commands[i]
            app2 = commands[i]
            
            if containers[i] == "cockroach":
                ip = run_command('docker exec cockroach ip route', verbose=False)
                ip = ip[len(ip)-2].split(" ")
                ip = ip[len(ip)-2]
                cockroach_ip = ip
                #print(cockroach_ip)
                data_import_commands[i][1] = data_import_commands[i][1].replace("{{cockroach_ip}}", cockroach_ip)
                commands[i][1].replace("{{coackroach_ip}}", coackroach_ip)
            if import_data:
                exec_and_time(containers[i], data_import_commands[i], verbose=False, save_mem=True)
                
            data_import_commands[i] = app1
            commands[i] = app2
            
            time_taken = exec_and_time(containers[i], commands[i], verbose=False, save_mem=True)
            times[i] += time_taken
            if peak_times[i] < time_taken:
                peak_times[i] = time_taken
        if j < iterations-1:
            _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose down', verbose=False)
    for i in range(0, len(containers)):
        times[i] = times[i]/iterations
        print(containers[i] + ": " + str(times[i]))
    return (times, peak_times)

Average bulk import time

Import commands


In [10]:
data_import_commands = [
    "mongoimport --db testing_db --collection customers --type csv --headerline --file /shared_data/mock_data_10000_rows.csv && mongo testing_db --eval 'db.customers.createIndex({Id:1});'",
    "cat /shared_data/mock_data_10000_rows2.csv | redis-cli --pipe",
    "cp /shared_data/mock_data_10000_neo.csv /var/lib/neo4j/import && cat /shared_scripts/import_mock_10000.cypher | /var/lib/neo4j/bin/cypher-shell",
    "/orientdb/bin/oetl.sh /shared_scripts/import_orient_10000.json",
    ["bash", "cqlsh -f /shared_scripts/import_mock_10000.cql"],
    [pos_i.sql_import, "mock_data_10000_rows_utf8.csv"], #postgres
    [my_i.sql_import, "mock_data_10000_rows_utf8.csv"], #mysql
    [mem_i.sql_import, "mock_data_10000_rows_utf8.csv"], #memsql
    "cat /shared_scripts/import_voltdb.sql | sqlcmd && /opt/voltdb/bin/csvloader customers -f /shared_data/mock_data_10000_rows.csv",
    ["./cockroach sql --insecure --execute='create database if not exists mock;'", "psql -p 26257 -h {{cockroach_ip}} -d mock -U root < /shared_data/postgres_dump.sql"]
]

Iterating


In [11]:
(import_times_avg, import_times_peak) = average_time(data_import_commands, containers, num_of_runs)


iteration 1
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 2
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 3
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 4
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 5
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 6
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 7
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 8
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 9
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 10
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
mongo: 1.3779650449752807
redis: 0.2029155969619751
neo4j: 6.259813523292541
orientdb: 18.990342760086058
cassandra: 4.345867133140564
postgres: 4.852600312232971
mysql: 1.5185468912124633
memsql: 3.768800735473633
voltdb: 5.780721855163574
cockroach: 1.5266023635864259

Results


In [12]:
bar_plot(import_times_avg, containers, colours, "Average bulk import time")


Query execution time and throughput

Helper functions, modified from the previous ones

And also dependant from them.


In [13]:
def exec_and_time2(container, command, verbose=True, save_mem=False, long_output=False):
    start = time.time()
    if container == "cassandra" or long_output:
        _ = run_command2(command, verbose, save_mem)
    else:
        _ = run_command(command, verbose, save_mem)
    end = time.time()
    return end - start

In [14]:
def average_time2(commands, containers, iterations, verbose=True, import_data=False):
    times = [0] * len(containers)
    peak_times = [0] * len(containers)
    for j in range(0, iterations):
        if verbose:
            print("iteration " + str(j+1))
        _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose up -d', verbose=False)
        if verbose:
            print("waiting...")
        time.sleep(25) 
        for i in range(0, len(containers)):
            if verbose:
                print(containers[i])
                
            coackroach_ip = ""
            
            app1 = data_import_commands[i]
            
            if containers[i] == "cockroach":
                ip = run_command('docker exec cockroach ip route', verbose=False)
                ip = ip[len(ip)-2].split(" ")
                ip = ip[len(ip)-2]
                cockroach_ip = ip
                #print(cockroach_ip)
                data_import_commands[i][1] = data_import_commands[i][1].replace("{{cockroach_ip}}", cockroach_ip)
                #print(data_import_commands[i])
            if import_data:
                if containers[i] == "redis":
                    exec_and_time2("redis", r"""docker exec --privileged redis sh -c "cat /shared_data/mock_data_10000_set.csv | redis-cli --pipe" """, verbose=False, save_mem=True)
                else: 
                    exec_and_time(containers[i], data_import_commands[i], verbose=False, save_mem=True)
            if containers[i] == "neo4j":
                exec_and_time2("neo4j", r"""docker exec --privileged neo4j sh -c "echo \"MATCH (n:Customer) CREATE (n)-[r:Autoedge]->(n) RETURN r;\" | /var/lib/neo4j/bin/cypher-shell" """, verbose=False, save_mem=True)
            
            data_import_commands[i] = app1
            
            time_taken = exec_and_time2(containers[i], commands[i].replace("{{id}}", str(random.randrange(0, 9999, 1))), verbose=False, save_mem=True)
            times[i] += time_taken
            if peak_times[i] < time_taken:
                peak_times[i] = time_taken
        if j < iterations-1:
            _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose down', verbose=False)
    for i in range(0, len(containers)):
        times[i] = times[i]/iterations
        print(containers[i] + ": " + str(times[i]))
    return (times, peak_times)

Unindexed Search Commands

Redis has no unindexed search.


In [15]:
unindexed_read_commands = [
    r"""docker exec --privileged mongo mongo testing_db --eval "db.customers.find({first_name:'Hosea'}).pretty().shellPrint();" """,
    "",
    r"""docker exec --privileged neo4j sh -c "echo \"MATCH (n) WHERE n.firstName = 'Hosea' RETURN n;\" | /var/lib/neo4j/bin/cypher-shell" """,
    r"""docker exec --privileged orientdb /orientdb/bin/console.sh "connect plocal:/temp/databases/mock admin admin; select from Customer where first_name = 'Hosea' " """,
    r"""docker exec --privileged cassandra cqlsh -k mock_keyspace -e "select * from customers where first_name = 'Hosea' allow filtering;" """,
    r"""docker exec --privileged postgres sh -c "echo \"SELECT * FROM customers WHERE first_name = 'Hosea'\" | psql -U postgres mock" """, #postgres
    r"""docker exec --privileged mysql mysql -u root --password=password --database mock -e "SELECT * FROM customers where first_name = 'Hosea'" """,
    r"""docker exec --privileged memsql memsql-shell --database mock -e "SELECT * FROM customers where first_name = 'Hosea'" """,
    r"""docker exec --privileged voltdb sh -c "echo \"select * from customers where first_name = 'Hosea';\" | sqlcmd"  """,
    r"""docker exec --privileged cockroach ./cockroach sql --insecure --execute="SET DATABASE = mock; SELECT * FROM customers WHERE first_name = 'Hosea';" """ 
]

Iterating


In [16]:
(unindexed_read_times_avg, unindexed_read_times_peak) = average_time2(unindexed_read_commands, containers, num_of_runs)
unindexed_read_times_avg[1] = 0; # redis has no unindexed search


iteration 1
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 2
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 3
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 4
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 5
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 6
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 7
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 8
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 9
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 10
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
mongo: 0.7613518714904786
redis: 0.006602716445922851
neo4j: 1.3454859018325807
orientdb: 3.392029571533203
cassandra: 1.0725558996200562
postgres: 0.2971117734909058
mysql: 0.24711825847625732
memsql: 4.068214631080627
voltdb: 3.096517038345337
cockroach: 0.40518109798431395

Results


In [17]:
bar_plot(unindexed_read_times_avg, containers, colours, "Average search time")


Indexed Search Commands


In [18]:
indexed_read_commands = [
    r"""docker exec --privileged mongo mongo testing_db --eval "db.customers.find({Id:{{id}}}).pretty().shellPrint()" """,
    r"""docker exec --privileged redis sh -c "echo GET {{id}}} | redis-cli" """,
    r"""docker exec --privileged neo4j sh -c "echo \"MATCH (n) WHERE ID(n) = {{id}}} RETURN n;\" | /var/lib/neo4j/bin/cypher-shell" """,
    r"""docker exec --privileged orientdb /orientdb/bin/console.sh "connect plocal:/temp/databases/mock admin admin; select from Customer where Id = {{id}}}" """,
    r"""docker exec --privileged cassandra cqlsh -k mock_keyspace -e "select * from customers where id = {{id}}} allow filtering;" """,
    r"""docker exec --privileged postgres sh -c "echo \"SELECT * FROM customers where id = {{id}}}\" | psql -U postgres mock" """, #postgres
    r"""docker exec --privileged mysql mysql -u root --password=password --database mock -e "SELECT * FROM customers where id = {{id}}}" """,
    r"""docker exec --privileged memsql memsql-shell --database mock -e "SELECT * FROM customers where id = {{id}}}" """,
    r"""docker exec --privileged voltdb sh -c "echo \"SELECT * FROM customers where id = {{id}}}\" | sqlcmd"  """,
    r"""docker exec --privileged cockroach ./cockroach sql --insecure --execute="SET DATABASE = mock; SELECT * FROM customers where id = {{id}}};" """ 
]

Iterating


In [19]:
(indexed_read_times_avg, indexed_read_times_peak) = average_time2(indexed_read_commands, containers, num_of_runs)


iteration 1
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 2
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 3
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 4
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 5
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 6
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 7
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 8
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 9
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 10
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
mongo: 0.752523136138916
redis: 0.1922227621078491
neo4j: 1.271317219734192
orientdb: 2.6494291543960573
cassandra: 1.2643517971038818
postgres: 0.3526382684707642
mysql: 0.29781160354614256
memsql: 3.249052143096924
voltdb: 2.3724739074707033
cockroach: 0.39572198390960694

Results


In [20]:
bar_plot(indexed_read_times_avg, containers, colours, "Average indexed search time")


Aggregate Functions Commands


In [21]:
aggregate_read_commands = [
    r"""docker exec --privileged mongo mongo testing_db --eval 'db.customers.aggregate(    [      { $group: { "_id": "$first_name", "avg": { $avg: "$income" } } }    ] ).pretty().shellPrint();' """,
    r"""docker exec --privileged redis sh -c "echo zcount mylist -inf +inf | redis-cli" """,
    r"""docker exec --privileged neo4j sh -c "echo \"match (n:Customer) return avg(n.income);\" | /var/lib/neo4j/bin/cypher-shell" """,
    r"""docker exec --privileged orientdb /orientdb/bin/console.sh "connect plocal:/temp/databases/mock admin admin; select avg(income) FROM Customer" """,
    r"""docker exec --privileged cassandra cqlsh -k mock_keyspace -e "select avg(income) as avg_income from customers;" """,
    r"""docker exec --privileged postgres sh -c "echo \"select avg(income) as avg_income from customers\" | psql -U postgres mock" """, #postgres
    r"""docker exec --privileged mysql mysql -u root --password=password --database mock -e "select avg(income) as avg_income from customers" """,
    r"""docker exec --privileged memsql memsql-shell --database mock -e "select avg(income) as avg_income from customers" """,
    r"""docker exec --privileged voltdb sh -c "echo \"select avg(income) as avg_income from customers\" | sqlcmd"  """,
    r"""docker exec --privileged cockroach ./cockroach sql --insecure --execute="SET DATABASE = mock; select avg(income) as avg_income from customers;" """ 
]

Iterating


In [22]:
(aggregate_read_times_avg, aggregate_read_times_peak) = average_time2(aggregate_read_commands, containers, num_of_runs)


iteration 1
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 2
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 3
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 4
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 5
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 6
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 7
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 8
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 9
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 10
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
mongo: 0.9295998573303222
redis: 0.2599918842315674
neo4j: 1.2807871341705321
orientdb: 2.7451351165771483
cassandra: 1.0557847023010254
postgres: 0.459069561958313
mysql: 0.2607097864151001
memsql: 3.2221822500228883
voltdb: 2.556734490394592
cockroach: 0.4968269348144531

Results


In [23]:
bar_plot(aggregate_read_times_avg, containers, colours, "Average aggregation time")


Join Commands

MongoDB, Redis, OrientDB, Cassandra do not allow join operations.


In [24]:
join_commands = [
    "",
    "",
    r"""docker exec --privileged neo4j sh -c "echo \"MATCH (n:Customer)-[:Autoedge]->(m:Customer) WHERE n.firstName = m.firstName RETURN n, m;\" | /var/lib/neo4j/bin/cypher-shell" """,
    "",
    "",
    r"""docker exec --privileged postgres sh -c "psql -q -U postgres mock -c 'select * from customers a, customers b where a.first_name = b.first_name;' > join.txt" """,
    r"""docker exec --privileged mysql sh -c "mysql -u root --password=password --database mock -e 'select * from customers a, customers b where a.first_name = b.first_name;' > join.txt " """,
    r"""docker exec --privileged memsql sh -c "memsql-shell --database mock -e 'select * from customers a, customers b where a.first_name = b.first_name;' > join.txt" """,
    r"""docker exec --privileged voltdb sh -c "echo 'select * from customers a, customers b where a.first_name = b.first_name;' | sqlcmd > join.txt"  """,
    r"""docker exec --privileged cockroach sh -c "./cockroach sql --insecure --execute='SET DATABASE = mock; select * from customers a, customers b where a.first_name = b.first_name;' > join.txt" """ 
]

Iterating


In [25]:
(join_times_avg, join_times_peak) = average_time2(join_commands, containers, num_of_runs)
join_times_avg[0] = 0; # mongodb does not support relational join operation
join_times_avg[1] = 0; # redis does not support relational join operation
join_times_avg[3] = 0; # orientdb does not support relational join operation
join_times_avg[4] = 0; # cassandra does not support relational join operation


iteration 1
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 2
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 3
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 4
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 5
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 6
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 7
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 8
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 9
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
iteration 10
waiting...
mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
mongo: 0.006198668479919433
redis: 0.0060993671417236325
neo4j: 0.9849414587020874
orientdb: 0.00619959831237793
cassandra: 0.006815910339355469
postgres: 0.3296961307525635
mysql: 0.2343583345413208
memsql: 1.1511044263839723
voltdb: 1.995752787590027
cockroach: 0.4339426279067993

Results


In [26]:
bar_plot(join_times_avg, containers, colours, "Average join time")


Throughput


In [27]:
def read_throughput(commands, containers, iterations=1000, verbose=True, import_data=False):
    times = [0] * len(containers)
    ratios = [0] * len(containers)
    _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose up -d', verbose=False)
    time.sleep(25) 
    for i in range(0, len(containers)):
        #imports
        if verbose:
            print(containers[i])
            
        coackroach_ip = ""
        
        app1 = data_import_commands[i]
        app2 = commands[i]
        
        if containers[i] == "cockroach":
            ip = run_command('docker exec cockroach ip route', verbose=False)
            ip = ip[len(ip)-2].split(" ")
            ip = ip[len(ip)-2]
            cockroach_ip = ip
            data_import_commands[i][1] = data_import_commands[i][1].replace("{{cockroach_ip}}", cockroach_ip)
        if import_data:
            if containers[i] == "redis":
                exec_and_time2("redis", r"""docker exec --privileged redis sh -c "cat /shared_data/mock_data_10000_set.csv | redis-cli --pipe" """, verbose=False, save_mem=True)
            else: 
                exec_and_time(containers[i], data_import_commands[i], verbose=False, save_mem=True)
        if containers[i] == "neo4j":
            exec_and_time2("neo4j", r"""docker exec --privileged neo4j sh -c "echo \"MATCH (n:Customer) CREATE (n)-[r:Autoedge]->(n) RETURN r;\" | /var/lib/neo4j/bin/cypher-shell" """, verbose=False, save_mem=True)
        #queries
        start = time.time()
        for j in range(0, iterations):  
            _ = exec_and_time2(containers[i], commands[i].replace("{{id}}", str(random.randrange(0, 9999, 1))), verbose=False, save_mem=True)
        
        data_import_commands[i] = app1
        commands[i] = app2
        
        end = time.time()
        times[i] = end - start
        ratios[i] = iterations/times[i]
    _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker" && docker-compose down', verbose=False)
    for i in range(0, len(containers)):
        print(containers[i] + ": " + str(ratios[i]))
    return ratios

In [28]:
indexed_read_throughput = read_throughput(indexed_read_commands, containers, 50)


mongo
redis
neo4j
orientdb
cassandra
postgres
mysql
memsql
voltdb
cockroach
mongo: 30.883471280306168
redis: 9.963676313929312
neo4j: 1.4649809053195442
orientdb: 1.0592776115941447
cassandra: 30.48779406340856
postgres: 30.193238558103086
mysql: 30.88324842896359
memsql: 30.413624316141867
voltdb: 1.9990947054663746
cockroach: 7.378539119959243

Results


In [29]:
bar_plot(indexed_read_throughput, containers, colours, "Indexed read throughput")



In [30]:
write_throughput = [10000/x for x in import_times_avg]

bar_plot(write_throughput, containers, colours, "Write throughput")


Clusters

Bulk insert time and Throughput


In [31]:
def cluster_times(commands, dbms, index,iterations=1000, verbose=True, import_data=False):
    _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker\cluster\\' + dbms + '" && docker-compose up -d', verbose=True)
    if dbms == "mysql":
        _ = run_command("docker network connect bridge mysql")
    time.sleep(15) 
    #imports
    if verbose:
        print(dbms)

    coackroach_ip = ""

    app1 = data_import_commands[index]
    app2 = commands[index]

    if dbms == "cockroach":
        ip = run_command('docker exec cockroach ip route', verbose=False)
        ip = ip[len(ip)-2].split(" ")
        ip = ip[len(ip)-2]
        cockroach_ip = ip
        data_import_commands[index][1] = data_import_commands[index][1].replace("{{cockroach_ip}}", cockroach_ip)

    print("Importing data...")
    start_i = time.time()
    exec_and_time(dbms, data_import_commands[index], verbose=False, save_mem=True)
    import_time = time.time() - start_i
    #queries
    start = time.time()
    for j in range(0, iterations):  
        _ = exec_and_time2(containers[index], commands[index].replace("{{id}}", str(random.randrange(0, 9999, 1))), verbose=False, save_mem=True)

    data_import_commands[index] = app1
    commands[index] = app2

    end = time.time()
    time_taken = end - start
    ratio = iterations/time_taken
    
    _ = run_command('D: && cd "D:\GitHub\BigData\Project2\Docker\cluster\\' + dbms + '" && docker-compose down', verbose=False)
    
    print(dbms + ": " + str(ratio))
    return (import_time, ratio)

In [50]:
cluster_import_times = [0]*4
cluster_read_ratios = [0]*4

VoltDB


In [75]:
(cluster_import_times[0], cluster_read_ratios[0]) = cluster_times(indexed_read_commands, "voltdb", 8)


Creating network "voltdb_voltLocalCluster" with driver "bridge"
Creating voltdb
Creating voltdb3
Creating voltdb2
Creating voltdb4

voltdb
Importing data...
voltdb: 1.821068721661645

Cockroach


In [76]:
(cluster_import_times[1], cluster_read_ratios[1]) = cluster_times(indexed_read_commands, "cockroach", 9)


Creating cockroach
Creating roach3
Creating roach4
Creating roach2

cockroach
Importing data...
cockroach: 7.551633295931992

Cassandra


In [77]:
(cluster_import_times[2], cluster_read_ratios[2]) = cluster_times(indexed_read_commands, "cassandra", 4)


Creating cassandra
Creating cassandra3
Creating cassandra2
Creating cassandra4

cassandra
Importing data...
cassandra: 2.0460459761404914

MySql Cluster


In [72]:
(cluster_import_times[3], cluster_read_ratios[3]) = cluster_times(indexed_read_commands, "mysql", 6)


mysqlManagement1 is up-to-date
mysqlndb1 is up-to-date
mysqlndb2 is up-to-date
mysql is up-to-date
mysql_cluster is up-to-date

mysql
Importing data...
mysql: 1.579607448648723

Results


In [78]:
bar_plot(cluster_import_times, ["voltdb", "cockroach", "cassandra", "mysql cluster"], [colours[8], colours[9], colours[4], colours[6]], "Average bulk import time (cluster)")



In [79]:
bar_plot(cluster_read_ratios, ["voltdb", "cockroach", "cassandra", "mysql cluster"], [colours[8], colours[9], colours[4], colours[6]], "Average read throughput (cluster)")