The Blaze Ecosystem

http://blaze.pydata.org/

The Blaze ecosystem is a set of libraries that help users store, describe, query and process data. It is composed of the following core projects:

  • Blaze: An interface to query data on different storage systems
  • Dask: Parallel computing through task scheduling and blocked algorithms
  • Datashape: A data description language
  • DyND: A C++ library for dynamic, multidimensional arrays
  • Odo: Data migration between different storage systems
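
To make "blocked algorithms" concrete, here is a minimal standard-library sketch. It is not Dask's API: it only illustrates the idea of splitting a computation into blocks, reducing each block as an independent task, and combining the partial results. The names `blocked_sum` and `block_size` are hypothetical, chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def blocked_sum(values, block_size=4):
    """Sum `values` by reducing fixed-size blocks independently, then combining."""
    # Split the input into contiguous blocks of at most `block_size` items.
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    # Each block is an independent task that a scheduler can run in parallel.
    with ThreadPoolExecutor() as pool:
        partial_sums = list(pool.map(sum, blocks))
    # Combine the per-block results into the final answer.
    return sum(partial_sums)


total = blocked_sum(list(range(10)))
print(total)
```

Because each block is reduced on its own, the same pattern can be applied to data that does not fit in memory at once, provided the blocks are loaded one at a time.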

GitHub: https://github.com/blaze/blaze.git

%%!
cd ..
source activate GISpark
git clone https://github.com/blaze/blaze.git
cd blaze
python setup.py install

In [43]:
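import blaze                      # bind the package name so help(blaze) works
from blaze import *
from blaze.utils import example   # example() returns the path of a bundled sample file
#from blaze import compute
#from blaze import data
from blaze import examples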

In [44]:
help(blaze)
#help(examples)


Help on package blaze:

NAME
    blaze

PACKAGE CONTENTS
    _version
    cached
    compatibility
    compute (package)
    dispatch
    expr (package)
    index
    interactive
    mongo
    partition
    server (package)
    sql
    tests (package)
    utils

SUBMODULES
    datetime
    examples

DATA
    Broadcastable = (<class 'blaze.expr.arithmetic.BinOp'>, <class 'blaze....
    Cheap = (<class 'blaze.expr.collections.Head'>, <class 'blaze.expr.exp...
    Sequence = (<class 'tuple'>, <class 'list'>, <class 'collections.abc.I...
    abs = <dispatched abs>
    absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...
    acos = <dispatched acos>
    acosh = <dispatched acosh>
    all = <dispatched all>
    all_formats = frozenset({SerializationFormat(name='json', loads=<funct...
    any = <dispatched any>
    api = <flask.blueprints.Blueprint object>
    append = <dispatched append>
    asin = <dispatched asin>
    asinh = <dispatched asinh>
    atan = <dispatched atan>
    atan2 = <dispatched atan2>
    atanh = <dispatched atanh>
    by = <dispatched by>
    ceil = <dispatched ceil>
    compute = <dispatched compute>
    compute_down = <dispatched compute_down>
    compute_up = <dispatched compute_up>
    convert = <odo.core.NetworkDispatcher object>
    copysign = <dispatched copysign>
    cos = <dispatched cos>
    cosh = <dispatched cosh>
    create_index = <dispatched create_index>
    degrees = <dispatched degrees>
    discover = <dispatched discover>
    dispatch = functools.partial(<function dispatch at 0x7fdaea...atched r...
    division = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192...
    drop = <dispatched drop>
    exp = <dispatched exp>
    expm1 = <dispatched expm1>
    floor = <dispatched floor>
    fmod = <dispatched fmod>
    greatest = <dispatched greatest>
    hypot = <dispatched hypot>
    i = 4
    inf = inf
    into = <dispatched into>
    isnan = <dispatched isnan>
    json_format = SerializationFormat(name='json', loads=<function...x7fdb...
    ldexp = <dispatched ldexp>
    least = <dispatched least>
    log = <dispatched log>
    log10 = <dispatched log10>
    log1p = <dispatched log1p>
    max = <dispatched max>
    mean = <dispatched mean>
    min = <dispatched min>
    msgpack_format = SerializationFormat(name='msgpack', loads=functo...x7...
    nan = nan
    optimize = <dispatched optimize>
    pickle_format = SerializationFormat(name='pickle', loads=<built-...s.p...
    pre_compute = <dispatched pre_compute>
    print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...
    radians = <dispatched radians>
    resource = <odo.regex.RegexDispatcher object>
    rowfunc = <dispatched rowfunc>
    shape = <dispatched shape>
    sin = <dispatched sin>
    sinh = <dispatched sinh>
    sqrt = <dispatched sqrt>
    std = <dispatched std>
    sum = <dispatched sum>
    tan = <dispatched tan>
    tanh = <dispatched tanh>
    to_html = <dispatched to_html>
    trunc = <dispatched trunc>
    var = <dispatched var>

VERSION
    0.9.1

FILE
    /home/supermap/anaconda3/envs/GISpark/lib/python3.5/site-packages/blaze/__init__.py



In [59]:
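js = JSON(example('accounts.json'))   # wrap the bundled JSON file as a data resource
s = symbol('s', discover(js))         # symbolic table typed with the discovered datashape
compute(s.count(), js)                # evaluate the count expression against the file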

#jss = JSONLines(example('accounts-streaming.json'))
#compute(s.count(), jss)


Out[59]:
5
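
The cell above builds a symbolic expression (`s.count()`) from the discovered schema and only then evaluates it against the data. As a rough plain-Python analogue, not how Blaze is implemented, the sketch below defines the query separately from the data and applies it to records parsed with the standard `json` module. The inline records are hypothetical stand-ins for `accounts.json`, whose contents are not shown here; the sketch assumes the file holds a JSON array of records.

```python
import json

# Hypothetical stand-in for accounts.json (assumed: a JSON array of records).
raw = '[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]'


def count(records):
    """The 'expression': defined without reference to any particular data source."""
    return sum(1 for _ in records)


records = json.loads(raw)   # the 'data': loaded separately from the query
result = count(records)     # evaluation: apply the expression to the data
print(result)
```

Keeping the query independent of the data source is what allows the same expression to be evaluated against a different backend without being rewritten.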

In [56]:
t = Data([(1, 'Alice', 100),
          (2, 'Bob', -200),
          (3, 'Charlie', 300),
          (4, 'Denis', 400),
          (5, 'Edith', -500)],
         fields=['id', 'name', 'balance'])
#help(t)
t


Out[56]:
   id     name  balance
0   1    Alice      100
1   2      Bob     -200
2   3  Charlie      300
3   4    Denis      400
4   5    Edith     -500
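
A table like this is typically queried with column expressions; in Blaze the names of accounts with a negative balance would usually be written `t[t.balance < 0].name`. The sketch below is a plain-Python equivalent over the same rows, included only to show what such a query computes. `Account` is a hypothetical helper type, not part of Blaze.

```python
from collections import namedtuple

# Hypothetical record type mirroring the fields passed to Data(...).
Account = namedtuple('Account', ['id', 'name', 'balance'])

rows = [Account(1, 'Alice', 100),
        Account(2, 'Bob', -200),
        Account(3, 'Charlie', 300),
        Account(4, 'Denis', 400),
        Account(5, 'Edith', -500)]

# Select the rows with a negative balance, then project the name column.
negative_names = [row.name for row in rows if row.balance < 0]
print(negative_names)
```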

In [55]:
iris = Data(example('iris.csv'))
iris


Out[55]:
    sepal_length  sepal_width  petal_length  petal_width      species
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
6            4.6          3.4           1.4          0.3  Iris-setosa
7            5.0          3.4           1.5          0.2  Iris-setosa
8            4.4          2.9           1.4          0.2  Iris-setosa
9            4.9          3.1           1.5          0.1  Iris-setosa
10           5.4          3.7           1.5          0.2  Iris-setosa
