The Blaze Ecosystem

http://blaze.pydata.org/

The Blaze ecosystem is a set of libraries that help users store, describe, query and process data. It is composed of the following core projects:

Blaze: An interface to query data on different storage systems
Dask: Parallel computing through task scheduling and blocked algorithms
Datashape: A data description language
DyND: A C++ library for dynamic, multidimensional arrays
Odo: Data migration between different storage systems
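
To make the phrase "blocked algorithms" in the Dask entry concrete, here is a minimal pure-Python sketch. It does not use Dask's API. The data is split into fixed-size blocks, each block is reduced on its own (these are the independent tasks a scheduler could run in parallel), and the partial results are then combined. The helper names `blocks` and `blocked_mean` are hypothetical and chosen only for this illustration.

```python
from itertools import islice


def blocks(iterable, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def blocked_mean(values, size=4):
    """Compute a mean block by block.

    Each block contributes an independent (sum, count) pair; only the
    final combination step needs to see all partial results.
    """
    partials = [(sum(chunk), len(chunk)) for chunk in blocks(values, size)]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


result = blocked_mean(range(10), size=4)
print(result)
```

The block size only changes how the work is divided, not the result, which is the property that lets a scheduler choose blocks to fit memory.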

Github: https://github.com/blaze/blaze.git

Blog:

Blaze: http://blaze.pydata.org/pages/talks/ep2015-blaze/
GitHub analysis with Dask: http://blaze.pydata.org/blog/2016/02/17/dask-distributed-1/

%%!
cd ..
source activate GISpark
git clone https://github.com/blaze/blaze.git
cd blaze
python setup.py install



In [43]:

    
import blaze  # binds the package name itself, which help(blaze) needs
from blaze import *
#from blaze import compute
#from blaze import data
#from blaze.utils import example
from blaze import examples



In [44]:

    
help(blaze)
#help(examples)









    



Help on package blaze:

NAME
    blaze

PACKAGE CONTENTS
    _version
    cached
    compatibility
    compute (package)
    dispatch
    expr (package)
    index
    interactive
    mongo
    partition
    server (package)
    sql
    tests (package)
    utils

SUBMODULES
    datetime
    examples

DATA
    Broadcastable = (<class 'blaze.expr.arithmetic.BinOp'>, <class 'blaze....
    Cheap = (<class 'blaze.expr.collections.Head'>, <class 'blaze.expr.exp...
    Sequence = (<class 'tuple'>, <class 'list'>, <class 'collections.abc.I...
    abs = <dispatched abs>
    absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...
    acos = <dispatched acos>
    acosh = <dispatched acosh>
    all = <dispatched all>
    all_formats = frozenset({SerializationFormat(name='json', loads=<funct...
    any = <dispatched any>
    api = <flask.blueprints.Blueprint object>
    append = <dispatched append>
    asin = <dispatched asin>
    asinh = <dispatched asinh>
    atan = <dispatched atan>
    atan2 = <dispatched atan2>
    atanh = <dispatched atanh>
    by = <dispatched by>
    ceil = <dispatched ceil>
    compute = <dispatched compute>
    compute_down = <dispatched compute_down>
    compute_up = <dispatched compute_up>
    convert = <odo.core.NetworkDispatcher object>
    copysign = <dispatched copysign>
    cos = <dispatched cos>
    cosh = <dispatched cosh>
    create_index = <dispatched create_index>
    degrees = <dispatched degrees>
    discover = <dispatched discover>
    dispatch = functools.partial(<function dispatch at 0x7fdaea...atched r...
    division = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192...
    drop = <dispatched drop>
    exp = <dispatched exp>
    expm1 = <dispatched expm1>
    floor = <dispatched floor>
    fmod = <dispatched fmod>
    greatest = <dispatched greatest>
    hypot = <dispatched hypot>
    i = 4
    inf = inf
    into = <dispatched into>
    isnan = <dispatched isnan>
    json_format = SerializationFormat(name='json', loads=<function...x7fdb...
    ldexp = <dispatched ldexp>
    least = <dispatched least>
    log = <dispatched log>
    log10 = <dispatched log10>
    log1p = <dispatched log1p>
    max = <dispatched max>
    mean = <dispatched mean>
    min = <dispatched min>
    msgpack_format = SerializationFormat(name='msgpack', loads=functo...x7...
    nan = nan
    optimize = <dispatched optimize>
    pickle_format = SerializationFormat(name='pickle', loads=<built-...s.p...
    pre_compute = <dispatched pre_compute>
    print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...
    radians = <dispatched radians>
    resource = <odo.regex.RegexDispatcher object>
    rowfunc = <dispatched rowfunc>
    shape = <dispatched shape>
    sin = <dispatched sin>
    sinh = <dispatched sinh>
    sqrt = <dispatched sqrt>
    std = <dispatched std>
    sum = <dispatched sum>
    tan = <dispatched tan>
    tanh = <dispatched tanh>
    to_html = <dispatched to_html>
    trunc = <dispatched trunc>
    var = <dispatched var>

VERSION
    0.9.1

FILE
    /home/supermap/anaconda3/envs/GISpark/lib/python3.5/site-packages/blaze/__init__.py



In [59]:

    
js = JSON(example('accounts.json'))
s = symbol('s', discover(js))
compute(s.count(), js)

#jss = JSONLines(example('accounts-streaming.json'))
#compute(s.count(), jss)









    Out[59]:





5
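
For readers without Blaze installed, the count above can be reproduced with the standard library alone. This is only a sketch of what the query amounts to, not of how Blaze computes it. The contents of `accounts.json` are not shown in this notebook, so the five records below are an assumption borrowed from the `Data` cell that follows. The functions `count_json` and `count_jsonlines` are hypothetical names; the second one mirrors the commented-out `JSONLines` variant, where each line holds one record.

```python
import json

# Assumed stand-in for accounts.json (the real file is not shown here).
rows = [
    {"id": 1, "name": "Alice", "balance": 100},
    {"id": 2, "name": "Bob", "balance": -200},
    {"id": 3, "name": "Charlie", "balance": 300},
    {"id": 4, "name": "Denis", "balance": 400},
    {"id": 5, "name": "Edith", "balance": -500},
]

json_text = json.dumps(rows)                              # one JSON array
jsonlines_text = "\n".join(json.dumps(r) for r in rows)   # one record per line


def count_json(text):
    """Count the records in a document that is a single JSON array."""
    return len(json.loads(text))


def count_jsonlines(text):
    """Count the records in line-delimited JSON, skipping blank lines."""
    return sum(1 for line in text.splitlines() if line.strip())


print(count_json(json_text), count_jsonlines(jsonlines_text))
```

The same logical question (how many records?) needs different code for each storage layout, which is the gap a single `compute` call over a symbolic expression is meant to hide.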



In [56]:

    
t = Data([(1, 'Alice', 100),
        (2, 'Bob', -200),
        (3, 'Charlie', 300),
        (4, 'Denis', 400),
        (5, 'Edith', -500)],
        fields=['id', 'name', 'balance'])
#help(t)
t
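
The table `t` can also be mirrored in plain Python to show the kind of question it is meant to answer. The sketch below does not call Blaze; it selects the names of accounts with a negative balance and sums all balances. The `Account` namedtuple and the variable names are hypothetical. Blaze's own quickstart writes a comparable selection as `t[t.balance < 0].name`, which is not executed here.

```python
from collections import namedtuple

# Same rows and field names as the Data cell above.
Account = namedtuple("Account", ["id", "name", "balance"])

rows = [
    Account(1, "Alice", 100),
    Account(2, "Bob", -200),
    Account(3, "Charlie", 300),
    Account(4, "Denis", 400),
    Account(5, "Edith", -500),
]

# Selection followed by projection: names of overdrawn accounts.
overdrawn = [row.name for row in rows if row.balance < 0]

# Reduction over one column.
total_balance = sum(row.balance for row in rows)

print(overdrawn, total_balance)
```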



In [55]:

    
iris = Data(example('iris.csv'))
iris









    Out[55]:

    sepal_length  sepal_width  petal_length  petal_width      species
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
6            4.6          3.4           1.4          0.3  Iris-setosa
7            5.0          3.4           1.5          0.2  Iris-setosa
8            4.4          2.9           1.4          0.2  Iris-setosa
9            4.9          3.1           1.5          0.1  Iris-setosa
10           5.4          3.7           1.5          0.2  Iris-setosa


