Blaze Expressions

Blaze expressions capture the user's intent.

Blaze compute functions interpret those expressions against a backend.

The interface in this notebook is not intended for interactive use. You may find interactive expressions (like Data) more useful for data exploration.



In [1]:

    
from blaze import symbol, compute, join



In [2]:

    
bank = symbol('bank', '''1000 * {id: int, 
                                 name: string, 
                                 balance: int,
                                 lastseen: datetime}''')
bank  # no data to see here









    



/home/mrocklin/Software/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:239: FormatterWarning: Exception in text/html formatter: Expression does not contain data resources
  FormatterWarning,






    Out[2]:





bank



In [3]:

    
deadbeats = bank[bank.balance < 0][['name', 'lastseen']]
deadbeats









    Out[3]:





bank[bank.balance < 0][['name', 'lastseen']]



In [4]:

    
deadbeats.dshape









    Out[4]:





dshape("var * {name: string, lastseen: datetime}")

Compute recipes

Blaze interprets expressions against backends by consulting a repository of small recipes.

We look at some simple recipes for Python, Pandas, and Spark.



In [5]:

    
L = [[1, 'Alice',   100],
     [2, 'Bob',    -200],
     [3, 'Charlie', 300],
     [4, 'Dennis',  400],
     [5, 'Edith',  -500]]

from pandas import DataFrame

df = DataFrame([[1, 'Alice',   100],                         
                [2, 'Bob',    -200],
                [3, 'Charlie', 300],
                [4, 'Dennis',  400],
                [5, 'Edith',  -500]], columns=['id', 'name', 'balance'])

import pyspark

sc = pyspark.SparkContext('local', 'blaze-app')
rdd = sc.parallelize(L)

bank = symbol('bank', '''1000 * {id: int, 
                                 name: string, 
                                 balance: int}''')

deadbeats = bank[bank.balance < 0].name



In [6]:

    
compute(deadbeats, L)









    Out[6]:





<itertools.chain at 0x7f1c47d14dd0>
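
The Python backend hands back a lazy iterator rather than a list, so nothing is shown until it is consumed. The sketch below does not call Blaze. It hand-writes an equivalent filter over the same rows (the names `rows` and `lazy_names` are illustrative assumptions) to show how an iterator of this kind is materialized with `list`.

```python
from itertools import chain

rows = [[1, 'Alice',   100],
        [2, 'Bob',    -200],
        [3, 'Charlie', 300],
        [4, 'Dennis',  400],
        [5, 'Edith',  -500]]

# Hand-written equivalent of bank[bank.balance < 0].name (an assumption,
# not Blaze's own code); wrapped in chain only so the object has the same
# type as the output above.
lazy_names = chain(name for _id, name, balance in rows if balance < 0)

names = list(lazy_names)   # consuming the iterator does the actual work
print(names)               # ['Bob', 'Edith']
print(list(lazy_names))    # [] because the iterator is now exhausted
```

An iterator like this can be consumed only once, so keep the list if the result is needed again.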



In [7]:

    
compute(deadbeats, df)









    Out[7]:





1      Bob
4    Edith
Name: name, dtype: object



In [8]:

    
compute(deadbeats, rdd)









    Out[8]:





PythonRDD[1] at RDD at PythonRDD.scala:43

How Blaze handles numeric evaluation

or, how to stay sane while trying to engage the entire Python ecosystem



In [9]:

    
from blaze.compute.core import compute_up



In [10]:

    
compute_up.source(bank.head(), df)









    



File: /home/mrocklin/workspace/blaze/blaze/compute/pandas.py

@dispatch(Head, (Series, DataFrame))
def compute_up(t, df, **kwargs):
    return df.head(t.n)



In [11]:

    
compute_up.source(bank.head(), L)









    



File: /home/mrocklin/workspace/blaze/blaze/compute/python.py

@dispatch(Head, Sequence)
def compute_up(t, seq, **kwargs):
    if t.n < 100:
        return tuple(take(t.n, seq))
    else:
        return take(t.n, seq)



In [12]:

    
compute_up.source(bank.head(), rdd)









    



File: /home/mrocklin/workspace/blaze/blaze/compute/spark.py

@dispatch(Head, RDD)
def compute_up(t, rdd, **kwargs):
    return rdd.take(t.n)
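
The three recipes above share one name and differ only in the types they are registered for. A minimal sketch of that idea follows, using a plain dictionary keyed on (expression type, data type). This is not Blaze's implementation: `Head` here is a hypothetical stand-in, `recipe` and `_recipes` are invented for illustration, and a real multiple-dispatch library would pick the most specific match rather than the first one registered.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for an expression node; only `n` matters here.
Head = namedtuple('Head', ['n'])

# The "repository of recipes": (expression type, data type) -> function.
_recipes = {}

def recipe(expr_type, data_type):
    """Register the decorated function for one (expression, backend) pair."""
    def register(func):
        _recipes[(expr_type, data_type)] = func
        return func
    return register

def compute_up(expr, data):
    """Call the first registered recipe whose types match the arguments."""
    for (expr_type, data_type), func in _recipes.items():
        if isinstance(expr, expr_type) and isinstance(data, data_type):
            return func(expr, data)
    raise NotImplementedError(
        'no recipe for (%s, %s)' % (type(expr).__name__, type(data).__name__))

@recipe(Head, list)
def head_list(expr, seq):
    # List backend: slicing gives the first n items.
    return seq[:expr.n]

@recipe(Head, dict)
def head_dict(expr, mapping):
    # Toy second backend: keep the first n items of a dict.
    return dict(islice(mapping.items(), expr.n))

print(compute_up(Head(2), [10, 20, 30]))
print(compute_up(Head(1), {'a': 1}))
```

Supporting a new backend then means registering one more small function, without touching the expression or the existing recipes.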

N-Dimensional example



In [13]:

    
x = symbol('x', '1000 * 1000 * {measurement: float32, timestamp: datetime}')
x









    Out[13]:




x



In [14]:

    
expr = x[:500].measurement.sum(axis=1)
expr









    Out[14]:





sum(x[:500].measurement, axis=(1,))



In [15]:

    
expr.dshape









    Out[15]:





dshape("500 * float32")
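
The resulting dshape follows from two steps: the slice keeps the first 500 entries of the leading axis, and the sum over axis 1 removes that axis. A scaled-down, pure-Python sketch of the same arithmetic (a 10 x 10 grid sliced to 5 rows; the nested-list layout and names are assumptions for illustration, not how Blaze stores data):

```python
# Scaled-down stand-in for x: 10 x 10 records with a 'measurement' field.
x = [[{'measurement': float(i * 10 + j)} for j in range(10)]
     for i in range(10)]

# x[:5] keeps the first 5 rows; summing over axis 1 collapses each row
# to a single number.
result = [sum(rec['measurement'] for rec in row) for row in x[:5]]

print(len(result))   # 5, one value per remaining row
print(result[0])     # 45.0 = 0 + 1 + ... + 9
```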