Blaze - A Quick Tour

Blaze provides a lightweight interface on top of pre-existing computational infrastructure. This notebook gives a quick overview of how Blaze interacts with a variety of data types.



In [1]:

    
from blaze import Data, by, compute

Blaze wraps pre-existing data

Blaze interacts with normal Python objects. Operations on Blaze Data objects create expression trees.

These expressions deliver an intuitive numpy/pandas-like feel.



In [2]:

    
x = Data(1)
x









    Out[2]:




1



In [3]:

    
x.dshape









    Out[3]:





dshape("int64")



In [4]:

    
x + 1









    Out[4]:




2



In [5]:

    
print(type(x + 1))
print(type(compute(x + 1)))









    



<class 'blaze.expr.arithmetic.Add'>
<type 'int'>
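
To make the idea of an expression tree concrete, here is a minimal sketch in plain Python. It is not Blaze's implementation: the `Leaf` class and the `evaluate` function are hypothetical names invented for this illustration, and only addition is supported. It shows that `x + 1` can build an `Add` node first and produce an `int` only when the tree is evaluated.

```python
# Toy deferred-expression classes. The names are hypothetical, not Blaze's.
class Leaf(object):
    """Wraps a concrete Python value."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Build a tree node instead of adding right away.
        return Add(self, other)


class Add(Leaf):
    """Records a pending addition of two operands."""
    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs


def evaluate(expr):
    """Walk the tree and return a plain Python value."""
    if isinstance(expr, Add):
        return evaluate(expr.lhs) + evaluate(expr.rhs)
    if isinstance(expr, Leaf):
        return expr.value
    return expr  # a bare Python literal such as 1


x = Leaf(1)
expr = x + 1
print(type(expr).__name__)  # Add
print(evaluate(expr))       # 2
```

The split between building the tree and evaluating it is what lets the same expression be handed to different backends later on.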

Lists

Starting small, Blaze interacts happily with collections of data.

It uses Pandas for pretty notebook printing.



In [6]:

    
x = Data([1, 2, 3, 4, 5])
x



In [7]:

    
x[x > 2] * 10



In [8]:

    
x.dshape









    Out[8]:





dshape("5 * int64")

Or Tabular, Pandas-like datasets

Slightly more exciting: Blaze also operates on tabular data.



In [9]:

    
L = [[1, 'Alice',   100],
     [2, 'Bob',    -200],
     [3, 'Charlie', 300],
     [4, 'Dennis',  400],
     [5, 'Edith',  -500]]



In [10]:

    
x = Data(L, fields=['id', 'name', 'amount'])
x.dshape









    Out[10]:





dshape("5 * {id: int64, name: string, amount: int64}")



In [11]:

    
x



In [12]:

    
deadbeats = x[x.amount < 0].name
deadbeats
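
For comparison, the same selection can be written by hand as a plain Python loop over the list of rows. This is only a sketch of the kind of work the expression stands for, not the code Blaze generates; the names `fields`, `amount_idx`, `name_idx` and `deadbeat_names` are chosen for this illustration.

```python
# The same rows as in the cell above.
L = [[1, 'Alice',   100],
     [2, 'Bob',    -200],
     [3, 'Charlie', 300],
     [4, 'Dennis',  400],
     [5, 'Edith',  -500]]
fields = ['id', 'name', 'amount']

# Look up column positions by name instead of hard-coding indices.
amount_idx = fields.index('amount')
name_idx = fields.index('name')

# Keep the name of every row whose amount is negative.
deadbeat_names = [row[name_idx] for row in L if row[amount_idx] < 0]
print(deadbeat_names)  # ['Bob', 'Edith']
```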

Or it can even just drive pandas

Blaze doesn't do the work itself; it just tells other systems to do the work.

In the previous example, Blaze told Python which for-loops to write. In this example, it calls the right functions in Pandas.

The user experience is identical; only performance differs.



In [13]:

    
from pandas import DataFrame

df = DataFrame([[1, 'Alice',   100],                         
                [2, 'Bob',    -200],
                [3, 'Charlie', 300],
                [4, 'Dennis',  400],
                [5, 'Edith',  -500]], columns=['id', 'name', 'amount'])



In [14]:

    
df



In [15]:

    
x = Data(df)
x



In [16]:

    
deadbeats = x[x.amount < 0].name
deadbeats

Calling compute, we see that Blaze returns a result from the same system it was given: pandas in, pandas out.



In [17]:

    
type(compute(deadbeats))









    Out[17]:





pandas.core.series.Series

Other data types like SQLAlchemy Tables

Blaze extends beyond just Python and Pandas (that's the main motivation).

Here it drives SQLAlchemy.



In [18]:

    
from sqlalchemy import Table, Column, MetaData, Integer, String, create_engine

tab = Table('bank', MetaData(),
            Column('id', Integer),
            Column('name', String),
            Column('amount', Integer))



In [19]:

    
x = Data(tab)
x.dshape









    Out[19]:





dshape("var * {id: ?int32, name: ?string, amount: ?int32}")

Just as computations on pandas objects produce pandas objects, computations on SQLAlchemy tables produce SQLAlchemy Select statements.



In [20]:

    
deadbeats = x[x.amount < 0].name
compute(deadbeats)









    Out[20]:





<sqlalchemy.sql.selectable.Select at 0x7f2543f2fc10; Select object>



In [21]:

    
print(compute(deadbeats))  # SQLAlchemy generates actual SQL









    



SELECT bank.name 
FROM bank 
WHERE bank.amount < :amount_1
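
The `:amount_1` in the generated SQL is a bound parameter, filled in when the statement is executed. As a sketch of what running that statement involves, the snippet below uses the standard-library `sqlite3` module rather than Blaze or SQLAlchemy. The in-memory `bank` table and its five rows are an assumption, borrowed from the list example earlier in this tour.

```python
import sqlite3

# In-memory database with the rows from the earlier list example (an assumption).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE bank (id INTEGER, name TEXT, amount INTEGER)')
conn.executemany('INSERT INTO bank VALUES (?, ?, ?)',
                 [(1, 'Alice', 100), (2, 'Bob', -200), (3, 'Charlie', 300),
                  (4, 'Dennis', 400), (5, 'Edith', -500)])

# Same statement as printed above; the value 0 is bound to :amount_1.
query = 'SELECT bank.name FROM bank WHERE bank.amount < :amount_1'
rows = conn.execute(query, {'amount_1': 0}).fetchall()

# Sort in Python because the query has no ORDER BY clause.
names = sorted(name for (name,) in rows)
print(names)  # ['Bob', 'Edith']
conn.close()
```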

Connect to a real database

When we drive a SQLAlchemy table connected to a database, we get actual computation.



In [22]:

    
engine = create_engine('sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db')



In [23]:

    
x = Data(engine)
x









    Out[23]:




Data:       Engine(sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db)
DataShape:  {
  iris: var * {
    sepal_length: ?float64,
    sepal_width: ?float64,
    petal_length: ?float64,
    petal_width: ?float64,
    species: ?string
  ...



In [24]:

    
x.iris









    Out[24]:





  
    
      
	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa
5	5.4	3.9	1.7	0.4	Iris-setosa
6	4.6	3.4	1.4	0.3	Iris-setosa
7	5.0	3.4	1.5	0.2	Iris-setosa
8	4.4	2.9	1.4	0.2	Iris-setosa
9	4.9	3.1	1.5	0.1	Iris-setosa
10	5.4	3.7	1.5	0.2	Iris-setosa



In [25]:

    
by(x.iris.species, shortest=x.iris.sepal_length.min(), 
                    longest=x.iris.sepal_length.max())









    Out[25]:





  
    
      
	species	longest	shortest
0	Iris-setosa	5.8	4.3
1	Iris-versicolor	7.0	4.9
2	Iris-virginica	7.9	4.9
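
A split-apply-combine expression like this one corresponds to a SQL `GROUP BY`. The sketch below writes such a query by hand with the standard-library `sqlite3` module. The exact SQL that Blaze emits is not shown in this tour, so the query text is an assumption. The table holds only a handful of illustrative rows, so its numbers differ from the output above.

```python
import sqlite3

# A tiny stand-in for the iris table: illustrative rows, not the full dataset.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE iris (sepal_length REAL, species TEXT)')
conn.executemany('INSERT INTO iris VALUES (?, ?)',
                 [(5.1, 'Iris-setosa'), (4.9, 'Iris-setosa'),
                  (7.0, 'Iris-versicolor'), (6.4, 'Iris-versicolor'),
                  (6.3, 'Iris-virginica'), (5.8, 'Iris-virginica')])

# One output row per species, with the smallest and largest sepal length.
query = ('SELECT species, MIN(sepal_length) AS shortest, '
         'MAX(sepal_length) AS longest '
         'FROM iris GROUP BY species ORDER BY species')
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
conn.close()
```

The grouping and the two reductions run inside the database; only the three summary rows come back to Python.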

Use URI strings to ease access

Often just figuring out how to produce the relevant Python object can be a challenge.

Blaze supports many formats of URI strings.



In [26]:

    
x = Data('sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db::iris')
x









    Out[26]:





  
    
      
	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa
5	5.4	3.9	1.7	0.4	Iris-setosa
6	4.6	3.4	1.4	0.3	Iris-setosa
7	5.0	3.4	1.5	0.2	Iris-setosa
8	4.4	2.9	1.4	0.2	Iris-setosa
9	4.9	3.1	1.5	0.1	Iris-setosa
10	5.4	3.7	1.5	0.2	Iris-setosa
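
The URI above has two parts joined by `::`, a database location and a table name. The sketch below shows one way such a string could be split. It is not Blaze's parser: `split_uri` is a hypothetical helper, and the example paths and host name are made up for illustration.

```python
def split_uri(uri):
    """Split 'resource::name' into (resource, name).

    Returns (uri, None) when the string has no '::' separator.
    """
    resource, sep, name = uri.rpartition('::')
    if not sep:
        # rpartition puts the whole string in the last slot when '::' is absent.
        return uri, None
    return resource, name


print(split_uri('sqlite:////tmp/example.db::iris'))
print(split_uri('impala://example-host'))
```

Splitting from the right keeps the `://` scheme prefix and any `user:password@` part inside the resource half.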

Once you have SQL, might as well go big



In [27]:

    
x = Data('impala://ec2-54-90-201-28.compute-1.amazonaws.com')

MongoDB

GitHub's database is mirrored in a Mongo collection hosted in the Netherlands.

We connect through an SSH tunnel. See http://ghtorrent.org/ to obtain access.



In [28]:

    
users = Data('mongodb://ghtorrentro:ghtorrentro@localhost/github::users')
users









    Out[28]:





  
    
      
	avatar_url	bio	blog	company	created_at	email	followers	following	gravatar_id	hireable	html_url	id	location	login	name	public_gists	public_repos	type	url
0	https://secure.gravatar.com/avatar/a7e55f31bb4...	None	None	None	2012-05-04T13:59:54Z	None	0	0	a7e55f31bb45321f30211e901cd89ffa	None	https://github.com/Michaelwussler	1706010	None	Michaelwussler	None	0	3	User	https://api.github.com/users/Michaelwussler
1	https://secure.gravatar.com/avatar/eb8139078bc...	None	None	None	2012-05-03T18:47:13Z	None	0	0	eb8139078bc623dee103ed3917c080dc	None	https://github.com/praiser	1703505	None	praiser	None	0	3	User	https://api.github.com/users/praiser
2	https://secure.gravatar.com/avatar/13c7b665e0c...	None			2010-04-07T12:15:00Z	vad.viktor@gmail.com	2	3	13c7b665e0cbd94e0155387c35957d13	False	https://github.com/vadviktor	238703	Budapest	vadviktor	Vad Viktor	0	10	User	https://api.github.com/users/vadviktor
3	https://secure.gravatar.com/avatar/b7937805411...		None	Appcelerator	2012-04-02T16:13:58Z	yjin@appcelerator.com	0	0	b7937805411d278ceb839175e251e2a0	False	https://github.com/ypjin	1598831	Beijing	ypjin	Yuping	0	5	User	https://api.github.com/users/ypjin
4	https://secure.gravatar.com/avatar/89e109fca84...		http://blogs.perl.org/users/steven_haryanto	-	2010-02-26T01:28:09Z	stevenharyanto@gmail.com	39	307	89e109fca8474e5636c9feef7a8422ea	False	https://github.com/sharyanto	211084	Jakarta, Indonesia	sharyanto	Steven Haryanto	5	195	User	https://api.github.com/users/sharyanto
5	https://secure.gravatar.com/avatar/7490b4e3e9c...	Perl, C, C++, JavaScript, PHP, Haskell, Ruby, ...	http://c9s.me		2009-02-01T15:20:08Z	cornelius.howl@gmail.com	330	599	7490b4e3e9cb85a1f7dc0c8ea01a86e5	True	https://github.com/c9s	50894	Taipei, Taiwan	c9s	Yo-An Lin	281	206	User	https://api.github.com/users/c9s
6	https://secure.gravatar.com/avatar/dc078ac4dbd...	None	azhari.harahap.us	CapungRiders	2010-10-31T05:53:40Z	azhari@harahap.us	26	11	dc078ac4dbdc06d3e3c0ec0b6801b53d	False	https://github.com/back2arie	461397	Indonesia	back2arie	Azhari Harahap	1	15	User	https://api.github.com/users/back2arie
7	https://secure.gravatar.com/avatar/fb844ffed6c...	Git Ninja and language-agnostic problem solver...	http://dukeleto.pl	Leto Labs LLC	2008-10-22T03:02:15Z	jonathan@leto.net	175	635	fb844ffed6c5a2e69638627e3b721308	True	https://github.com/leto	30298	Portland, OR	leto	Jonathan "Duke" Leto	276	112	User	https://api.github.com/users/leto
8	https://secure.gravatar.com/avatar/3843ec7861e...		http://alanhaggai.org/	Thought Ripples	2009-01-13T16:25:15Z	haggai@cpan.org	46	365	3843ec7861e271e803ea076035d683dd	False	https://github.com/alanhaggai	46288	IN	alanhaggai	Alan Haggai Alavi	4	54	User	https://api.github.com/users/alanhaggai
9	https://secure.gravatar.com/avatar/f611628c558...	None	arisdottle.net	Team Rooster Pirates	2009-05-12T19:29:09Z	amiri@roosterpirates.com	16	87	f611628c5588f7a0a72c65ec1f94dfb8	False	https://github.com/amiri	83806	Los Angeles, CA	amiri	Amiri Barksdale	16	18	User	https://api.github.com/users/amiri
10	https://secure.gravatar.com/avatar/c57483c5cfe...	None	http://www.geekfarm.org/wu/muse/WebHome.html	None	2009-02-08T03:28:54Z	git-c@geekfarm.org	16	87	c57483c5cfe159b98a6e33ee7e9eec38	False	https://github.com/wu	52700	None	wu	Alex White	0	15	User	https://api.github.com/users/wu

Handle NumPy-like computations



In [29]:

    
import h5py
f = h5py.File('/home/mrocklin/Downloads/OMI-Aura_L2-OMAERO_2014m1105t2304-o54838_v003-2014m1106t215558.he5')



In [30]:

    
x = Data(f)
x.dshape









    Out[30]:





dshape("""{
  HDFEOS: {
    ADDITIONAL: {FILE_ATTRIBUTES: {}},
    SWATHS: {
      ColumnAmountAerosol: {
        Data Fields: {
          AerosolIndexUV: 1643 * 60 * int16,
          AerosolIndexVIS: 1643 * 60 * int16,
          AerosolModelMW: 1643 * 60 * uint16,
          AerosolModelsPassedThreshold: 1643 * 60 * 10 * uint16,
          AerosolOpticalThicknessMW: 1643 * 60 * 14 * int16,
          AerosolOpticalThicknessMWPrecision: 1643 * 60 * int16,
          AerosolOpticalThicknessNUV: 1643 * 60 * 2 * int16,
          AerosolOpticalThicknessPassedThreshold: 1643 * 60 * 10 * 9 * int16,
          AerosolOpticalThicknessPassedThresholdMean: 1643 * 60 * 9 * int16,
          AerosolOpticalThicknessPassedThresholdStd: 1643 * 60 * 9 * int16,
          CloudFlags: 1643 * 60 * uint8,
          CloudPressure: 1643 * 60 * int16,
          EffectiveCloudFraction: 1643 * 60 * int8,
          InstrumentConfigurationId: 1643 * uint8,
          MeasurementQualityFlags: 1643 * uint8,
          NumberOfModelsPassedThreshold: 1643 * 60 * uint8,
          ProcessingQualityFlagsMW: 1643 * 60 * uint16,
          ProcessingQualityFlagsNUV: 1643 * 60 * uint16,
          RootMeanSquareErrorOfFitPassedThreshold: 1643 * 60 * 10 * int16,
          SingleScatteringAlbedoMW: 1643 * 60 * 14 * int16,
          SingleScatteringAlbedoMWPrecision: 1643 * 60 * int16,
          SingleScatteringAlbedoNUV: 1643 * 60 * 2 * int16,
          SingleScatteringAlbedoPassedThreshold: 1643 * 60 * 10 * 9 * int16,
          SingleScatteringAlbedoPassedThresholdMean: 1643 * 60 * 9 * int16,
          SingleScatteringAlbedoPassedThresholdStd: 1643 * 60 * 9 * int16,
          SmallPixelRadiancePointerUV: 1643 * 2 * int16,
          SmallPixelRadiancePointerVIS: 1643 * 2 * int16,
          SmallPixelRadianceUV: 6783 * 60 * float32,
          SmallPixelRadianceVIS: 6786 * 60 * float32,
          SmallPixelWavelengthUV: 6783 * 60 * uint16,
          SmallPixelWavelengthVIS: 6786 * 60 * uint16,
          TerrainPressure: 1643 * 60 * int16,
          TerrainReflectivity: 1643 * 60 * 9 * int16,
          XTrackQualityFlags: 1643 * 60 * uint8
          },
        Geolocation Fields: {
          GroundPixelQualityFlags: 1643 * 60 * uint16,
          Latitude: 1643 * 60 * float32,
          Longitude: 1643 * 60 * float32,
          OrbitPhase: 1643 * float32,
          SolarAzimuthAngle: 1643 * 60 * float32,
          SolarZenithAngle: 1643 * 60 * float32,
          SpacecraftAltitude: 1643 * float32,
          SpacecraftLatitude: 1643 * float32,
          SpacecraftLongitude: 1643 * float32,
          TerrainHeight: 1643 * 60 * int16,
          Time: 1643 * float64,
          ViewingAzimuthAngle: 1643 * 60 * float32,
          ViewingZenithAngle: 1643 * 60 * float32
          }
        }
      }
    },
  HDFEOS INFORMATION: {
    ArchiveMetadata.0: string[65535, 'A'],
    CoreMetadata.0: string[65535, 'A'],
    StructMetadata.0: string[32000, 'A']
    }
  }""")
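
Each leaf in this datashape reads as dimensions followed by an element type, so `1643 * 60 * int16` is a 1643-by-60 array of 16-bit integers. As a worked example of that reading, the sketch below computes the element count and the uncompressed size in bytes for such a string. `array_size` and `ITEM_SIZES` are hypothetical helpers written for this illustration, not part of Blaze or datashape, and they handle only fixed dimensions and the fixed-width types listed.

```python
# Bytes per element for the fixed-width types that appear in the dshape above.
ITEM_SIZES = {'int8': 1, 'uint8': 1, 'int16': 2, 'uint16': 2,
              'float32': 4, 'float64': 8}


def array_size(ds):
    """Return (element_count, byte_count) for a string like '1643 * 60 * int16'."""
    parts = [p.strip() for p in ds.split('*')]
    count = 1
    for dim in parts[:-1]:  # every part except the last is a dimension
        count *= int(dim)
    return count, count * ITEM_SIZES[parts[-1]]


print(array_size('1643 * 60 * int16'))  # (98580, 197160)
```

CloudPressure therefore holds 98580 values, about 197 kB when uncompressed in memory.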



In [31]:

    
x.HDFEOS.SWATHS.ColumnAmountAerosol.Data_Fields.CloudPressure









    Out[31]:




array([[-32767, -32767, -32767, ..., -32767, -32767, -32767],
       [-32767, -32767, -32767, ..., -32767, -32767, -32767],
       [-32767, -32767, -32767, ..., -32767, -32767, -32767],
       ..., 
       [-32767, -32767, -32767, ..., -32767, -32767, -32767],
       [-32767, -32767, -32767, ..., -32767, -32767, -32767],
       [-32767, -32767, -32767, ..., -32767, -32767, -32767]], dtype=int16)



In [32]:

    
x.HDFEOS.SWATHS.ColumnAmountAerosol.Data_Fields.CloudPressure.max()









    Out[32]:




1013
