Blaze, MongoDB, and GitHub Data

The website http://ghtorrent.org/ maintains a mirror of GitHub's public data in a large MongoDB database.

We access and query this database with Blaze.


In [1]:
import blaze
from blaze import Table, into
blaze.__version__


Out[1]:
'0.6.5'

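We authenticate by tunneling into the server over SSH; we previously sent the maintainers an SSH key. The command below forwards local port 27017 to the same port on the remote host.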

ssh -L 27017:dutihr.st.ewi.tudelft.nl:27017 ghtorrent@dutihr.st.ewi.tudelft.nl
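Before connecting, it can help to confirm that the tunnel is actually listening on the local port. The following is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical and not part of Blaze.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: port_is_open() should return True while the ssh tunnel is running.
```

This only checks that something accepts connections on the port; it does not verify the MongoDB credentials.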

In [2]:
users = Table('mongodb://ghtorrentro:ghtorrentro@localhost/github::users')
users


Out[2]:
avatar_url bio blog company created_at email followers following gravatar_id hireable html_url id location login name public_gists public_repos type url
0 https://secure.gravatar.com/avatar/a7e55f31bb4... None None None 2012-05-04T13:59:54Z None 0 0 a7e55f31bb45321f30211e901cd89ffa None https://github.com/Michaelwussler 1706010 None Michaelwussler None 0 3 User https://api.github.com/users/Michaelwussler
1 https://secure.gravatar.com/avatar/eb8139078bc... None None None 2012-05-03T18:47:13Z None 0 0 eb8139078bc623dee103ed3917c080dc None https://github.com/praiser 1703505 None praiser None 0 3 User https://api.github.com/users/praiser
2 https://secure.gravatar.com/avatar/13c7b665e0c... None 2010-04-07T12:15:00Z vad.viktor@gmail.com 2 3 13c7b665e0cbd94e0155387c35957d13 False https://github.com/vadviktor 238703 Budapest vadviktor Vad Viktor 0 10 User https://api.github.com/users/vadviktor
3 https://secure.gravatar.com/avatar/b7937805411... None Appcelerator 2012-04-02T16:13:58Z yjin@appcelerator.com 0 0 b7937805411d278ceb839175e251e2a0 False https://github.com/ypjin 1598831 Beijing ypjin Yuping 0 5 User https://api.github.com/users/ypjin
4 https://secure.gravatar.com/avatar/89e109fca84... http://blogs.perl.org/users/steven_haryanto - 2010-02-26T01:28:09Z stevenharyanto@gmail.com 39 307 89e109fca8474e5636c9feef7a8422ea False https://github.com/sharyanto 211084 Jakarta, Indonesia sharyanto Steven Haryanto 5 195 User https://api.github.com/users/sharyanto
5 https://secure.gravatar.com/avatar/7490b4e3e9c... Perl, C, C++, JavaScript, PHP, Haskell, Ruby, ... http://c9s.me 2009-02-01T15:20:08Z cornelius.howl@gmail.com 330 599 7490b4e3e9cb85a1f7dc0c8ea01a86e5 True https://github.com/c9s 50894 Taipei, Taiwan c9s Yo-An Lin 281 206 User https://api.github.com/users/c9s
6 https://secure.gravatar.com/avatar/dc078ac4dbd... None azhari.harahap.us CapungRiders 2010-10-31T05:53:40Z azhari@harahap.us 26 11 dc078ac4dbdc06d3e3c0ec0b6801b53d False https://github.com/back2arie 461397 Indonesia back2arie Azhari Harahap 1 15 User https://api.github.com/users/back2arie
7 https://secure.gravatar.com/avatar/fb844ffed6c... Git Ninja and language-agnostic problem solver... http://dukeleto.pl Leto Labs LLC 2008-10-22T03:02:15Z jonathan@leto.net 175 635 fb844ffed6c5a2e69638627e3b721308 True https://github.com/leto 30298 Portland, OR leto Jonathan "Duke" Leto 276 112 User https://api.github.com/users/leto
8 https://secure.gravatar.com/avatar/3843ec7861e... http://alanhaggai.org/ Thought Ripples 2009-01-13T16:25:15Z haggai@cpan.org 46 365 3843ec7861e271e803ea076035d683dd False https://github.com/alanhaggai 46288 IN alanhaggai Alan Haggai Alavi 4 54 User https://api.github.com/users/alanhaggai
9 https://secure.gravatar.com/avatar/f611628c558... None arisdottle.net Team Rooster Pirates 2009-05-12T19:29:09Z amiri@roosterpirates.com 16 87 f611628c5588f7a0a72c65ec1f94dfb8 False https://github.com/amiri 83806 Los Angeles, CA amiri Amiri Barksdale 16 18 User https://api.github.com/users/amiri
10 https://secure.gravatar.com/avatar/c57483c5cfe... None http://www.geekfarm.org/wu/muse/WebHome.html None 2009-02-08T03:28:54Z git-c@geekfarm.org 16 87 c57483c5cfe159b98a6e33ee7e9eec38 False https://github.com/wu 52700 None wu Alex White 0 15 User https://api.github.com/users/wu

It feels interactive

By default we ask for only about ten rows, so the remote database can compute and return results quickly.


In [3]:
users.company


Out[3]:
company
0 None
1 None
2
3 Appcelerator
4 -
5
6 CapungRiders
7 Leto Labs LLC
8 Thought Ripples
9 Team Rooster Pirates
10 None
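The quick response comes from requesting only the first few rows rather than the whole collection. The idea can be sketched in plain Python, with a generator standing in for the remote cursor. This illustrates the principle only; it is not Blaze's implementation, and `remote_rows` and `preview` are hypothetical names.

```python
from itertools import islice


def remote_rows(log):
    """Stand-in for a remote cursor; records every row that is actually fetched."""
    n = 0
    while True:
        log.append(n)       # pretend this is a network round trip
        yield {"id": n}
        n += 1


def preview(rows, n=10):
    """Pull only the first n rows from a (possibly unbounded) row source."""
    return list(islice(rows, n))


fetched = []
first_rows = preview(remote_rows(fetched), n=10)
print(len(first_rows), len(fetched))  # prints "10 10": nothing beyond the preview is fetched
```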

It's also powerful

This computation takes around twenty seconds to run. That's OK. It's querying a terabyte-scale dataset several thousand miles away, so we're fine with twenty seconds.


In [4]:
users[users.followers > 100][['login', 'followers', 'following', 'blog']]


Out[4]:
login followers following blog
0 c9s 330 599 http://c9s.me
1 leto 175 635 http://dukeleto.pl
2 bingos 125 277 http://use.perl.org/~bingos/journal/
3 chovy 1056 39044 http://anthony.ettinger.name
4 chapmanb 120 30 http://bcbio.wordpress.com
5 equus12 109 4801 None
6 carljm 177 34 http://www.oddbird.net
7 andrewsmedina 171 295 http://www.andrewsmedina.com
8 jbalogh 172 47 http://jbalogh.me
9 ametaireau 116 57 http://www.notmyidea.org
10 robhudson 239 99 http://rob.cogit8.org/
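To make the filter-and-project step concrete, here is a small pure-Python sketch of what the expression above selects. It assumes the filter corresponds roughly to a MongoDB query document of the form `{'followers': {'$gt': 100}}`; the query Blaze actually sends is not shown in this notebook. The helpers `matches` and `project` are hypothetical, and the sample rows are copied from the `users` output earlier.

```python
def matches(doc, query):
    """Evaluate a tiny subset of MongoDB query syntax: {field: {'$gt': value}}."""
    for field, cond in query.items():
        for op, value in cond.items():
            if op != "$gt":
                raise ValueError("unsupported operator: %s" % op)
            if doc.get(field) is None or not doc[field] > value:
                return False
    return True


def project(doc, fields):
    """Keep only the requested fields, like a column selection."""
    return {f: doc.get(f) for f in fields}


query = {"followers": {"$gt": 100}}  # assumed equivalent of users.followers > 100
fields = ["login", "followers", "following", "blog"]

sample = [
    {"login": "c9s", "followers": 330, "following": 599,
     "blog": "http://c9s.me", "type": "User"},
    {"login": "sharyanto", "followers": 39, "following": 307,
     "blog": "http://blogs.perl.org/users/steven_haryanto", "type": "User"},
    {"login": "leto", "followers": 175, "following": 635,
     "blog": "http://dukeleto.pl", "type": "User"},
]

result = [project(d, fields) for d in sample if matches(d, query)]
print([r["login"] for r in result])  # prints ['c9s', 'leto']
```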

More tables


In [5]:
repos = Table('mongodb://ghtorrentro:ghtorrentro@localhost/github::repos')
repos


Out[5]:
clone_url created_at description fork forks full_name git_url has_downloads has_issues has_wiki homepage html_url id language master_branch mirror_url name open_issues organization owner parent private pushed_at size source ssh_url svn_url updated_at url watchers
0 https://github.com/Michaelwussler/gittest.git 2012-07-12T10:41:03Z False 1 Michaelwussler/gittest git://github.com/Michaelwussler/gittest.git True True True None https://github.com/Michaelwussler/gittest 5002137 Java master None gittest 0 None {u'url': u'https://api.github.com/users/Michae... None False 2012-07-12T11:40:07Z 164 None git@github.com:Michaelwussler/gittest.git https://github.com/Michaelwussler/gittest 2012-07-12T11:40:07Z https://api.github.com/repos/Michaelwussler/gi... 1
1 https://github.com/sharyanto/perl-Task-BeLike-... 2011-03-16T15:06:38Z Install modules currently used in SHARYANTO's ... False 1 sharyanto/perl-Task-BeLike-SHARYANTO-Devel git://github.com/sharyanto/perl-Task-BeLike-SH... True True True http://search.cpan.org/dist/Task-BeLike-SHARYA... https://github.com/sharyanto/perl-Task-BeLike-... 1487560 Perl master None perl-Task-BeLike-SHARYANTO-Devel 0 None {u'url': u'https://api.github.com/users/sharya... None False 2012-07-12T11:35:03Z 608 None git@github.com:sharyanto/perl-Task-BeLike-SHAR... https://github.com/sharyanto/perl-Task-BeLike-... 2012-07-12T11:35:03Z https://api.github.com/repos/sharyanto/perl-Ta... 1
2 https://github.com/Toolpark/irma.git 2012-03-20T11:31:16Z False 1 Toolpark/irma git://github.com/Toolpark/irma.git True True True https://github.com/Toolpark/irma 3774477 JavaScript master None irma 0 {u'url': u'https://api.github.com/users/Toolpa... {u'url': u'https://api.github.com/users/Toolpa... None False 2012-07-12T11:43:31Z 964 None git@github.com:Toolpark/irma.git https://github.com/Toolpark/irma 2012-07-12T11:43:31Z https://api.github.com/repos/Toolpark/irma 2
3 https://github.com/hirakchatterjee/try_git.git 2012-07-12T11:19:45Z None False 1 hirakchatterjee/try_git git://github.com/hirakchatterjee/try_git.git True True True None https://github.com/hirakchatterjee/try_git 5002444 None master None try_git 0 None {u'url': u'https://api.github.com/users/hirakc... None False 2012-07-12T11:31:50Z 92 None git@github.com:hirakchatterjee/try_git.git https://github.com/hirakchatterjee/try_git 2012-07-12T11:31:50Z https://api.github.com/repos/hirakchatterjee/t... 1
4 https://github.com/anirbansaha/inmobi_general_... 2012-07-10T05:37:49Z inmobi_general_cookbooks False 1 anirbansaha/inmobi_general_cookbooks git://github.com/anirbansaha/inmobi_general_co... True True True None https://github.com/anirbansaha/inmobi_general_... 4969515 Ruby master None inmobi_general_cookbooks 0 None {u'url': u'https://api.github.com/users/anirba... None False 2012-07-12T11:31:44Z 448 None git@github.com:anirbansaha/inmobi_general_cook... https://github.com/anirbansaha/inmobi_general_... 2012-07-12T11:31:44Z https://api.github.com/repos/anirbansaha/inmob... 1
5 https://github.com/mmacedo/myapp.git 2012-07-05T21:09:14Z Just test False 1 mmacedo/myapp git://github.com/mmacedo/myapp.git True False False None https://github.com/mmacedo/myapp 4915307 Ruby master None myapp 0 None {u'url': u'https://api.github.com/users/mmaced... None False 2012-07-12T11:35:33Z 356 None git@github.com:mmacedo/myapp.git https://github.com/mmacedo/myapp 2012-07-12T11:35:33Z https://api.github.com/repos/mmacedo/myapp 1
6 https://github.com/rotschopf/SSE.git 2012-05-18T11:38:07Z False 1 rotschopf/SSE git://github.com/rotschopf/SSE.git True False False None https://github.com/rotschopf/SSE 4368710 VHDL master None SSE 0 None {u'url': u'https://api.github.com/users/rotsch... None False 2012-07-12T11:30:39Z 944 None git@github.com:rotschopf/SSE.git https://github.com/rotschopf/SSE 2012-07-12T11:30:39Z https://api.github.com/repos/rotschopf/SSE 1
7 https://github.com/pokermania/engine.ns.io-cli... 2012-07-05T15:59:51Z True 0 pokermania/engine.ns.io-client git://github.com/pokermania/engine.ns.io-clien... True False True https://github.com/pokermania/engine.ns.io-client 4910102 CoffeeScript master None engine.ns.io-client 0 {u'url': u'https://api.github.com/users/pokerm... {u'url': u'https://api.github.com/users/pokerm... {u'has_wiki': True, u'mirror_url': None, u'upd... False 2012-07-12T11:31:40Z 112 {u'has_wiki': True, u'mirror_url': None, u'upd... git@github.com:pokermania/engine.ns.io-client.git https://github.com/pokermania/engine.ns.io-client 2012-07-12T11:31:41Z https://api.github.com/repos/pokermania/engine... 1
8 https://github.com/trifork/dgws.git 2012-04-12T11:04:29Z False 3 trifork/dgws git://github.com/trifork/dgws.git True True True https://github.com/trifork/dgws 4003806 Java develop None dgws 0 {u'url': u'https://api.github.com/users/trifor... {u'url': u'https://api.github.com/users/trifor... None False 2012-07-12T11:40:57Z 168 None git@github.com:trifork/dgws.git https://github.com/trifork/dgws 2012-07-12T11:40:57Z https://api.github.com/repos/trifork/dgws 4
9 https://github.com/fzoli/MillServer.git 2012-06-27T07:01:42Z False 1 fzoli/MillServer git://github.com/fzoli/MillServer.git True True True None https://github.com/fzoli/MillServer 4805282 Java master None MillServer 0 None {u'url': u'https://api.github.com/users/fzoli'... None False 2012-07-12T11:31:32Z 75760 None git@github.com:fzoli/MillServer.git https://github.com/fzoli/MillServer 2012-07-12T11:31:32Z https://api.github.com/repos/fzoli/MillServer 1
10 https://github.com/gkno/gkno.github.com.git 2012-02-23T21:46:20Z False 2 gkno/gkno.github.com git://github.com/gkno/gkno.github.com.git True True True gkno.github.com https://github.com/gkno/gkno.github.com 3530198 None master None gkno.github.com 1 {u'url': u'https://api.github.com/users/gkno',... {u'url': u'https://api.github.com/users/gkno',... None False 2012-07-12T11:31:33Z 160 None git@github.com:gkno/gkno.github.com.git https://github.com/gkno/gkno.github.com 2012-07-12T11:31:33Z https://api.github.com/repos/gkno/gkno.github.com 2

In [6]:
issues = Table('mongodb://ghtorrentro:ghtorrentro@localhost/github::issues')
issues


Out[6]:
assignee body closed_at comments comments_url created_at events_url html_url id labels labels_url milestone number owner pull_request repo state title updated_at url user
0 None TweetLine is a Sublime Text 2 Plugin to post c... None 0 https://api.github.com/repos/wbond/package_con... 2012-11-20T15:51:49Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8509346 [] https://api.github.com/repos/wbond/package_con... None 809 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Add SublimeTweetLine 2012-11-20T15:52:20Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
1 None Submitting a new package named AutoIndent whic... None 0 https://api.github.com/repos/wbond/package_con... 2012-11-20T08:16:05Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8496155 [] https://api.github.com/repos/wbond/package_con... None 808 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Added AutoIndent 2012-11-20T08:16:05Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
2 None Adding support for my library of Sublime Text ... None 8 https://api.github.com/repos/wbond/package_con... 2012-11-19T22:47:17Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8485997 [] https://api.github.com/repos/wbond/package_con... None 806 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Adding Dayle Rees Color Schemes 2012-11-20T06:17:02Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
3 None Added SuperAnt 2012-10-02T02:34:26Z 0 None 2012-09-28T19:32:40Z None https://github.com/wbond/package_control_chann... 7226975 [] None None 657 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel closed SuperANT 2012-10-02T02:34:26Z https://api.github.com/repos/wbond/package_con... {u'url': u'https://api.github.com/users/aphex'...
4 None See readme for info! None 0 https://api.github.com/repos/wbond/package_con... 2012-11-19T19:27:28Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8479860 [] https://api.github.com/repos/wbond/package_con... None 805 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Adding Expand Selection by Paragraph Plugin 2012-11-19T19:27:28Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
5 None Added JavaScript snippets from: https://github... None 0 https://api.github.com/repos/wbond/package_con... 2012-11-19T19:23:11Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8479724 [] https://api.github.com/repos/wbond/package_con... None 804 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Added JavaScript Snippets 2012-11-19T19:23:11Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
6 None See [repository on GitHub](https://github.com/... None 0 https://api.github.com/repos/wbond/package_con... 2012-11-19T18:49:52Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8478712 [] https://api.github.com/repos/wbond/package_con... None 803 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Added ParentalControl Package 2012-11-19T18:49:52Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
7 None IMESupport is a plugin to fix an issue of Subl... None 0 https://api.github.com/repos/wbond/package_con... 2012-11-19T15:50:14Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8472990 [] https://api.github.com/repos/wbond/package_con... None 802 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Add IMESupport plugin 2012-11-19T15:50:14Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
8 None ThemeSelector is a Sublime Text 2 Plugin to se... None 0 https://api.github.com/repos/wbond/package_con... 2012-11-19T14:20:34Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8470181 [] https://api.github.com/repos/wbond/package_con... None 801 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open add ThemeSelector 2012-11-19T14:20:34Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
9 None None 0 https://api.github.com/repos/wbond/package_con... 2012-11-19T12:31:38Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8467611 [] https://api.github.com/repos/wbond/package_con... None 800 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Added Gauche Syntax 2012-11-19T12:31:38Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...
10 None None 0 https://api.github.com/repos/wbond/package_con... 2012-11-19T06:40:47Z https://api.github.com/repos/wbond/package_con... https://github.com/wbond/package_control_chann... 8460421 [] https://api.github.com/repos/wbond/package_con... None 799 wbond {u'diff_url': u'https://github.com/wbond/packa... package_control_channel open Adding LettuceFarmer plugin 2012-11-19T06:40:47Z https://api.github.com/repos/wbond/package_con... {u'following_url': u'https://api.github.com/us...

Use into to bring results home

What are the open Blaze issues?


In [7]:
into(list,
     issues[(issues.owner == 'ContinuumIO') 
          & (issues.repo == 'blaze')
          & (issues.state == 'open')][['title', 'created_at']])


Out[7]:
[(u"Blaze docs can't be read on iPhone", u'2013-06-24T21:33:56Z'),
 (u"Tiny fix about options parsing in the 'chunked dot' bench.",
  u'2013-06-04T22:23:51Z'),
 (u'Declaring dependencies', u'2013-04-19T05:59:02Z'),
 (u'add a basic mailmap file', u'2013-04-12T18:57:20Z'),
 (u'After following install instructions "blaze" module won\'t import',
  u'2013-03-22T22:13:22Z'),
 (u'Mailing list link on http://blaze.pydata.org/ points to GitHub not Google Groups',
  u'2013-03-22T22:05:51Z'),
 (u"Quickstart first example doesn't work as described",
  u'2013-03-21T01:24:53Z'),
 (u'Disagreement in size of "float" between blaze and numpy',
  u'2013-03-16T17:37:33Z'),
 (u'blaze.zeros() slowness', u'2013-03-13T16:06:19Z'),
 (u'Opening CTable fails', u'2013-03-03T22:22:14Z'),
 (u'fromiter silently catches exceptions thrown by generators, generates bad matrices',
  u'2013-03-01T17:36:34Z'),
 (u'Add complex32 support', u'2013-02-28T18:34:24Z'),
 (u'persistence of tables seems to not be working', u'2013-02-27T18:03:07Z'),
 (u'warnings when building extensions (at least on mac os x)',
  u'2013-02-27T16:47:24Z'),
 (u'Example in quick start docs does not work', u'2013-02-21T11:06:06Z'),
 (u'Parsing datashapes with "type Name = ..." in them returns None',
  u'2013-02-19T19:44:40Z'),
 (u'Vlen implementation issues on Windows', u'2013-02-14T11:59:17Z'),
 (u"Can't create Tables using RecordDecl per the examples in the docs",
  u'2013-02-01T04:17:56Z'),
 (u'many import errors', u'2013-07-18T02:55:58Z'),
 (u" BLZ `format` '' is not supported.", u'2013-07-31T15:21:21Z'),
 (u'Start on expressoin graph', u'2013-08-23T15:01:07Z'),
 (u"Can't create large multidimensional array in BLZ",
  u'2013-09-09T08:38:34Z'),
 (u'Blaze Kernels', u'2013-09-06T13:58:04Z'),
 (u'Iterators in BLZ should be in their own class', u'2013-09-25T09:35:43Z'),
 (u'Missing dynd-python dependency requirement', u'2013-10-03T21:00:37Z'),
 (u'outdated website examples?', u'2013-11-04T02:50:19Z'),
 (u'Update install doc about the dynd dependency. Closed #79',
  u'2013-10-29T20:52:45Z'),
 (u'blz storage r/w mode is wrong', u'2013-11-04T02:54:42Z'),
 (u'The printing code should support general datashapes',
  u'2013-11-28T15:35:01Z'),
 (u'Added data descriptors for CSV and JSON files.  Storage and Array also support them.',
  u'2013-12-10T12:33:08Z'),
 (u'a basic catalog', u'2013-12-10T08:06:03Z'),
 (u'Documentation', u'2013-12-09T16:21:42Z'),
 (u'Open command should be load', u'2013-12-04T20:36:58Z'),
 (u"Array's from iterators don't determing type correctly",
  u'2013-12-03T20:38:29Z'),
 (u'Cannot create a blaze array with a Record datashape',
  u'2013-12-02T14:03:39Z'),
 (u'[WIP] Blaze distributed capabilities', u'2013-12-12T16:43:22Z'),
 (u'Shuffle files around', u'2013-12-12T05:44:43Z'),
 (u'Do not require uri for local file', u'2013-12-11T22:46:15Z'),
 (u'Second round of shuffling', u'2013-12-12T19:25:59Z'),
 (u'[WIP] Csv dd cleanup refactor', u'2013-12-18T23:27:39Z'),
 (u'Update server code to use catalog', u'2013-12-18T08:02:41Z'),
 (u'Data Descriptor cleanup', u'2013-12-17T18:06:25Z'),
 (u'Dshape refactor', u'2013-12-19T23:58:34Z'),
 (u"use relative imports for tests and blz_ext's use of bparams",
  u'2013-12-21T22:24:48Z'),
 (u'Skipping cffi test on travis', u'2014-01-01T22:11:52Z'),
 (u'Convert remote array to a data descriptor', u'2014-01-08T01:05:46Z'),
 (u'[WIP] Remove blz', u'2014-01-13T14:43:52Z'),
 (u'Consider renaming blaze.drop function', u'2014-01-14T06:19:37Z'),
 (u'blaze.array from iterator type deduction, closes issue #86',
  u'2014-01-14T02:14:14Z'),
 (u'Clean up blaze.array methods/attributes', u'2014-01-14T06:27:11Z'),
 (u'Server compute context', u'2014-01-14T22:06:38Z'),
 (u'[WIP] Doc tweaks', u'2014-01-14T06:32:32Z'),
 (u'Add a caching mechanism to the blaze catalog', u'2014-01-16T23:02:26Z'),
 (u'Reform execution pipeline a bit (work towards better integration of new backends',
  u'2014-01-20T16:09:14Z'),
 (u'[WIP] HDF5 DataDescriptor', u'2014-01-20T12:19:01Z'),
 (u'Catalog module requires yaml', u'2014-01-21T10:06:59Z'),
 (u'design doc for numpy-like API', u'2014-01-21T08:06:18Z'),
 (u'Uniformexecution', u'2014-01-21T20:26:50Z'),
 (u'DataDescriptor for HDF5', u'2014-01-22T17:23:53Z'),
 (u'Diagnose and fix problem evaluating nested ufunc calls',
  u'2014-01-29T20:44:58Z'),
 (u'[WIP] Sql', u'2014-01-29T17:35:37Z'),
 (u'Adding internal import details', u'2014-01-30T20:01:48Z'),
 (u'Add drop to catalog', u'2014-01-31T17:06:36Z'),
 (u'[WIP] HDF5 DataDescriptor docs and examples', u'2014-02-03T16:30:06Z'),
 (u'Work on allowing creation of stand-alone blaze functions',
  u'2014-02-03T14:48:04Z'),
 (u'Backend generalization and gentle start on sql backend',
  u'2014-02-03T14:47:08Z'),
 (u'Syntax for choosing multiple fields', u'2014-02-05T20:21:37Z'),
 (u'Work on AIR debug printing in blaze REPL', u'2014-02-05T17:08:04Z'),
 (u'`blaze.open()` should try to recognize file extensions in case `format` param is not passed',
  u'2014-02-04T16:52:22Z'),
 (u'Pyinterp', u'2014-02-12T11:39:15Z'),
 (u'Strategy', u'2014-02-11T18:51:20Z'),
 (u'[WIP] Work in sql(ite) data descriptor', u'2014-02-11T15:04:35Z'),
 (u'[WIP] Initial design for making hdf5 files acting as native catalog dirs',
  u'2014-02-06T17:18:48Z'),
 (u'Assignation error between blaze array and numpy array',
  u'2014-02-17T14:56:44Z'),
 (u'Update README.md to fix a broken link', u'2014-02-14T22:09:30Z'),
 (u'hdf5 sample broken', u'2014-02-12T16:54:09Z'),
 (u'Cannot get string out of blaze array', u'2014-02-18T18:42:51Z'),
 (u'Iteration over blaze arrays returns data descriptors',
  u'2014-02-18T18:42:02Z'),
 (u'Manual CSV delimiter specification needed', u'2014-02-18T17:18:39Z'),
 (u'Dependency on pyparsing', u'2014-02-19T18:19:54Z'),
 (u'Adding testing driven code documentation', u'2014-02-19T17:55:22Z'),
 (u'Blaze sql record field selection is not lazy', u'2014-02-26T17:47:07Z'),
 (u'SQL tutorial and column selection', u'2014-02-26T11:01:45Z'),
 (u'Sqldocs', u'2014-02-24T17:21:48Z'),
 (u'[WIP] Removing some "import blaze" statements', u'2014-02-20T17:24:04Z'),
 (u'Graphs do not fold constants', u'2014-03-03T22:47:26Z'),
 (u'Simple graph', u'2014-03-03T22:27:37Z'),
 (u'A proposal for a simple SQL cache for Blaze.', u'2014-03-03T18:11:58Z'),
 (u'[WIP] Updates for new datashape grammar', u'2014-02-27T03:49:07Z'),
 (u"blaze server can't handle names with .", u'2014-03-03T22:53:01Z'),
 (u'[WIP] Constant folding', u'2014-03-03T22:48:09Z'),
 (u'blaze catalog requires hdf5 file have extension .h5',
  u'2014-03-03T22:53:54Z'),
 (u'Add hdf5 catalog to server sample', u'2014-03-03T23:17:53Z'),
 (u"Samples and doctest aren't tested with unittests",
  u'2014-03-04T00:13:39Z'),
 (u"Samples and doctest aren't tested with unittests",
  u'2014-03-04T00:26:47Z'),
 (u'[WIP] Fix sql printing and selection', u'2014-03-04T00:24:18Z'),
 (u'[WIP] Use new datashape overloader, general dispatch cleanup',
  u'2014-03-14T01:07:11Z'),
 (u'Tweak for overloader PR on datashape', u'2014-03-13T21:26:50Z'),
 (u'[WIP] Design Doc Update', u'2014-03-07T08:46:24Z'),
 (u'Mode in storage is respected in constructors now.  Fixes #83.',
  u'2014-03-14T21:03:25Z'),
 (u'Indexed assignment does not work', u'2014-03-18T17:37:58Z'),
 (u'[WIP] Element wise, chunked evaluator, suited for OOC operations',
  u'2014-03-17T15:22:22Z'),
 (u'Better error message on getting buffers out of deferred arrays',
  u'2014-03-19T12:57:18Z'),
 (u'Remove scidb (for now), make default overloading explicit',
  u'2014-03-19T00:12:57Z'),
 (u'Continuing proposal for a SQL cache for Blaze.', u'2014-03-19T19:59:13Z'),
 (u'Build dynd in travis', u'2014-03-20T01:39:59Z'),
 (u'Adding dynd install from source', u'2014-03-19T23:19:30Z'),
 (u'Blaze SQL Example Fails', u'2014-03-20T14:38:12Z'),
 (u'Update requirements use only pip on travis', u'2014-03-20T20:42:43Z'),
 (u'SQL catalogue parsing', u'2014-03-20T16:04:32Z'),
 (u'[WIP] Add ReductionBlazeFunc and instances using it',
  u'2014-03-20T23:50:13Z'),
 (u'Assignments of operations in ranges does not work',
  u'2014-03-21T10:40:34Z'),
 (u'A propsoal for handling SQL queries', u'2014-03-21T04:54:48Z'),
 (u'[WIP] A first proposal for a Table object', u'2014-03-22T08:48:24Z'),
 (u'[WIP] A first proposal for a Table object', u'2014-03-21T16:27:37Z'),
 (u'Finish reduction support', u'2014-03-25T21:15:57Z'),
 (u'Adding support for the HDF5 format in Storage class',
  u'2014-03-26T16:04:22Z'),
 (u'[WIP] datetime design doc', u'2014-03-26T08:13:01Z'),
 (u'A design document to convert the DataDescriptor class as first-class citizen',
  u'2014-03-27T16:25:28Z'),
 (u'Link datashape doc to datashape repo', u'2014-03-28T17:38:40Z'),
 (u'[WIP] High level parallel expression graph', u'2014-03-28T14:36:27Z'),
 (u'[WIP] datetime implementation', u'2014-03-28T22:48:39Z'),
 (u'[WIP] A blaze.where() function for filters for HDF5 and BLZ',
  u'2014-04-03T15:10:20Z'),
 (u'Array.__iter__ yields either scalars or arrays', u'2014-04-07T20:47:13Z'),
 (u'Efficient bulk append for DataDescriptors', u'2014-04-07T21:15:42Z'),
 (u'CSV_DDesc tweaks', u'2014-04-07T22:53:06Z'),
 (u'iterchunks(blen=None) never set to a default', u'2014-04-07T23:31:29Z'),
 (u'CSV_DDesc does not respect its own dialect', u'2014-04-07T23:35:51Z'),
 (u'Need datasets for comprehensive test suite', u'2014-04-08T14:06:51Z'),
 (u'JSON data descriptor reads everything into memory',
  u'2014-04-08T15:55:29Z'),
 (u'Structured array printing is verbose', u'2014-04-08T15:59:59Z'),
 (u'Python_DataDescriptor', u'2014-04-08T18:29:47Z'),
 (u'Replace use of `ddesc_as_py` for testing with `list`',
  u'2014-04-08T18:38:06Z'),
 (u'Getting element from array yield element not array',
  u'2014-04-08T18:50:57Z'),
 (u'Add Array methods to match numpy interface', u'2014-04-08T19:00:20Z'),
 (u'Blaze.JSON_DDesc not compatible with Pandas.DataFrame.to_json',
  u'2014-04-08T22:45:17Z'),
 (u'[WIP] - Design - Bulk transfer between Data Descriptors',
  u'2014-04-08T22:35:52Z'),
 (u'[WIP] Reduction tweaks', u'2014-04-08T19:54:50Z'),
 (u'Add validate to public blaze API', u'2014-04-09T16:24:07Z'),
 (u'[WIP] - Playing with data descriptors', u'2014-04-09T15:32:24Z'),
 (u'Replace Capability class with dictionary', u'2014-04-09T17:54:32Z'),
 (u'Intelligent caching', u'2014-04-09T21:55:58Z'),
 (u'Changes to DyND interrupt development workflow', u'2014-04-09T22:03:46Z'),
 (u'File system meta DataDescriptor', u'2014-04-10T14:24:37Z'),
 (u'[WIP] Rolling reduce design doc', u'2014-04-10T21:54:50Z'),
 (u'Shorten data descriptor file names', u'2014-04-11T14:42:12Z'),
 (u'[WIP] Adding support for the netCDF3/netCDF4 format',
  u'2014-04-11T12:21:48Z'),
 (u'Depend on SQLAlchemy for SQL code generation', u'2014-04-11T15:01:13Z'),
 (u'Validate and into', u'2014-04-16T22:50:36Z'),
 (u'New Data layer', u'2014-04-16T01:45:28Z'),
 (u'Add an optional, no dependencies configuration to travis',
  u'2014-04-15T10:01:36Z'),
 (u'Dispatched validate and coerce operations', u'2014-04-15T01:07:33Z'),
 (u'Table', u'2014-04-25T01:34:21Z'),
 (u'[WIP] allow JSON data descriptor to iterate over series of JSON files',
  u'2014-04-21T15:01:36Z'),
 (u'blaze/data/{dynd,json}.py hide modules when importing in blaze/data/',
  u'2014-05-01T22:44:08Z'),
 (u'Encode dates/datetimes in JSON data descriptor', u'2014-05-01T21:31:16Z'),
 (u'Table Reductions', u'2014-05-07T17:18:49Z'),
 (u'[WIP] Initial version of HDFS support via context manager',
  u'2014-05-05T01:57:46Z'),
 (u'Python join', u'2014-05-15T20:18:18Z'),
 (u'Various fixes to Table expressions', u'2014-05-15T17:10:42Z'),
 (u'[WIP] Compute layer operations on pyspark RDDs', u'2014-05-15T15:56:26Z'),
 (u'Depend on PyToolz', u'2014-05-14T20:29:57Z'),
 (u'Support Datetime in HDF5', u'2014-05-12T20:27:30Z'),
 (u'Support variable length strings in HDF5', u'2014-05-12T20:23:05Z'),
 (u'Datashape discovery', u'2014-05-08T19:57:26Z'),
 (u'Add simple static check on expr', u'2014-05-20T16:16:12Z'),
 (u'Various fixes, often in SQL', u'2014-05-19T23:07:28Z'),
 (u'Dangling file descriptors in `blaze.data.{csv,json}`',
  u'2014-05-20T20:36:15Z'),
 (u'Apply and Map generic functions onto TableExprs', u'2014-05-22T02:15:00Z'),
 (u'Scalar Expressions', u'2014-05-22T01:19:16Z'),
 (u'[WIP] Pyspark', u'2014-05-21T22:54:20Z'),
 (u'Add nunique operation', u'2014-05-21T18:30:40Z'),
 (u'Implicit Joins', u'2014-05-22T21:01:04Z'),
 (u'Booleans', u'2014-05-22T18:44:38Z'),
 (u'Use Blaze to benchmark various backends', u'2014-05-22T16:47:13Z'),
 (u'DyND OOC Backend', u'2014-05-22T16:41:31Z'),
 (u'Trivial demonstration development environment ', u'2014-05-22T16:32:04Z'),
 (u'Add timezone support to the datetime type', u'2014-05-23T23:45:06Z'),
 (u'DyND compute frontend', u'2014-05-23T23:43:30Z'),
 (u'Missing data support in DyND', u'2014-05-23T21:57:10Z'),
 (u'Jaccard similarity demo', u'2014-05-23T18:28:46Z'),
 (u'Label', u'2014-05-23T18:15:48Z'),
 (u'Serialization issues with `compute`', u'2014-05-23T14:43:47Z'),
 (u'Merge Reorg', u'2014-05-26T17:57:37Z'),
 (u'Arbitrary functions', u'2014-05-26T17:45:35Z'),
 (u'Blaze Table Object', u'2014-05-26T21:31:19Z'),
 (u'Add new quickstart ', u'2014-05-26T21:32:29Z'),
 (u'Development blaze on Binstar', u'2014-05-26T22:09:34Z'),
 (u'Update Catalog Server', u'2014-05-26T22:25:14Z'),
 (u'[WIP] Distinct', u'2014-05-27T15:23:17Z'),
 (u'Create `Distinct` term', u'2014-05-27T14:33:45Z'),
 (u'Clean up import *', u'2014-05-28T15:53:01Z'),
 (u'Jaccard2', u'2014-05-28T19:45:11Z'),
 (u'Datashape Discovery', u'2014-05-27T01:01:19Z'),
 (u'Impala Backend', u'2014-05-30T16:28:12Z'),
 (u'Spark stand-alone mode', u'2014-05-29T15:41:54Z'),
 (u'Add compute(Expr, DataDescriptor) implementation',
  u'2014-05-30T17:15:17Z'),
 (u'merge twitter dataset1 with WDC data', u'2014-05-30T21:08:35Z'),
 (u'Python multicolumn groupby', u'2014-05-30T22:41:24Z'),
 (u'Spark compute', u'2014-06-06T15:53:38Z'),
 (u'Spark', u'2014-06-06T14:54:43Z'),
 (u'Scalar Expressions', u'2014-06-06T19:53:30Z'),
 (u'Test unicode string support in `blaze.data`', u'2014-06-09T21:45:33Z'),
 (u'Delete old Vagrant code, favor conda', u'2014-06-09T22:02:42Z'),
 (u'Put `spark` on binstar', u'2014-06-09T23:00:52Z'),
 (u'Stress test datashape discovery', u'2014-06-09T22:37:23Z'),
 (u'Tune Python Streaming backend', u'2014-06-09T22:36:35Z'),
 (u'Fill out Spark implementation', u'2014-06-08T17:28:28Z'),
 (u'Jaccard fix', u'2014-06-11T00:33:46Z'),
 (u'Vagrant del', u'2014-06-10T22:31:44Z'),
 (u'Update documentation for reorg branch', u'2014-06-11T17:42:01Z'),
 (u"Multi-input compute doesn't play well with consumable data sources ",
  u'2014-06-11T17:55:38Z'),
 (u'SciPy 2014 Paper', u'2014-06-11T20:36:46Z'),
 (u'[WIP] Reorg Docs', u'2014-06-11T21:23:34Z'),
 (u'Blaze server', u'2014-06-16T23:01:29Z'),
 (u'Add `into` operation to api', u'2014-06-16T17:33:45Z'),
 (u'Interactive Table object', u'2014-06-18T19:35:02Z'),
 (u'data: CSV supports sep as alias for delimiter', u'2014-06-18T22:15:10Z'),
 (u'projection of filter TableExpr fails on Spark RDDs',
  u'2014-06-19T02:18:11Z'),
 (u'Delete old stuff', u'2014-06-20T14:05:45Z'),
 (u'Structured description of data descriptors', u'2014-06-20T21:44:42Z'),
 (u'Various Small fixes', u'2014-06-20T19:24:32Z'),
 (u'Fixup quickstart', u'2014-06-23T14:55:26Z'),
 (u'Merge reorg', u'2014-06-23T14:46:09Z'),
 (u'Imports', u'2014-06-23T19:06:52Z'),
 (u'Efficient CSV -> SQL migration', u'2014-06-24T21:40:50Z'),
 (u'Multi column join', u'2014-06-24T15:53:10Z'),
 (u"SQL extend doesn't preserve schema", u'2014-06-26T15:01:46Z'),
 (u'Better csv unicode support with `unicodecsv`', u'2014-06-26T17:07:24Z'),
 (u'Small fixes', u'2014-06-26T18:37:07Z'),
 (u'[WIP] - HDF5 variable length strings', u'2014-06-26T18:13:36Z'),
 (u'Scalar coercion - Server selection', u'2014-06-26T00:37:43Z'),
 (u'compute on HDF5 with PyTables', u'2014-06-28T17:45:46Z'),
 (u'Sample operation', u'2014-06-30T21:25:16Z'),
 (u'unicodecsv is slow', u'2014-07-01T15:27:26Z'),
 (u'Coerce works on Spark RDDs', u'2014-07-03T01:05:34Z'),
 (u'Extend Projection operation to data descriptors', u'2014-07-03T15:32:52Z'),
 (u'expr: Join automatically selects all shared columns',
  u'2014-07-03T18:48:18Z'),
 (u'Skip gzip csv tests on windows py2.x', u'2014-07-03T18:13:34Z'),
 (u'Expression Optimization', u'2014-07-03T22:06:57Z'),
 (u'INTO feature for CSV to DB', u'2014-07-09T16:41:55Z'),
 (u'Fix repr when Table is backed by mutable data', u'2014-07-11T18:49:15Z'),
 (u"setup.py doesn't include unicodecsv", u'2014-07-12T15:01:55Z'),
 (u'Added unicde to requirements and docs closes #378',
  u'2014-07-12T15:27:28Z'),
 (u'Various fixes', u'2014-07-07T14:21:23Z'),
 (u'Integer column names not working', u'2014-07-12T17:39:06Z'),
 (u"Conda install doesn't install toolz dependency", u'2014-07-12T23:29:55Z'),
 (u"spark tests aren't skipped when spark isn't installed",
  u'2014-07-13T01:49:52Z'),
 (u"Don't run spark tests if pyspark isn't available",
  u'2014-07-13T11:43:39Z'),
 (u'Assist Spark users in parsing CSV files', u'2014-07-02T21:23:25Z'),
 (u'Refactor recursion out of compute ', u'2014-07-02T14:41:59Z'),
 (u'Multiprocessing meta-backend', u'2014-07-01T16:46:31Z'),
 (u'Data descriptor constructors should specify missing values',
  u'2014-07-14T13:43:07Z'),
 (u'CSV header handling', u'2014-07-14T13:47:13Z'),
 (u'`data.py[...]` should avoid returning an iterator when data is small',
  u'2014-07-14T13:48:45Z'),
 (u'Improve error message for DataDescriptor.__len__',
  u'2014-07-14T13:50:05Z'),
 (u'Put docs on readthedocs', u'2014-07-14T13:50:53Z'),
 (u'Handle missing data in SQL data descriptor', u'2014-07-14T13:55:50Z'),
 (u'`rpy2` integration', u'2014-07-14T14:15:42Z'),
 (u'server expression security improvements', u'2014-07-14T19:00:13Z'),
 (u'support for Map of Columnwise', u'2014-07-14T20:47:52Z'),
 (u'Travis conda', u'2014-07-14T15:54:27Z'),
 (u'Reduction dshape and csv missing values', u'2014-07-16T15:45:56Z'),
 (u'Outer join', u'2014-07-16T21:28:54Z'),
 (u'Access columns as attributes, rather than with strings',
  u'2014-07-17T00:41:16Z'),
 (u'Add `into` implementations for TableExprs', u'2014-07-17T01:21:50Z'),
 (u'Add to `into`', u'2014-07-17T01:58:00Z'),
 (u'Implement __getattr__', u'2014-07-17T03:39:37Z'),
 (u'Consider using setuptools to install instead of distutils',
  u'2014-07-17T13:53:32Z'),
 (u'How to handle missing values in HDF5?', u'2014-07-17T15:56:05Z'),
 (u'Dependency list is incomplete and contradictory', u'2014-07-17T18:33:38Z'),
 (u'CSV keyword arguments documentation', u'2014-07-17T19:11:41Z'),
 (u'compute: projection of data descriptor uses `.py`',
  u'2014-07-17T19:43:51Z'),
 (u'CSV: errors and encoding arguments', u'2014-07-17T20:58:54Z'),
 (u'SQL databases match nullability to datashape.Option',
  u'2014-07-17T22:21:04Z'),
 (u'Broken links in \'blaze.pydata.org"', u'2014-07-17T22:29:34Z'),
 (u'Fixed two links and added google analytics tracking',
  u'2014-07-17T23:20:12Z'),
 (u'Implement a scalar expression parser', u'2014-07-17T23:03:24Z'),
 (u'Add to the CSV docstring', u'2014-07-17T22:54:45Z'),
 (u'Cleanup Scalar a bit', u'2014-07-18T12:48:30Z'),
 (u'By of merged columns has stopped working.', u'2014-07-18T18:27:18Z'),
 (u'Import of blaze.expr.scalar.* breaks merge', u'2014-07-18T18:31:31Z'),
 (u'Dev install instructions', u'2014-07-19T16:20:47Z'),
 (u'10 minutes to Blaze', u'2014-07-19T22:41:00Z'),
 (u'python: by maps call to compute onto child', u'2014-07-20T16:22:49Z'),
 (u'Add funders to webpage', u'2014-07-21T22:02:30Z'),
 (u'Selection for Date columns in SQL backend produces odd expression',
  u'2014-07-22T17:58:56Z'),
 (u'Support some NoSQL Database', u'2014-07-23T00:45:17Z'),
 (u'flatMapValue PySpark equivalent in Blaze', u'2014-07-24T15:35:39Z'),
 (u'Individual columns should be able to repr if not passed in CSV',
  u'2014-07-27T17:43:40Z'),
 (u'Raise when Table has a different schema than the underlying data',
  u'2014-07-27T18:33:43Z'),
 (u'Add google analytics to docs', u'2014-07-22T13:25:47Z'),
 (u'Fix double return', u'2014-07-28T18:40:40Z'),
 (u'Relax constraint that `By` must use reductions', u'2014-07-29T22:20:14Z'),
 (u'BColz', u'2014-07-31T03:17:53Z'),
 (u"`count` operation doesn't consider missing values",
  u'2014-07-31T17:13:00Z'),
 (u"expr: selection doesn't fail on non-rowwise child",
  u'2014-08-05T14:34:38Z'),
 (u'Visualize the capabilities of each backend', u'2014-08-05T20:28:24Z'),
 (u'bcolz, blz, and chunks', u'2014-08-05T19:56:38Z'),
 (u'Doc refresh', u'2014-08-05T22:36:08Z'),
 (u'[WIP] Bcolz copy', u'2014-08-05T16:13:18Z'),
 (u'MongoDB Backend', u'2014-08-01T22:32:16Z'),
 (u'PyTables computational backend', u'2014-07-30T19:35:04Z'),
 (u'Build Blaze on jenkins, upload to binstar blaze-dev account',
  u'2014-08-07T16:16:02Z'),
 (u'Make blaze.test() return True or False', u'2014-08-08T20:33:29Z'),
 (u'documentation link in the README is broken', u'2014-08-09T00:02:22Z'),
 (u'Consistent column naming scheme', u'2014-08-12T15:18:24Z'),
 (u'Spark by should use reduceby or foldby', u'2014-08-12T18:44:53Z'),
 (u'Comprehensive test suite for `into`', u'2014-08-12T18:58:19Z'),
 (u'Parallel chunking or streaming backend', u'2014-08-12T19:04:16Z'),
 (u'Into test', u'2014-08-12T21:12:49Z'),
 (u'pandas: enforce expression column names on `by`', u'2014-08-12T15:32:07Z'),
 (u'Add BColz and chunking backend', u'2014-08-12T02:52:50Z'),
 (u'Graceful handling of empty results', u'2014-08-13T15:40:37Z'),
 (u'into: test foo <- CSV', u'2014-08-13T15:18:20Z'),
 (u'SQL Table Overwrite', u'2014-08-14T04:01:10Z'),
 (u"'by' of pandas DataFrame doesn't work as expected",
  u'2014-08-14T13:46:25Z'),
 (u'add into(DataFrame, pytables Table)', u'2014-08-11T21:21:07Z'),
 (u'[WIP] Feature/csv to sql natively', u'2014-08-10T02:36:58Z'),
 (u'Maintain length in table expressions', u'2014-08-15T01:44:20Z'),
 (u'`from blaze import *` results in override of built-ins ',
  u'2014-08-15T14:05:14Z'),
 (u'dispatch on mathematical functions', u'2014-08-15T15:17:46Z'),
 (u'Open world assumption and 3VL in Blaze', u'2014-08-15T17:14:21Z'),
 (u'Compute on scalar expressions', u'2014-08-15T19:39:11Z'),
 (u'Overload `__len__` to work on Table Expressions and on Table interactive objects',
  u'2014-08-15T18:35:50Z'),
 (u'Which packages should be required for blaze, which should be optional?',
  u'2014-08-15T22:27:59Z'),
 (u'Look towards dplyr for ideas to expand expression input',
  u'2014-08-16T13:43:04Z'),
 (u'ETL on bad CSV data - what should we do?', u'2014-08-18T16:05:15Z'),
 (u'[WIP] - Summary', u'2014-08-18T17:30:20Z'),
 (u'Compute on scalar expressions', u'2014-08-18T17:23:05Z'),
 (u'comprehensive compute tests', u'2014-08-16T17:57:20Z'),
 (u"[WIP] csv_into (don't merge)", u'2014-08-18T20:25:04Z'),
 (u'Release Blogpost', u'2014-08-19T15:29:15Z'),
 (u'General function expressions', u'2014-08-19T18:51:04Z'),
 (u"Don't use eval when evaluating RealMath subclasses",
  u'2014-08-19T20:11:47Z'),
 (u'drop and create_index dispatched functions', u'2014-08-20T14:56:29Z'),
 (u'Does not list pymongo as a requirement', u'2014-08-20T16:37:23Z'),
 (u'Small fixes 3', u'2014-08-20T18:32:03Z'),
 (u'WIP: compute on HDF5 with PyTables', u'2014-08-19T21:48:48Z'),
 (u'Add persistent storage systems to into comprehensive test',
  u'2014-08-20T18:37:17Z'),
 (u'SQLAlchemy string types - encoding and fixed lengths',
  u'2014-08-20T21:27:43Z'),
 (u'[WIP] - `dplyr` interface`', u'2014-08-19T15:45:09Z'),
 (u'Bug: columns attribute of TableSymbol is None when creating a schema with discover(tables.Table)',
  u'2014-08-21T14:34:33Z'),
 (u'WIP: Add create_index / drop_index functionality',
  u'2014-08-21T20:32:18Z'),
 (u'into implementation for SQL using CSV loading', u'2014-08-22T17:15:25Z'),
 (u'WIP: RethinkDB for blaze', u'2014-08-23T04:47:39Z'),
 (u'Added example rpy2 conversion', u'2014-08-23T01:32:23Z'),
 (u'update readme and docs with api changes', u'2014-08-22T18:16:26Z'),
 (u'Refactor Chunks', u'2014-08-22T17:27:13Z'),
 (u'Continue to test and improve `into`', u'2014-08-22T16:21:23Z'),
 (u'[WIP] into(pytables Table, csv) with option to ignore errors in CSV files',
  u'2014-08-22T04:35:34Z'),
 (u'GZipped CSV <- SQL with new migration system', u'2014-08-23T20:49:44Z'),
 (u'Remove old core directory', u'2014-08-23T16:34:17Z'),
 (u'Lightweight descriptor for various file formats like Excel, SPSS',
  u'2014-08-24T00:25:41Z'),
 (u'Moar CSV fixes!', u'2014-09-19T23:07:44Z'),
 (u'Parse datetimes in CSV.reader', u'2014-09-17T23:26:43Z'),
 (u'Move datetime logic from into to csv.reader', u'2014-09-17T18:40:53Z'),
 (u'[WIP] Test table coverage', u'2014-09-17T13:43:57Z'),
 (u'Chunked into', u'2014-09-17T12:43:31Z'),
 (u'API for moving between type systems', u'2014-09-17T00:58:07Z'),
 (u'A Blaze equivalent for: SELECT * WHERE t.column IN list_values',
  u'2014-09-16T20:46:45Z'),
 (u"Error in into(DataFrame,  '/*.%s.gz' % dataset) that used to work",
  u'2014-09-16T06:51:06Z'),
 (u'How should we handle pulling strings out of HDF5?',
  u'2014-09-13T21:40:59Z'),
 (u'Ideas to clean codebase', u'2014-09-12T17:27:04Z'),
 (u'Adding examples, datasets for examples, and .coveragerc ignores.',
  u'2014-09-11T18:57:29Z'),
 (u'SparkSQL HiveQL', u'2014-09-10T23:18:29Z'),
 (u'SparkSQL map', u'2014-09-10T23:17:20Z'),
 (u'Google BigQuery', u'2014-09-09T19:23:15Z'),
 (u'Google Spreadsheet Table', u'2014-09-09T18:29:54Z'),
 (u'[WIP] Handle sqlite INTO call on Windows', u'2014-09-08T21:59:39Z'),
 (u'Consider using this tox setup for testing HDFS related work',
  u'2014-09-08T17:03:29Z'),
 (u'NetCDF4 Backend', u'2014-09-08T15:44:04Z'),
 (u"Investigate use of SQLAlchemy's ORM system for sql generation",
  u'2014-09-07T19:42:53Z'),
 (u'[RFC] - Arrays', u'2014-09-07T01:23:53Z'),
 (u'xfail on sqlite3 command not available on windows',
  u'2014-09-07T00:00:21Z'),
 (u'Prevent coveralls from commenting', u'2014-09-06T20:24:14Z'),
 (u'CSV Headers with Spark', u'2014-09-04T20:36:02Z'),
 (u'#362 Sample operation initial work', u'2014-09-04T19:07:41Z'),
 (u'General Performance Guideline: Backend comparison',
  u'2014-09-04T18:45:26Z'),
 (u'Increase testing coverage', u'2014-09-04T17:20:26Z'),
 (u'into(Spark/HDFS, SQL DBs)', u'2014-09-03T23:06:30Z'),
 (u'Creating a `test_compute_exhaustive.py`', u'2014-09-03T18:57:15Z'),
 (u'Add developer docs on how to build a new Expression type',
  u'2014-09-03T17:41:11Z'),
 (u'Add more usage examples to the docs', u'2014-09-02T22:19:54Z'),
 (u'String matching operation', u'2014-09-02T21:34:51Z'),
 (u"str(Table.count()) doesn't show count", u'2014-09-02T21:03:51Z'),
 (u"SQL <- CSV doesn't work properly with sqlite", u'2014-09-02T16:39:50Z'),
 (u'rollapply/rolling/window operation', u'2014-09-02T15:07:15Z'),
 (u'Create frontend to match LINQ syntax', u'2014-09-01T19:49:36Z'),
 (u'Support frequent releases', u'2014-09-01T16:26:29Z'),
 (u'Display expr information', u'2014-08-30T19:04:29Z'),
 (u'Compute pool with timeouts for Server', u'2014-08-30T17:44:40Z'),
 (u'PyCon Submission', u'2014-08-28T20:57:25Z'),
 (u'Datetime support (and more robust support in general) in PyTables',
  u'2014-08-28T14:20:27Z'),
 (u'Use COPY function from Psycopg', u'2014-08-28T03:01:19Z'),
 (u'Submit paper for PyHPC 2014', u'2014-08-26T16:40:43Z'),
 (u'Discussion of how to support a large number of backends',
  u'2014-08-26T15:04:21Z'),
 (u'Improve internal documentation/scripts to update documentation',
  u'2014-08-25T16:38:13Z'),
 (u'Change scalar_symbol into expr', u'2014-09-26T03:58:53Z'),
 (u'[WIP] SciDB backend', u'2014-09-26T01:37:30Z'),
 (u'Pytables column head', u'2014-09-26T01:32:40Z'),
 (u'string operations', u'2014-09-25T23:49:50Z'),
 (u'WIP: Implement ColumnWise for MongoDB', u'2014-09-25T23:11:35Z'),
 (u'HBase', u'2014-09-25T22:32:52Z'),
 (u'Update server design doc', u'2014-09-25T21:25:46Z'),
 (u'Allow ignoring particular exceptions when using glob resources',
  u'2014-09-25T20:15:22Z'),
 (u"Blaze channel blaze install doesn't include dependencies",
  u'2014-09-24T13:47:07Z'),
 (u'Rename Like, Regex to TextLike, TextRegex', u'2014-09-24T12:13:43Z'),
 (u'Insert projections opportunistically into expressions',
  u'2014-09-24T01:29:40Z'),
 (u'WIP: Fix mysql into', u'2014-09-24T01:14:05Z'),
 (u'Attribute expressions', u'2014-09-22T17:21:13Z'),
 (u'Support datetime attributes ', u'2014-09-22T13:07:39Z'),
 (u'Support hive, presto through pyhive project', u'2014-09-21T19:42:37Z'),
 (u'Rename `*_index` with `index_*`', u'2014-09-21T19:32:28Z'),
 (u'API: somethoughts / ideas', u'2014-10-01T19:33:49Z'),
 (u'Break isnull type operations out into a separate expression',
  u'2014-09-30T22:01:53Z'),
 (u'API: PyTables/Pandas/HDF5', u'2014-09-30T17:01:41Z'),
 (u'Required kwargs for certain dispatched functions.',
  u'2014-09-30T16:36:31Z'),
 (u'BUG,DOC: "Examples" link 404', u'2014-09-30T15:02:59Z'),
 (u'Misleading error message when building table', u'2014-09-29T21:20:04Z'),
 (u'[WIP] Test table api', u'2014-10-02T14:21:57Z'),
 (u'into SQL <- CSV sends header as data', u'2014-09-29T13:47:30Z'),
 (u'[WIP] Refactor Expr', u'2014-09-27T02:26:13Z'),
 (u'Accept dot-delimited schemaname.tablename ', u'2014-09-27T02:22:32Z'),
 (u'API: print/show_backends', u'2014-09-26T15:31:01Z'),
 (u'Use dir and getattr to dispatch methods based on datashape',
  u'2014-09-26T12:18:41Z'),
 (u'SQL <- CSV loading errors pop up inappropriately',
  u'2014-10-03T20:15:41Z'),
 (u'Docs: Include MongoDB examples/docstrings in the website',
  u'2014-10-03T20:30:11Z'),
 (u'How should we handle user facing warnings?', u'2014-10-03T22:31:52Z'),
 (u'Problem converting expression Column to nd.array',
  u'2014-10-04T19:12:52Z'),
 (u'Datetime access in SQL databases', u'2014-10-06T01:52:39Z'),
 (u'More datetime access expressions', u'2014-10-06T01:53:17Z'),
 (u'Nested behavior in MongoDB', u'2014-10-06T01:56:14Z'),
 (u'Nested behavior in Python, Spark', u'2014-10-06T01:57:01Z'),
 (u'Resource for MongoDB connection string', u'2014-10-06T16:32:57Z'),
 (u'Handle Gzip complexity in csvopen', u'2014-10-07T15:30:31Z'),
 (u'Various fixes 4', u'2014-10-06T15:47:15Z'),
 (u'CI: use appveyor to build for windows', u'2014-10-07T16:32:49Z')]

Replace list with DataFrame, np.ndarray, or a filename in your favorite format to store the results in a different system.
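As a rough illustration of that idea (the first argument picks the container the results end up in), here is a standard-library-only sketch. The function `into_sketch` is hypothetical and is not how Blaze implements `into`; the two rows are copied from the end of the output above, and the `.csv` filename convention is an assumption made for the example.

```python
import csv
import os
import tempfile

# Two rows copied from the end of the output above.
rows = [
    (u'Various fixes 4', u'2014-10-06T15:47:15Z'),
    (u'CI: use appveyor to build for windows', u'2014-10-07T16:32:49Z'),
]

def into_sketch(target, data):
    """Hypothetical stand-in for the idea behind `into`:
    the target argument decides where the data ends up."""
    if target is list:
        # In-memory Python list
        return list(data)
    if isinstance(target, str) and target.endswith('.csv'):
        # A filename: write the rows out as CSV and return the path
        with open(target, 'w', newline='') as f:
            csv.writer(f).writerows(data)
        return target
    raise NotImplementedError('no conversion for %r' % (target,))

as_list = into_sketch(list, iter(rows))
csv_path = into_sketch(os.path.join(tempfile.mkdtemp(), 'issues.csv'), rows)
```

Blaze itself covers many more targets (DataFrame, np.ndarray, databases); the sketch only shows the calling pattern.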

Inspect Generated Mongo Queries

Mongo uses a JSON query language. Let's inspect these queries rather than executing them.

This uses Blaze's internal API.


In [8]:
from blaze import compute, TableExpr, dispatch
from blaze.compute.mongo import MongoQuery

@dispatch(TableExpr, MongoQuery, dict)
def post_compute(expr, q, d):
    # post_compute normally sends the finished query to the Mongo server.
    # This override just returns the query so that we can inspect it.
    return q.query

In [9]:
compute(users[users.followers > 100][['login', 'followers', 'following', 'blog']].head(10))


Out[9]:
({'$match': {'followers': {'$gt': 100}}},
 {'$project': {'blog': 1, 'followers': 1, 'following': 1, 'login': 1}},
 {'$limit': 10})
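
To read the pipeline in Out[9] stage by stage, the following pure-Python sketch applies the same three stages to a list of dicts. It is an illustration only: `run_pipeline` and `users_sample` are hypothetical, the sample records are invented, and only the `$gt` form of `$match` is handled. A real Mongo `$project` also keeps `_id` by default, which this sketch ignores.

```python
def run_pipeline(docs, pipeline):
    """Apply a tiny subset of Mongo aggregation stages to a list of dicts."""
    for stage in pipeline:
        (op, arg), = stage.items()  # each stage is a one-key dict
        if op == '$match':
            # Keep documents where every field exceeds its '$gt' bound
            docs = [d for d in docs
                    if all(d[field] > cond['$gt']
                           for field, cond in arg.items())]
        elif op == '$project':
            # Keep only the listed fields
            docs = [{k: d[k] for k in arg if k in d} for d in docs]
        elif op == '$limit':
            # Keep at most the first `arg` documents
            docs = docs[:arg]
        else:
            raise NotImplementedError(op)
    return docs

# Invented sample records (not real GitHub users)
users_sample = [
    {'login': 'alice', 'followers': 250, 'following': 3, 'blog': None, 'location': 'Berlin'},
    {'login': 'bob', 'followers': 12, 'following': 40, 'blog': None, 'location': None},
    {'login': 'carol', 'followers': 101, 'following': 7, 'blog': 'http://example.com', 'location': 'Tokyo'},
]

# The pipeline exactly as shown in Out[9]
pipeline = ({'$match': {'followers': {'$gt': 100}}},
            {'$project': {'blog': 1, 'followers': 1, 'following': 1, 'login': 1}},
            {'$limit': 10})

result = run_pipeline(users_sample, pipeline)
```

Each Blaze operation maps to one stage: the selection `users.followers > 100` becomes `$match`, the column list becomes `$project`, and `head(10)` becomes `$limit`.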

In [10]:
compute(users.location.count_values().head(10))


Out[10]:
({'$project': {'location': 1}},
 {'$group': {'_id': {'location': '$location'}, 'count': {'$sum': 1}}},
 {'$project': {'count': '$count', 'location': '$_id.location'}},
 {'$sort': {'count': -1}},
 {'$limit': 10})
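
The `count_values` pipeline in Out[10] groups by location, counts each group, sorts by count descending, and keeps the top rows. A minimal pure-Python sketch of the same steps follows. The `locations` list is invented sample data, `count_values_sketch` is a hypothetical helper, and `None` stands in for users with no location.

```python
from collections import Counter

# Invented sample of the 'location' column
locations = ['Berlin', None, 'Tokyo', 'Berlin', None,
             None, 'Tokyo', 'Berlin', 'Berlin']

def count_values_sketch(values, n=10):
    # $group: one bucket per distinct value, 'count' is {'$sum': 1}
    counts = Counter(values)
    # second $project: flatten each bucket into {'location', 'count'}
    rows = [{'location': loc, 'count': c} for loc, c in counts.items()]
    # $sort: {'count': -1} means descending by count
    rows.sort(key=lambda r: r['count'], reverse=True)
    # $limit: keep the first n rows
    return rows[:n]

top = count_values_sketch(locations)
```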