Displaying data with Python

Haskell is a great language for complex processing, but it lacks the visualization libraries that R and Python users have come to enjoy. This tutorial shows how to combine the two during interactive analysis: the data is computed and named on the Haskell side, then retrieved and displayed from a Python notebook.


In [1]:
:extension DeriveGeneric
:extension FlexibleContexts
:extension OverloadedStrings
:extension GeneralizedNewtypeDeriving
:extension FlexibleInstances
:extension MultiParamTypeClasses

In [2]:
import GHC.Generics (Generic)

import Spark.Core.Dataset
import Spark.Core.Context
import Spark.Core.Functions
import Spark.Core.Column
import Spark.Core.Types
import Spark.Core.Row
import Spark.Core.ColumnFunctions

conf = defaultConf {
        confEndPoint = "http://10.0.2.2",
        confRequestedSessionName = "session05_python" }
createSparkSessionDef conf


[Debug] Creating spark session at url: http://10.0.2.2:8081/sessions/session05_python @(<unknown>:<unknown> <unknown>:0:0)

In [3]:
import Spark.Core.Types

In [4]:
data MyData = MyData {
  aBigId :: Int,
  importantData :: Int } deriving (Show, Eq, Generic, Ord)

instance SQLTypeable MyData
instance FromSQL MyData
instance ToSQL MyData

In [5]:
let collection = [MyData 1 2, MyData 3 2, MyData 5 4]

let ds = dataset collection @@ "dataset"
let c = collect (asCol ds) @@ "collected_data"
_ <- exec1Def c


[Debug] executeCommand1: computing observable collected_data@org.spark.Collect![{aBigId:int importantData:int}] @(<unknown>:<unknown> <unknown>:0:0)
[Info] Sending computations at url: http://10.0.2.2:8081/computations/session05_python/0/create @(<unknown>:<unknown> <unknown>:0:0)
[Debug] executeCommand1: Tracked nodes are [(9c12a..,NPath(collected_data),[{aBigId:int importantData:int}],collected_data)] @(<unknown>:<unknown> <unknown>:0:0)
[Info] _computationMultiStatus: /collected_data finished @(<unknown>:<unknown> <unknown>:0:0)

In [2]:
from kraps import *
ks = connectSession("session05_python", address='localhost')
ks


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-7602fdc23d32> in <module>()
----> 1 from kraps import *
      2 ks = connectSession("session05_python", address='localhost')
      3 ks

ImportError: No module named kraps
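
The traceback above means the Python kernel could not find the kraps module on its import path, so the remaining cells cannot run until it is made importable. A minimal sketch of a guarded import, assuming nothing about how kraps is distributed (the flag name `kraps_available` is only illustrative):

```python
# Try to import the client module and record whether it is available,
# instead of letting the whole cell fail with a traceback.
try:
    import kraps  # noqa: F401  (the Python client used by this notebook)
    kraps_available = True
except ImportError:
    kraps_available = False

if not kraps_available:
    # Generic hint only: where the module lives depends on your setup.
    print("kraps is not importable; make sure its directory is on sys.path")
```

If the check fails, the cells below that use `ks` will fail too, because `ks` is never defined.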

In [ ]:
ks.pandas("collected_data")
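
Since the import failed in this run, the output of the cell above is not shown. As a rough sketch of what one might do next, the example below assumes that `ks.pandas("collected_data")` returns a pandas DataFrame with one column per field of `MyData`; that return shape is an assumption, not something this notebook confirms. The frame is built by hand from the three values collected on the Haskell side, and the helper `plot_collected` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd

# Hand-built stand-in for the frame that ks.pandas("collected_data") is
# ASSUMED to return: one row per MyData value, one column per record field.
df = pd.DataFrame(
    [(1, 2), (3, 2), (5, 4)],
    columns=["aBigId", "importantData"],
)

def plot_collected(frame):
    """Draw importantData as one bar per aBigId and return the Axes."""
    ax = frame.plot.bar(x="aBigId", y="importantData", legend=False)
    ax.set_xlabel("aBigId")
    ax.set_ylabel("importantData")
    return ax

ax = plot_collected(df)
```

In a live session the hand-built frame would be replaced by the result of `ks.pandas("collected_data")`, provided it has the same column names.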

In [ ]:
print(ks.url('collected_data'))
