In [1]:
:load KarpsDisplays KarpsDagDisplay

In [2]:
:extension DeriveGeneric
:extension FlexibleContexts
:extension OverloadedStrings
:extension GeneralizedNewtypeDeriving
:extension FlexibleInstances
:extension MultiParamTypeClasses

In [3]:
import Spark.Core.Dataset
import Spark.Core.Context
import Spark.Core.Column
import Spark.Core.ColumnFunctions
import Spark.Core.Functions
import Spark.Core.Row
import Spark.Core.Types
import Spark.Core.Try
import KarpsDisplays
import KarpsDagDisplay

import qualified Data.Vector as V
import qualified Data.Text as T
import GHC.Generics
import IHaskell.Display

In [4]:
import Spark.Core.StructuresInternal(ComputationID(..))

In [5]:
conf = defaultConf {
        confEndPoint = "http://10.0.2.2",
        confRequestedSessionName = "spark_intro7" }

createSparkSessionDef conf


[Debug] Creating spark session at url: http://10.0.2.2:8081/sessions/spark_intro7 @(<unknown>:<unknown> <unknown>:0:0)

In [6]:
ds = dataset ([1,2,3] :: [Int]) @@ "data"
-- Turns the dataset into a column and computes the sum of all the elements in this column
s1 = sumCol (asCol ds) @@ "sum"
-- Counts the elements in the dataset
s2 = count ds @@ "count"
x = (s1 + s2) @@ "result"
exec1Def x


[Debug] executeCommand1': computing observable /result@org.spark.LocalPlus!int @(<unknown>:<unknown> <unknown>:0:0)
[Info] Sending computations at url: http://10.0.2.2:8081/computations/spark_intro7/0/createwith nodes: [/data@org.spark.DistributedLiteral:int,/sum@SUM!int,/count@COUNT!int,/result@org.spark.LocalPlus!int] @(<unknown>:<unknown> <unknown>:0:0)
[Info] _computationMultiStatus: /sum running @(<unknown>:<unknown> <unknown>:0:0)
[Info] _computationMultiStatus: /count running @(<unknown>:<unknown> <unknown>:0:0)
[Info] _computationMultiStatus: /sum finished @(<unknown>:<unknown> <unknown>:0:0)
[Info] _computationMultiStatus: /count finished @(<unknown>:<unknown> <unknown>:0:0)
[Info] _computationMultiStatus: /result finished @(<unknown>:<unknown> <unknown>:0:0)
9
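
For reference, the observable computed above has a simple local equivalent: the sum of the elements plus their count. The sketch below uses only the Prelude (no Spark or Karps) and is an illustration of the arithmetic, not of the distributed execution:

```haskell
-- Local (non-distributed) equivalent of the graph above:
-- sumCol corresponds to sum, count corresponds to length,
-- and the "result" node adds the two observables.
localResult :: Int
localResult = sum xs + length xs
  where xs = [1, 2, 3] :: [Int]

main :: IO ()
main = print localResult  -- prints 9, matching the Spark result
```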

A shortcut method lets you display the graph of physical operations performed by Spark under the hood. To access the computations, though, you need to know the ID of the computation. It is printed when you execute a graph, in the log line [Info] Sending computations ... computations/&lt;session&gt;/&lt;computation ID&gt;/create. In the run above, the session is spark_intro7 and the computation ID is 0.


In [7]:
displayRDD "0"



In [ ]: