In [1]:
:load KarpsDisplays KarpsDagDisplay
In [2]:
:extension DeriveGeneric
:extension FlexibleContexts
:extension OverloadedStrings
:extension GeneralizedNewtypeDeriving
:extension FlexibleInstances
:extension MultiParamTypeClasses
In [3]:
import Spark.Core.Dataset
import Spark.Core.Context
import Spark.Core.Column
import Spark.Core.ColumnFunctions
import Spark.Core.Functions
import Spark.Core.Row
import Spark.Core.Types
import Spark.Core.Try
import KarpsDisplays
import KarpsDagDisplay
import qualified Data.Vector as V
import qualified Data.Text as T
import GHC.Generics
import IHaskell.Display
In [4]:
import Spark.Core.StructuresInternal(ComputationID(..))
In [5]:
conf = defaultConf {
        confEndPoint = "http://10.0.2.2",
        confRequestedSessionName = "spark_intro7" }
createSparkSessionDef conf
In [6]:
ds = dataset ([1,2,3] :: [Int]) @@ "data"
-- Turns the dataset into a column and computes the sum of all the elements in this column
s1 = sumCol (asCol ds) @@ "sum"
-- Counts the elements in the dataset
s2 = count ds @@ "count"
x = (s1 + s2) @@ "result"
exec1Def x
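As a sanity check that does not involve Spark, the same arithmetic can be reproduced on a plain Haskell list. This is only a local sketch: the names localData, localSum, localCount and localResult are introduced here for illustration and are not part of the Karps API.

```haskell
-- Local, Spark-free analogue of the graph above (illustrative names only).
localData :: [Int]
localData = [1, 2, 3]

-- Mirrors `sumCol (asCol ds)`: the sum of all the elements.
localSum :: Int
localSum = sum localData

-- Mirrors `count ds`: the number of elements.
localCount :: Int
localCount = length localData

-- Mirrors `s1 + s2`: 6 + 3 = 9.
localResult :: Int
localResult = localSum + localCount
```

If the Spark computation behaves as its comments describe, the "result" node should evaluate to the same value as localResult, namely 9.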
A shortcut method lets you display the graph of physical operations that Spark performs under the hood.
To access a computation, though, you need to know its ID. The ID is printed when you execute a graph,
in the log line: [Info] Sending computations ... computations/cache_sum1/XXXXX/create
In [7]:
displayRDD "0"