Spark Programming Model
SparkContext is the entry point that gives access to Spark's methods. It is initialized with an instance of SparkConf, which holds the Spark cluster configuration.
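As a minimal sketch of that initialization, assuming PySpark is installed and that a local master (local[2]) stands in for a real cluster, a context could be created roughly as follows. The SETTINGS dictionary, the application name, and the make_context helper are hypothetical names chosen for illustration; they are not part of the original notebook.

```python
# Illustrative settings (assumptions, not taken from the original notebook).
SETTINGS = {
    "spark.app.name": "programming-model-demo",  # hypothetical application name
    "spark.master": "local[2]",                  # run locally with 2 worker threads
}


def make_context(settings=SETTINGS):
    """Build a SparkConf from key/value pairs and return a SparkContext."""
    # Imported here so this cell still loads where PySpark is not installed.
    from pyspark import SparkConf, SparkContext

    # setAll accepts a list of (key, value) pairs.
    conf = SparkConf().setAll(list(settings.items()))
    # getOrCreate reuses an already running context instead of creating a second one.
    return SparkContext.getOrCreate(conf)
```

In a notebook cell this would typically be used as sc = make_context(), with sc.stop() called once the work is finished.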
Resilient Distributed Datasets (RDD)
An RDD is a collection of objects of the same type that is:
Distributed across many nodes
Fault-tolerant
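The two properties above can be illustrated with a small sketch. It assumes an existing SparkContext is passed in as sc; the function names square and squares_via_rdd and the default sizes are hypothetical, picked only for this example.

```python
def square(x):
    """Plain Python function applied to every element of the RDD."""
    return x * x


def squares_via_rdd(sc, n=8, num_partitions=4):
    """Distribute range(n) over partitions, square each element, and collect."""
    # parallelize splits the local collection into num_partitions pieces,
    # which Spark can place on different nodes.
    rdd = sc.parallelize(range(n), num_partitions)
    # map is lazy: it only records the transformation in the RDD's lineage,
    # which Spark can replay to rebuild a lost partition.
    squared = rdd.map(square)
    # collect is an action: it runs the job and returns the results to the driver.
    return rdd.getNumPartitions(), squared.collect()
```

Calling squares_via_rdd(sc) with a running context returns the partition count together with the squared values, all of the same type (integers here).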