Spark Programming Model

  • SparkContext is the entry point to Spark functionality and gives access to the Spark API. It is initialized with an instance of SparkConf, which holds the Spark cluster configuration; a minimal initialization sketch follows below.

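A minimal sketch of creating a SparkContext from a SparkConf (the application name and the local master URL are illustrative placeholders, not taken from the notes):

    from pyspark import SparkConf, SparkContext

    # Describe the application and the cluster to connect to.
    # "local[*]" runs Spark locally using all available cores.
    conf = SparkConf().setAppName("example-app").setMaster("local[*]")

    # The SparkContext is the entry point to Spark's RDD API.
    sc = SparkContext(conf=conf)
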
Resilient Distributed Datasets (RDD)

An RDD is a collection of objects of the same type that is:

  • distributed across many nodes
  • fault-tolerant: lost partitions can be recomputed from the lineage of transformations that produced them

A sketch of creating and using an RDD follows below.

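A minimal sketch of creating and using an RDD with the SparkContext defined above (the numbers and partition count are illustrative assumptions):

    # Distribute a local Python collection across the cluster as an RDD
    # with 4 partitions; each partition may live on a different node.
    rdd = sc.parallelize(range(10), numSlices=4)

    # Transformations (map, filter, ...) are lazy; actions (collect,
    # count, ...) trigger the computation. The recorded lineage of
    # transformations is what lets Spark recompute lost partitions,
    # making the RDD fault-tolerant.
    squares = rdd.map(lambda x: x * x)
    print(squares.collect())   # [0, 1, 4, 9, ..., 81]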