Example of simple sparkhpc usage in a Jupyter notebook

Configure Python to use the Spark libraries with findspark


In [1]:
import findspark; findspark.init()
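
If SPARK_HOME is not set in your environment, findspark can be pointed at the Spark installation directly; the path below is a placeholder:

import findspark; findspark.init('/path/to/spark')  # replace with your Spark installation path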

Launch standalone Spark clusters using sparkhpc


In [2]:
import sparkhpc

In [3]:
sj = sparkhpc.sparkjob.LSFSparkJob(ncores=4)
sj.wait_to_start()


INFO:sparkhpc:Submitted cluster 0
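
The LSFSparkJob class targets the LSF scheduler used in this example; sparkhpc also supports other resource managers. On a SLURM cluster the analogous class should be SLURMSparkJob, though you should verify the class name against your installed version:

# Sketch for SLURM clusters -- check the class name in your sparkhpc version
sj_slurm = sparkhpc.sparkjob.SLURMSparkJob(ncores=4)
sj_slurm.wait_to_start()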

In [4]:
sj


Out[4]:
Job ID    Number of cores  Status   Spark UI                  Spark URL
33084182  4                running  http://spark.master:8080  spark://spark.master:7077

In [5]:
sj2 = sparkhpc.sparkjob.LSFSparkJob(ncores=10)
sj2.submit()


INFO:sparkhpc:Submitted cluster 1
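
Note that submit() returns as soon as the job is handed to the scheduler, so this second cluster may still be queued; to block until it is actually up, the same wait_to_start() call from above applies:

# Optional: block until cluster 1 has started
sj2.wait_to_start()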

In [6]:
sj.show_clusters()


ClusterID  Job ID    Number of cores  Status     Spark UI                  Spark URL
0          33084182  4                running    http://spark.master:8080  spark://spark.master:7077
1          33084664  10               submitted  None                      None

Create a SparkContext and start computing


In [7]:
from pyspark import SparkContext

In [8]:
sc = SparkContext(master=sj.master_url)

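The keyword-argument form above is the simplest way to attach to the cluster. If you need to tune executor settings, the standard pyspark SparkConf works as an alternative; only master_url comes from sparkhpc, and the app name and memory value below are illustrative:

# Alternative sketch: configure the context via SparkConf
from pyspark import SparkConf
conf = SparkConf().setMaster(sj.master_url).setAppName('sparkhpc-demo')
conf.set('spark.executor.memory', '2g')  # example value, tune for your cluster
sc = SparkContext(conf=conf)
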
In [9]:
sc.parallelize(range(100)).count()


Out[9]:
100
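
Anything that runs against a local SparkContext runs here as well; for example, a Monte Carlo estimate of pi distributes across the cluster in the same way:

# Sketch: Monte Carlo estimate of pi on the cluster
import random

def inside(_):
    x, y = random.random(), random.random()
    return x * x + y * y < 1.0

n = 100000
hits = sc.parallelize(range(n)).filter(inside).count()
print(4.0 * hits / n)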

Teardown


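Stopping the SparkContext first lets the executors exit cleanly before the scheduler jobs are killed:

# Release the driver's resources before tearing down the clusters
sc.stop()
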
In [10]:
sj.stop()
sj2.stop()


INFO:sparkhpc:Job <33084182> is being terminated

INFO:sparkhpc:Job <33084664> is being terminated


In [11]:
sj.show_clusters()


ClusterID  Job ID    Number of cores  Status     Spark UI                  Spark URL
0          33084182  4                running    http://10.205.11.34:8080  spark://spark.master:7077
1          33084664  10               submitted  None                      None