The SparkContext.addPyFile() method adds a .py file to the SparkContext and distributes it to every node in the cluster. Functions, classes, and variables defined in that file then become available to the whole Spark job.
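
The contents of pyFiles/my_module.py are not shown in the original, so here is a minimal sketch consistent with the calls made later in this walkthrough (including the misspelled addPyFiles_is_successfull name):

# pyFiles/my_module.py -- hypothetical contents, inferred from the calls below

def addPyFiles_is_successfull():
    # Return True so the driver can confirm the module was shipped and imported.
    return True

def sum_two_variables(a, b):
    # Trivial function used to verify that code from this file is callable.
    return a + b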

Create a SparkContext object


In [1]:
from pyspark import SparkConf, SparkContext, SparkFiles
from pyspark.sql import SparkSession

In [2]:
sc = SparkContext(conf=SparkConf())
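
Since SparkSession is imported above, an equivalent setup on Spark 2.x and later is to build a SparkSession and take its underlying context; the app name here is an arbitrary choice:

spark = SparkSession.builder.appName('addPyFile-demo').getOrCreate()
sc = spark.sparkContext  # the same SparkContext as above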

Add a .py file


In [3]:
sc.addPyFile('pyFiles/my_module.py')

In [4]:
SparkFiles.get('my_module.py')


Out[4]:
'/private/var/folders/2_/kb60z5_j0k91tyh740s1zhn40000gn/T/spark-4f959e9f-4af6-490e-afce-02e1582aae8d/userFiles-8b1c073b-4c82-467a-b9ff-021aa3067abe/my_module.py'

Use my_module.py

Because addPyFile() also places the file's directory on sys.path, we can import my_module like any ordinary Python module.


In [5]:
from my_module import *

In [6]:
addPyFiles_is_successfull()


Out[6]:
True

In [7]:
sum_two_variables(4, 5)


Out[7]:
9
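
The calls above ran on the driver. Because addPyFile() also ships the file to every executor, the module can be used inside distributed operations as well. A minimal sketch, importing inside the function so that each executor resolves my_module from its own shipped copy:

def add_pair(pair):
    # Import on the executor; addPyFile() has placed my_module on its sys.path.
    from my_module import sum_two_variables
    return sum_two_variables(*pair)

rdd = sc.parallelize([(1, 2), (3, 4)])
rdd.map(add_pair).collect()  # returns [3, 7]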