Add Spark packages and run them inside your notebook

PixieDust Package Manager lets you install Spark packages inside your notebook. This is especially useful when you're working in a hosted cloud environment without access to configuration files. Use PixieDust Package Manager to install:

  • a Spark package from spark-packages.org
  • a package from the Maven search repository
  • a jar file directly from a URL

Note: After you install a package, you must restart the kernel.

View list of packages

To see the packages installed on your system, run the following command:


In [ ]:
import pixiedust
pixiedust.printAllPackages()

Add a package from spark-packages.org

Run the following cell to install GraphFrames.


In [ ]:
pixiedust.installPackage("graphframes:graphframes:0")

Restart your kernel

From the menu at the top of this notebook, choose Kernel > Restart, then run the next cell.

View updated list of packages

Run printAllPackages again to see that GraphFrames is now in your list:


In [ ]:
import pixiedust  # re-import: restarting the kernel cleared the earlier import
pixiedust.printAllPackages()

Display a GraphFrames data sample

GraphFrames comes with sample data sets. Even if GraphFrames is already installed, running the install command loads the Python code that ships with the package and enables features like the one you're about to see. Run the following cell, and PixieDust displays a sample graph data set called friends. In the upper left of the display, click the table dropdown to switch between views of nodes and edges.


In [ ]:
# Import the Graphs examples
from graphframes.examples import Graphs
# Create the friends example graph
g = Graphs(sqlContext).friends()
# Use the PixieDust display
display(g)

Install from Maven

To install a package from Maven, visit the project and find its groupId and artifactId, then enter them in the following install command. For example, the following cell installs Apache Commons CSV:


In [ ]:
pixiedust.installPackage("org.apache.commons:commons-csv:0")
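
The install strings used in this notebook follow a colon-separated pattern of groupId, artifactId, and version. As a minimal sketch, the helper below splits such a string so you can check each piece before passing it to installPackage. The names MavenCoordinate and parse_coordinate are hypothetical and are not part of PixieDust.

```python
from typing import NamedTuple


class MavenCoordinate(NamedTuple):
    """Hypothetical container for the three parts of an install string."""
    group_id: str
    artifact_id: str
    version: str


def parse_coordinate(coordinate: str) -> MavenCoordinate:
    """Split 'groupId:artifactId:version' into its parts.

    Illustrative helper only; this is not a PixieDust function.
    """
    parts = coordinate.split(":")
    # Require exactly three non-empty, colon-separated fields
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "expected 'groupId:artifactId:version', got %r" % coordinate
        )
    return MavenCoordinate(*parts)


coord = parse_coordinate("org.apache.commons:commons-csv:0")
print(coord.group_id)     # org.apache.commons
print(coord.artifact_id)  # commons-csv
print(coord.version)      # 0
```

A malformed string such as "commons-csv" raises ValueError here, which may surface a typo before the install command runs.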

Install a jar file directly from a URL

To install a jar file that is not packaged in a Maven repository, provide its URL.


In [ ]:
pixiedust.installPackage("https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar")
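
Before installing from a URL, it can help to confirm that the link really points at a jar file. The sketch below uses only the Python standard library. The function name jar_name_from_url is hypothetical, and the checks it performs (an http or https scheme and a .jar suffix) are assumptions made for illustration, not rules that PixieDust is known to enforce.

```python
import posixpath
from urllib.parse import urlparse


def jar_name_from_url(url: str) -> str:
    """Return the jar file name at the end of a URL.

    Hypothetical helper, not part of PixieDust. Raises ValueError when the
    URL does not look like a direct link to a jar file.
    """
    parsed = urlparse(url)
    # Assumption: only direct http(s) links are considered valid here
    if parsed.scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL, got %r" % url)
    name = posixpath.basename(parsed.path)
    if not name.endswith(".jar"):
        raise ValueError("URL does not end in a .jar file: %r" % url)
    return name


url = ("https://github.com/ibm-watson-data-lab/spark.samples/raw/master/"
       "dist/streaming-twitter-assembly-1.6.jar")
print(jar_name_from_url(url))  # streaming-twitter-assembly-1.6.jar
```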

Follow the tutorial

To understand what you can do with this jar file, read David Taieb's latest tutorial, Realtime Sentiment Analysis of Twitter Hashtags with Spark.

Uninstall a package

Uninstalling a package is just as easy: run the command pixiedust.uninstallPackage("<<mypackage>>"). For example, the following cell uninstalls Apache Commons CSV:


In [ ]:
pixiedust.uninstallPackage("org.apache.commons:commons-csv:0")