PixieDust PackageManager lets you install Spark packages inside your notebook. This is especially useful when you're working in a hosted cloud environment without access to configuration files. Use PixieDust PackageManager to install a package by its coordinates or a jar file directly from a URL.
Note: After you install a package, you must restart the kernel.
In [ ]:
import pixiedust
pixiedust.printAllPackages()
In [ ]:
pixiedust.installPackage("graphframes:graphframes:0")
In [ ]:
pixiedust.printAllPackages()
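The string passed to installPackage above has three colon-separated parts, which read like Maven-style coordinates (group, artifact, version). The following is a minimal sketch of splitting such a string, assuming that three-part format. The parse_coordinates helper and the Coordinates type are hypothetical names used for illustration only; they are not part of PixieDust.

```python
from typing import NamedTuple


class Coordinates(NamedTuple):
    """Hypothetical container for the three parts of a package string."""
    group: str
    artifact: str
    version: str


def parse_coordinates(spec: str) -> Coordinates:
    """Split an assumed 'group:artifact:version' string into its parts."""
    parts = spec.split(":")
    # Reject strings with the wrong number of parts or with empty parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'group:artifact:version', got {spec!r}")
    return Coordinates(*parts)


print(parse_coordinates("graphframes:graphframes:0"))
```

A check like this can catch a mistyped package string before you pass it to installPackage.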
GraphFrames comes with sample data sets. Even if GraphFrames is already installed, running the install command loads the Python code that comes with the package and enables features like the one you're about to see. Run the following cell and PixieDust displays a sample graph data set called friends. On the upper left of the display, click the table dropdown to switch between views of nodes and edges.
In [ ]:
# Import the Graphs example
from graphframes.examples import Graphs
# Create the friends example graph
g = Graphs(sqlContext).friends()
# Show the graph with the PixieDust display
display(g)
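If the import in the cell above fails, for example because the kernel has not been restarted since the install, it can help to check whether the Python module is visible at all. The following is a minimal sketch using only the standard library. The is_importable helper is a hypothetical name, not a PixieDust function, and the sketch assumes the package exposes a top-level module named graphframes, as the import above suggests.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found by the import system."""
    # find_spec returns None when no top-level module with that name exists.
    return importlib.util.find_spec(module_name) is not None


# Assumed module name, taken from the import statement in the cell above.
print(is_importable("graphframes"))
```

The check only looks the module up; it does not import it, so it has no side effects on the running kernel.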
In [ ]:
pixiedust.installPackage("org.apache.commons:commons-csv:0")
In [ ]:
pixiedust.installPackage("https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar")
To understand what you can do with this jar file, read David Taieb's tutorial, Realtime Sentiment Analysis of Twitter Hashtags with Spark.
In [ ]:
pixiedust.uninstallPackage("org.apache.commons:commons-csv:0")