Arabesque API

Current Version: 1.0.2-BETA

Arabesque is a distributed graph mining system that enables quick and easy development of graph mining algorithms, while providing a scalable and efficient execution engine running on top of Hadoop.

Benefits of Arabesque:

  • Simple and intuitive API, tailored specifically to graph mining algorithms.
  • Transparent handling of all complexities associated with these algorithms.
  • Scalable to hundreds of workers.
  • Efficient implementation: negligible overhead compared to equivalent centralized solutions.

Arabesque is open source, released under the Apache 2.0 license.

In [3]:
import io.arabesque.ArabesqueContext

println (s"spark application ID: ${sc.applicationId}")

// arabesque context is built on top of SparkContext
val arab = new ArabesqueContext (sc)
println (s"arabesque context = ${arab}")

// get local path for the sample graph
val localPath = s"${System.getenv ("ARABESQUE_HOME")}/data/citeseer-single-label.graph"
println (s"localPath = ${localPath}")

// a single ArabesqueContext can create several arabesque graphs
val arabGraph = arab.textFile (localPath)
println (s"arabesque graph = ${arabGraph}")

// generating motifs of size 3
val motifs = arabGraph.motifs (3).set ("agg_ic", true).set ("comm_ss", "embedding")
println (s"arabesque result = ${motifs}")

println (motifs.config.getOutputPath)

// embeddings RDD
val embeddings = motifs.embeddings
println (motifs.config.getOutputPath)
println (s"two sample embeddings:\n${embeddings.take(2).mkString("\n")}")

// getting aggregations one by one, by key
val aggKeys = motifs.registeredAggregations
println (s"aggKeys = ${aggKeys.mkString(" ")}")
val motifsAgg = motifs.aggregation (aggKeys(0))
println (motifsAgg)

// getting all aggregations
val allAggs = motifs.aggregations
println (allAggs)


spark application ID: local-1466351689305
arabesque context = io.arabesque.ArabesqueContext@512b4e8b
localPath = /home/viniciusvdias/environments/Arabesque/data/citeseer-single-label.graph
arabesque graph = io.arabesque.ArabesqueGraph@20bfec54
arabesque result = ArabesqueResult(org.apache.spark.SparkContext@410c64f0,SparkConfiguration(Map(arabesque.computation.class -> io.arabesque.gmlib.motif.MotifComputation, arabesque.output.path -> /tmp/arabesque-eded5f5f-27cc-41c6-aec9-c84a2bd9cc3a/graph-22eb0f8a-56ac-45f1-b72f-e2686eddf55f/motifs-ff13727e-fd93-4e3a-9545-6241dcf350c1, comm_ss -> embedding, arabesque.motif.maxsize -> 3, arabesque.graph.location -> /home/viniciusvdias/environments/Arabesque/data/citeseer-single-label.graph, arabesque.graph.local -> false, agg_ic -> true)))
two sample embeddings:
VEmbedding(477, 2427, 2785)
VEmbedding(477, 2427, 2928)
aggKeys = motifs
Map([0,1-1,1],[1,1-2,1],[0,1-2,1] -> 1166, [1,1-2,1],[0,1-2,1] -> 23380)
Map(motifs -> Map([0,1-1,1],[1,1-2,1],[0,1-2,1] -> 1166, [1,1-2,1],[0,1-2,1] -> 23380))
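
The aggregation keys printed above are compact pattern strings, so a small helper can make the result easier to read. The following is a minimal sketch in plain Scala with no Arabesque dependency. It assumes that each `[a,x-b,y]` group in a key describes one edge between pattern positions `a` and `b`, with `x` and `y` being vertex labels. That reading is inferred from the output shown here and is not confirmed by Arabesque documentation. The names `MotifSummary`, `PatternEdge`, `parsePattern`, and `describe` are hypothetical and exist only for this illustration.

```scala
// Sketch: interpret the pattern keys printed by the motif aggregation.
// ASSUMPTION: each "[a,x-b,y]" group is one edge between pattern positions
// a and b, where x and y are vertex labels. This format is inferred from the
// output above, not taken from Arabesque documentation.
object MotifSummary {
  // One pattern edge: positions within the pattern plus their vertex labels.
  final case class PatternEdge(src: Int, srcLabel: Int, dst: Int, dstLabel: Int)

  private val EdgeRegex = """\[(\d+),(\d+)-(\d+),(\d+)\]""".r

  // Extract every edge group found in a pattern key.
  def parsePattern(key: String): List[PatternEdge] =
    EdgeRegex.findAllMatchIn(key).map { m =>
      PatternEdge(m.group(1).toInt, m.group(2).toInt, m.group(3).toInt, m.group(4).toInt)
    }.toList

  // For connected 3-vertex patterns: 3 edges form a triangle, 2 edges an open path.
  def describe(edges: List[PatternEdge]): String = {
    val vertices = edges.flatMap(e => List(e.src, e.dst)).distinct.size
    (vertices, edges.size) match {
      case (3, 3) => "triangle"
      case (3, 2) => "open path (wedge)"
      case (v, e) => s"pattern with $v vertices and $e edges"
    }
  }

  def main(args: Array[String]): Unit = {
    // Counts copied from the aggregation output above.
    val counts = Map(
      "[0,1-1,1],[1,1-2,1],[0,1-2,1]" -> 1166L,
      "[1,1-2,1],[0,1-2,1]" -> 23380L
    )
    val total = counts.values.sum
    counts.foreach { case (key, n) =>
      val share = 100.0 * n / total
      println(f"${describe(parsePattern(key))}%-20s $n%6d ($share%.1f%%)")
    }
  }
}
```

Under the stated assumption, the first key has three edges over three positions and is reported as a triangle, while the second has two edges and is reported as an open path. In a live session the hard-coded map could be replaced by the inner map of `motifs.aggregations`, provided its keys are first converted to strings, since the concrete key type is not shown in this notebook.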