Scala Example for Google Cloud Storage

This example shows how to use the Spark Generic Connector with Apache Spark to process files stored in a Google Cloud Storage bucket.

Load dependencies for Apache Spark 2.x


In [ ]:
%dep

z.load("alvsanand:spark-generic-connector:0.2.0-spark_2x-s_2.11")

Import dependencies


In [ ]:
import org.apache.spark.sgc._
import es.alvsanand.sgc.google.cloud_storage._

Create the SgcConnectorParameters (here, CloudStorageParameters) with the credentials ZIP URL and the bucket name


In [ ]:
val parameters = CloudStorageParameters("CREDENTIALS_ZIP_URL", "BUCKET_NAME")
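The placeholder values above do not have to be hard-coded in the notebook. As a minimal sketch, they could be read from environment variables of the interpreter process. The helper `setting` and the variable names `SGC_CREDENTIALS_ZIP_URL` and `SGC_BUCKET_NAME` are hypothetical and are not part of the connector.

```scala
// Hypothetical helper (not part of the connector): returns the value of an
// environment variable, or a default when it is missing or empty.
def setting(name: String, default: String, env: Map[String, String] = sys.env): String =
  env.get(name).filter(_.nonEmpty).getOrElse(default)

// Assumed variable names; the defaults are the same placeholders used above.
val credentialsZipUrl = setting("SGC_CREDENTIALS_ZIP_URL", "CREDENTIALS_ZIP_URL")
val bucketName = setting("SGC_BUCKET_NAME", "BUCKET_NAME")
```

The two values can then be passed to the constructor in the same order as in the cell above: `CloudStorageParameters(credentialsZipUrl, bucketName)`.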

Create the RDD passing the SgcConnectorFactory and the parameters


In [ ]:
val rdd = sc.createSgcRDD(CloudStorageSgcConnectorFactory, parameters)

Use the RDD as desired


In [ ]:
// List the slot assigned to each partition of the RDD
rdd.partitions.map(_.asInstanceOf[SgcRDDPartition[CloudStorageSlot]].slot)
// Print the first 10 records
rdd.take(10).foreach(println)
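Since the result is a regular Spark RDD, the usual transformations apply to it. The following sketch counts word occurrences using only standard Spark RDD operations. The helper `wordCounts` is hypothetical, and it assumes that each record is a line of text.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of the connector): counts how often each
// word occurs across all lines of an RDD.
def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
  lines
    .flatMap(_.split("\\s+"))  // split each line on whitespace
    .filter(_.nonEmpty)        // drop empty tokens caused by leading whitespace
    .map(word => (word, 1))    // pair each word with an initial count
    .reduceByKey(_ + _)        // sum the counts per word
```

It could be applied to the RDD created above with `wordCounts(rdd.map(_.toString)).take(10).foreach(println)`. The `toString` call is a precaution in case the records are not plain strings.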