Scala Example for an FTP server

This example shows how to use the Spark Generic Connector with Apache Spark to process files stored in a directory on an FTP server.

Load the dependencies for Apache Spark 2.x (this artifact is built for Scala 2.11)


In [ ]:
%dep

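// Run this paragraph before any Spark paragraph: Zeppelin's %dep loader only works before the Spark interpreter starts
z.load("alvsanand:spark-generic-connector:0.2.0-spark_2x-s_2.11")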

Import dependencies


In [ ]:
import es.alvsanand.sgc.ftp.{FTPCredentials, FTPSlot, ProxyConfiguration}
import es.alvsanand.sgc.ftp.normal.{FTPSgcConnectorFactory, FTPParameters}
import org.apache.spark.streaming.sgc._

Create the SgcConnectorParameters (here an FTPParameters instance) for the connection. Run only one of the following variants:

  • Direct connection:

In [ ]:
// Replace HOST, PORT, DIRECTORY, USER and PASSWORD with real values
val parameters = FTPParameters("HOST", PORT, "DIRECTORY", FTPCredentials("USER", Option("PASSWORD")))

  • Using active mode:

In [ ]:
val parameters = FTPParameters("HOST", PORT, "DIRECTORY", FTPCredentials("USER", Option("PASSWORD")), activeMode = true)

  • Connection through proxy server:

In [ ]:
// 80 is the proxy port; PROXY_USER and PROXY_PASSWORD are optional
val parameters = FTPParameters("HOST", PORT, "DIRECTORY", FTPCredentials("USER", Option("PASSWORD")),
  proxy = Option(ProxyConfiguration("PROXYHOST", 80, Option("PROXY_USER"), Option("PROXY_PASSWORD"))))
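The connection variants appear to differ only in optional named arguments (`activeMode`, `proxy`) added after the required host, port, directory and credentials. The sketch below mimics that default-argument pattern with hypothetical stand-in case classes (`Credentials`, `Proxy`, `Params`); they are not the library's FTPCredentials, ProxyConfiguration or FTPParameters, and the defaults shown (passive mode, no proxy) are assumptions rather than documented library behaviour.

```scala
// Hypothetical stand-ins that only mirror the shape of the calls above;
// they are NOT the library's FTPCredentials / ProxyConfiguration / FTPParameters.
case class Credentials(user: String, password: Option[String] = None)

case class Proxy(host: String, port: Int,
                 user: Option[String] = None, password: Option[String] = None)

case class Params(
    host: String,
    port: Int,
    directory: String,
    credentials: Credentials,
    activeMode: Boolean = false,   // assumed default: passive mode
    proxy: Option[Proxy] = None)   // assumed default: direct connection

val creds = Credentials("user", Option("secret"))

// Direct connection: every optional argument keeps its default.
val direct = Params("ftp.example.com", 21, "/data", creds)

// Active mode: override a single default by name.
val active = Params("ftp.example.com", 21, "/data", creds, activeMode = true)

// Proxy: wrap the proxy settings in an Option.
val viaProxy = Params("ftp.example.com", 21, "/data", creds,
  proxy = Option(Proxy("proxy.example.com", 80, Option("proxyUser"), Option("proxyPass"))))

Seq(direct, active, viaProxy).foreach(println)
```

Named arguments let each variant override just the setting it needs while leaving the rest at their defaults.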

Create the RDD by passing the FTPSgcConnectorFactory and the parameters to createSgcRDD


In [ ]:
val rdd = sc.createSgcRDD(FTPSgcConnectorFactory, parameters)
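SparkContext has no `createSgcRDD` method of its own; the call compiles because `import org.apache.spark.streaming.sgc._` presumably brings an implicit conversion into scope that adds it. The sketch below shows that enrichment pattern on a hypothetical `FakeContext` class, with a made-up `createListing` method and `fakesgc` object standing in for `createSgcRDD` and the library package. It illustrates the Scala mechanism only and does not reproduce the library's code.

```scala
// Hypothetical stand-in for SparkContext: it has no listing method of its own.
class FakeContext(val appName: String)

// Hypothetical object playing the role of the `org.apache.spark.streaming.sgc` package.
object fakesgc {
  // Implicit class: wraps a FakeContext and adds createListing to it.
  implicit class RichFakeContext(ctx: FakeContext) {
    // `factory` plays the role of the connector factory: it turns the
    // parameters (here just a directory name) into the data to load.
    def createListing(factory: String => Seq[String], directory: String): Seq[String] =
      factory(directory)
  }
}

// Importing the implicit class is what makes `ctx.createListing(...)` compile.
import fakesgc._

val ctx = new FakeContext("ftp-demo")
val listing = ctx.createListing(dir => Seq(s"$dir/a.csv", s"$dir/b.csv"), "/data")
listing.foreach(println)
```

Without the import line, the compiler would reject `ctx.createListing`, just as `sc.createSgcRDD` needs the wildcard `sgc._` import.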

Use the RDD as desired, for example to inspect the FTP slot behind each partition and print the first 10 elements:


In [ ]:
// Inspect the FTPSlot wrapped by each partition
rdd.partitions.map(_.asInstanceOf[SgcRDDPartition[FTPSlot]].slot)
// Print the first 10 elements of the RDD
rdd.take(10).foreach(println)
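Casting each partition to `SgcRDDPartition[FTPSlot]` to read its `slot` suggests that every partition carries a description of the remote data it reads. The sketch below models that idea under the assumption of one partition per remote file; `Slot`, `FilePartition` and `planPartitions` are hypothetical names, not the library's classes, and the real partitioning logic may differ.

```scala
// Hypothetical model: a slot names one remote file
// (assumption: one file per partition).
case class Slot(name: String)

// Hypothetical partition type carrying its index and its slot,
// loosely mirroring SgcRDDPartition[FTPSlot].
case class FilePartition(index: Int, slot: Slot)

// Assign one partition per file, keeping the order of the listing.
def planPartitions(files: Seq[Slot]): Seq[FilePartition] =
  files.zipWithIndex.map { case (slot, i) => FilePartition(i, slot) }

val partitions = planPartitions(Seq(Slot("a.csv"), Slot("b.csv"), Slot("c.csv")))

// Counterpart of rdd.partitions.map(_.slot): recover the file behind each partition.
val slots = partitions.map(_.slot)
slots.foreach(println)
```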