I use Apache Spark a lot in my daily work and also in some personal experiments.
I built this library for people who use Akka (like myself) and would like an easy way to interact with Spark to submit, monitor, and kill Spark jobs without having to deploy a web server and interact with a REST API.
Hope this serves you well. Contributions are more than welcome.
To use it, just add the following to your build.sbt:
For Scala 2.11.x
resolvers += Resolver.jcenterRepo
libraryDependencies += "xyz.joaovasques" %% "spark-api" % "0.2"
For Scala 2.12.x
resolvers += Resolver.jcenterRepo
libraryDependencies += "xyz.joaovasques" %% "spark-api" % "0.3"
The main purpose of this library is to simplify the management of Spark jobs. Currently, only Standalone Cluster mode is supported. The current version of this library supports the following operations: submitting a job, checking a job's status, and killing a job.
This library is built on top of Akka and uses Akka-Http to communicate with Spark’s master.
The SparkApi object is the main entry point of this library. In order to start sending commands to the cluster, you need to create a Spark actor:
def getStandaloneGateway(sparkMaster: String)(implicit system: ActorSystem): ActorRef
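As a minimal sketch of creating the actor (this assumes SparkApi is available from the xyz.joaovasques.sparkapi package; the actor system name and master URL are placeholders, not part of the library):

import akka.actor.{ActorRef, ActorSystem}
import xyz.joaovasques.sparkapi._

implicit val system: ActorSystem = ActorSystem("spark-api-example") // placeholder name

// point this at your own standalone master; "spark://localhost:7077" is illustrative
val sparkActor: ActorRef = SparkApi.getStandaloneGateway("spark://localhost:7077")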
In order to submit a job to Spark, you need to send a SubmitJob message to the Spark actor.
case class SubmitJob(name: String, mainClass: String, arguments: Set[String], jarLocation: String, envVars: EnvVars) extends SparkRequest
The following code snippet demonstrates how to submit a job to Spark:
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import xyz.joaovasques.sparkapi.messages._
import xyz.joaovasques.sparkapi._

implicit val timeout: Timeout = Timeout(10.seconds) // required by the ask (?) pattern

val sparkActor = .... // created via SparkApi.getStandaloneGateway (see above)
val request = SubmitJob("TestJob", "com.example.test", Set("--env", "test"), "s3n://...", Map())
(sparkActor ? request).mapTo[SparkApiResponse]
The actor returns an Ok(driverId: String) message if successful and an exception otherwise.
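As a hedged sketch of consuming that response (continuing from the snippet above, and assuming Ok extends SparkApiResponse and carries the driver id as described):

import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

// sparkActor and request are defined in the submission snippet above
(sparkActor ? request).mapTo[SparkApiResponse].onComplete {
  case Success(Ok(driverId)) => println(s"Job submitted; driver id: $driverId")
  case Success(other)        => println(s"Unexpected response: $other")
  case Failure(error)        => println(s"Submission failed: ${error.getMessage}")
}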
In order to check the status of a job, you need to send a JobStatus message to the Spark actor.
case class JobStatus(driverId: String) extends SparkRequest
The following code snippet demonstrates how to check a job's status:
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import xyz.joaovasques.sparkapi.messages._
import xyz.joaovasques.sparkapi._

implicit val timeout: Timeout = Timeout(10.seconds)

val sparkActor = ....
val request = JobStatus("driverid-20170302629")
(sparkActor ? request).mapTo[String]
In order to kill a job, you need to send a KillJob message to the Spark actor.
case class KillJob(driverId: String) extends SparkRequest
The following code snippet demonstrates how to kill a Spark job:
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import xyz.joaovasques.sparkapi.messages._
import xyz.joaovasques.sparkapi._

implicit val timeout: Timeout = Timeout(10.seconds)

val sparkActor = ....
val request = KillJob("driverid-20170302629")
(sparkActor ? request).mapTo[SparkApiResponse]
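Putting the three operations together, here is a hedged end-to-end sketch; it reuses the imports, the implicit timeout, and the sparkActor from the snippets above, and the job name, main class, and jar location are placeholders:

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

val submit = SubmitJob("TestJob", "com.example.test", Set("--env", "test"), "s3n://...", Map())

val lifecycle: Future[String] = for {
  Ok(driverId) <- (sparkActor ? submit).mapTo[SparkApiResponse] // fails the Future if the reply is not Ok
  status       <- (sparkActor ? JobStatus(driverId)).mapTo[String]
  _            <- (sparkActor ? KillJob(driverId)).mapTo[SparkApiResponse]
} yield status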
Some future work might include: