Spark API

Introduction

I use Apache Spark a lot in my daily work and also in some personal experiments.

I built this library for people who use Akka (like myself) and would like an easy way to interact with Spark to submit, monitor, and kill Spark jobs without having to deploy a web server and interact with a REST API.

Hope this serves you well. Contributions are more than welcome.

Installation

To use it, just add the following to your build.sbt:

For Scala 2.11.x

resolvers += Resolver.jcenterRepo

libraryDependencies += "xyz.joaovasques" %% "spark-api" % "0.2"

For Scala 2.12.x

resolvers += Resolver.jcenterRepo

libraryDependencies += "xyz.joaovasques" %% "spark-api" % "0.3"

Features

The main purpose of this library is to simplify the management of Spark jobs. Currently, only Standalone cluster mode is supported. The current version of this library supports the following operations: submitting a job, checking a job's status, and killing a job.

This library is built on top of Akka and uses Akka-Http to communicate with Spark’s master.

Usage

Spark Actor

The SparkApi object is the main entry point of this library. In order to start sending commands to the cluster, you need to create a Spark actor:

def getStandaloneGateway(sparkMaster: String)(implicit system: ActorSystem): ActorRef
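
A minimal sketch of creating the actor (the ActorSystem name and the master URL spark://localhost:6066 are assumptions for illustration):

import akka.actor.{ActorRef, ActorSystem}
import xyz.joaovasques.sparkapi._

implicit val system: ActorSystem = ActorSystem("spark-api-example")

// Point this at your own Standalone master
val sparkActor: ActorRef = SparkApi.getStandaloneGateway("spark://localhost:6066")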

Submit a Job

In order to submit a job to Spark, you need to send a SubmitJob message to the Spark actor:

case class SubmitJob(name: String, mainClass: String, arguments: Set[String], jarLocation: String, envVars: EnvVars) extends SparkRequest

The following code snippet demonstrates how to submit a job to Spark:

import akka.pattern.ask
import akka.util.Timeout
import xyz.joaovasques.sparkapi.messages._
import xyz.joaovasques.sparkapi._

import scala.concurrent.duration._

// The ask (?) pattern requires an implicit timeout; 30 seconds is an arbitrary choice
implicit val timeout: Timeout = Timeout(30.seconds)

val sparkActor = ....

val request = SubmitJob("TestJob", "com.example.test", Set("--env", "test"), "s3n://...", Map())

(sparkActor ? request).mapTo[SparkApiResponse]

The actor returns an Ok(driverId: String) if successful and an exception otherwise.
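
A small sketch of handling the reply, assuming Ok is one of the messages imported above (the println handlers are just for illustration):

import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

(sparkActor ? request).mapTo[SparkApiResponse].onComplete {
  case Success(Ok(driverId)) => println(s"Job submitted, driver id: $driverId")
  case Success(other)        => println(s"Unexpected response: $other")
  case Failure(ex)           => println(s"Submission failed: ${ex.getMessage}")
}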

Check Job Status

In order to check the status of a job, you need to send a JobStatus message to the Spark actor:

case class JobStatus(driverId: String) extends SparkRequest

The following code snippet demonstrates how to check a job's status:

import xyz.joaovasques.sparkapi.messages._
import xyz.joaovasques.sparkapi._

// Reuses the ask import and implicit timeout from the submit example
val sparkActor = ....

val request = JobStatus("driverid-20170302629")

(sparkActor ? request).mapTo[String]
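
Since the status comes back as a plain String, a small convenience wrapper can tidy up call sites. A sketch, reusing the implicit timeout from the submit example (the helper name driverStatus is hypothetical):

import scala.concurrent.Future

def driverStatus(driverId: String): Future[String] =
  (sparkActor ? JobStatus(driverId)).mapTo[String]

// Usage: driverStatus("driverid-20170302629")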

Kill a Job

In order to kill a job, you need to send a KillJob message to the Spark actor:

case class KillJob(driverId: String) extends SparkRequest

The following code snippet demonstrates how to kill a Spark job:

import xyz.joaovasques.sparkapi.messages._
import xyz.joaovasques.sparkapi._

// Reuses the ask import and implicit timeout from the submit example
val sparkActor = ....

val request = KillJob("driverid-20170302629")

(sparkActor ? request).mapTo[SparkApiResponse]
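
Submitting a job and later killing it can be chained through the returned driver id. A sketch, assuming the imports, implicit timeout, and Ok message from the submit example (submitRequest stands for a SubmitJob like the one shown earlier):

import scala.concurrent.ExecutionContext.Implicits.global

val submitAndKill =
  for {
    Ok(driverId) <- (sparkActor ? submitRequest).mapTo[SparkApiResponse]
    killResult   <- (sparkActor ? KillJob(driverId)).mapTo[SparkApiResponse]
  } yield killResult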

Future work

Some future work might include:

  1. Support for a job watcher that tracks the health of a running job
  2. Improve the Spark Actor message protocol (some additional types can be added)
  3. Expose Spark internal job metrics
  4. Add support for other cluster management systems (e.g. Mesos)
  5. Integrate with Akka Clustering?!
  6. Whatever YOU like :)