SparkTest.NET
Support for testing Spark .NET applications
Important
Due to inactivity and a lack of support for Spark .NET, this project has been archived. I recommend building Spark applications in a supported language rather than in .NET.
Why?
There is no documented or supported way to write tests for Spark .NET applications.
There are a number of footguns, and this project aims to disarm them:
- spark-debug needs to run from the executing location of the tests assembly.
  This project starts spark-debug in the correct location and ensures that it is
  stopped after all the tests are complete.
  - If spark-debug is already running, it will not be started, which is convenient for CI.
 
- spark-debug without any tuning is not optimized for short-running tests, so this project:
  - Sets the log level to Error (the default is very chatty)
  - Disables the Spark UI
  - Stops Spark from spreading out and localizes work
  - Disables shuffling for Spark SQL
  - Enables all cores on the spark-submit job
 
- SparkSession is not thread safe because it is backed by a single SparkContext. This project enforces that all operations that need a SparkSession run sequentially.
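
As a rough illustration of the tuning items in the list above, the sketch below maps some of them onto standard Spark properties and formats them as spark-submit arguments. The class and method names (`TestSparkSettings`, `Conf`, `ToSubmitArgs`) are hypothetical, and the choice of properties is an assumption: the list does not say which settings this project actually applies. The "localizes work" item is left out because it is not clear which setting achieves it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of SparkTest.NET.
public static class TestSparkSettings
{
    // Standard Spark properties that plausibly match the tuning goals.
    // Which ones this project really sets is an assumption.
    public static IReadOnlyDictionary<string, string> Conf() =>
        new Dictionary<string, string>
        {
            // Do not start the Spark web UI for short-lived test runs.
            ["spark.ui.enabled"] = "false",
            // Suppress the console progress bar.
            ["spark.ui.showConsoleProgress"] = "false",
            // Shuffles cannot be switched off entirely; a single shuffle
            // partition keeps them as cheap as possible for small test data.
            ["spark.sql.shuffle.partitions"] = "1",
            // The log level is not set here: Spark's SparkContext exposes
            // setLogLevel for that at runtime.
        };

    // Formats the settings as spark-submit arguments.
    // "--master local[*]" runs Spark locally on all available cores.
    public static string ToSubmitArgs(IReadOnlyDictionary<string, string> conf) =>
        "--master local[*] " +
        string.Join(" ", conf
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => $"--conf {kv.Key}={kv.Value}"));
}
```

Keeping the settings in one dictionary makes it easy to see what a test run changes relative to Spark's defaults, and to pass the same values either on the command line or through a session builder.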
 
How?
The SparkSessionFactory.UseSession function ensures that a spark-debug process is running before returning a session. That session then has exclusive access to Spark while it runs the user-provided operation.
The following diagram details the sequence of operations:
sequenceDiagram
    participant t as User Code
    participant a as UseSession
    participant b as Spark Debug
    t ->> a: Run code with SparkSession
    note over t, a: This is a singleton operation under a lock
    a ->> b: Create SparkSession
    alt Spark Debug not running
        b -->> a: No response
        a ->> b: Start spark-debug
    end
    b -->> a: Session is active and ready to use
    a -->> t: Run user code against SparkSession returning result
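
The flow in the diagram can be sketched as a small gate that serializes callers under a lock and starts the backend only when it is not already running. This is a minimal sketch of the pattern, not the project's implementation: `SessionGate` and its three delegates are hypothetical stand-ins so that the example runs without Spark, and it probes for a running backend up front rather than waiting for a failed session creation as the diagram shows.

```csharp
using System;

// Hypothetical illustration of the locking pattern; not the SparkTest.NET API.
public sealed class SessionGate<TSession>
{
    private readonly object _gate = new object();
    private readonly Func<bool> _isBackendRunning;   // e.g. "is spark-debug up?"
    private readonly Action _startBackend;           // e.g. "start spark-debug"
    private readonly Func<TSession> _createSession;  // e.g. "get or create a session"

    public SessionGate(
        Func<bool> isBackendRunning,
        Action startBackend,
        Func<TSession> createSession)
    {
        _isBackendRunning = isBackendRunning;
        _startBackend = startBackend;
        _createSession = createSession;
    }

    // Runs the operation with exclusive access to the session and returns its result.
    public TResult UseSession<TResult>(Func<TSession, TResult> operation)
    {
        // Only one caller at a time may hold the session.
        lock (_gate)
        {
            // Start the backend only if it is not already running,
            // so an externally started instance (for example in CI) is reused.
            if (!_isBackendRunning())
            {
                _startBackend();
            }

            TSession session = _createSession();
            return operation(session);
        }
    }
}
```

With a fake backend, `new SessionGate<string>(() => running, () => running = true, () => "session")` starts the backend on the first `UseSession` call and reuses it on later calls, while the lock keeps concurrent tests from touching the session at the same time.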