
SparkTest.NET



Support for testing Spark .NET applications

Important

Due to inactivity and the lack of support for Spark .NET, this project has been archived. I recommend building Spark applications in a supported language rather than in .NET.


Why?

There is no documented or supported way to write tests for Spark .NET applications.

There are a number of footguns, and this library aims to disarm them:

  • spark-debug needs to run from the executing location of the test assembly. This library starts spark-debug in the correct location and ensures that it is stopped after all the tests are complete.
    • If spark-debug is already running, it is not started again, which is useful for CI
  • Without any tuning, spark-debug is not optimized for short-running tests. This library:
    • Sets the log level to Error (the default is very chatty)
    • Disables the Spark UI
    • Stops Spark from spreading out and localizes work
    • Disables shuffling for Spark SQL
    • Enables all cores on the spark-submit job
  • SparkSession is not thread-safe because it is backed by a single SparkContext. This library enforces that all operations that need a SparkSession run sequentially.
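The tuning bullets above do not name the exact settings. As a rough sketch, they plausibly correspond to the standard Spark properties below. The class `TestSparkSettings` and its members are hypothetical and are not part of SparkTest.NET, and the mapping from each bullet to a property is an assumption rather than a description of what the library actually sets.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of SparkTest.NET. It collects standard Spark
// properties that plausibly match the tuning bullets; the mapping is a guess.
public static class TestSparkSettings
{
    // "Enables all cores": local[*] is Spark's master URL for running locally
    // with one worker thread per logical core.
    public const string Master = "local[*]";

    // "Sets the log level to Error".
    public const string LogLevel = "ERROR";

    public static IReadOnlyDictionary<string, string> Create() =>
        new Dictionary<string, string>
        {
            // "Disables the Spark UI".
            ["spark.ui.enabled"] = "false",
            ["spark.ui.showConsoleProgress"] = "false",
            // "Stops Spark from spreading out" (assumed mapping; this property
            // only affects the standalone cluster manager).
            ["spark.deploy.spreadOut"] = "false",
            // "Disables shuffling for Spark SQL" (assumed mapping; one shuffle
            // partition keeps shuffles cheap rather than removing them).
            ["spark.sql.shuffle.partitions"] = "1",
        };
}
```

Each pair could be passed to spark-submit as `--conf key=value`, with `Master` supplied through `--master`.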

How?

SparkSessionFactory.UseSession ensures that a spark-debug process is running, then hands a session to the user-provided operation. That session has exclusive access to Spark while the operation runs.

The sequence of operations is as follows:

sequenceDiagram
    participant t as User Code
    participant a as UseSession
    participant b as Spark Debug
    t ->> a: Run code with SparkSession
    note over t, a: This is a singleton operation under a lock
    a ->> b: Create SparkSession
    alt Spark Debug not running
        b -->> a: No response
        a ->> b: Start spark-debug
    end
    b -->> a: Session is active and ready to use
    a -->> t: Run user code against SparkSession returning result
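The flow in the diagram (take a lock, start the backend only if it is not already running, then run the caller's code against the session) can be sketched in a few lines. This is a minimal illustration of the pattern and not the library's implementation: `SessionGate`, `FakeSession`, and `BackendStarts` are hypothetical names, and a boolean flag stands in for launching the real spark-debug process.

```csharp
using System;

// Hypothetical stand-in for a Spark session; not the Microsoft.Spark type.
public sealed class FakeSession
{
    public int Id { get; }
    public FakeSession(int id) => Id = id;
}

// Hypothetical sketch of the "singleton operation under a lock" pattern.
public static class SessionGate
{
    private static readonly object Gate = new object();
    private static bool _backendRunning; // stands in for "spark-debug is up"

    // Counts how many times the (simulated) backend was started.
    public static int BackendStarts { get; private set; }

    public static T UseSession<T>(Func<FakeSession, T> operation)
    {
        if (operation is null) throw new ArgumentNullException(nameof(operation));

        // Only one caller at a time may hold a session.
        lock (Gate)
        {
            if (!_backendRunning)
            {
                // Real code would launch the spark-debug process here.
                _backendRunning = true;
                BackendStarts++;
            }

            var session = new FakeSession(BackendStarts);
            return operation(session);
        }
    }
}
```

A caller would write something like `var n = SessionGate.UseSession(session => session.Id);`. Concurrent callers queue on the lock, so the backend is started at most once and sessions are never used in parallel.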