
SparkTest.NET
Support for testing Spark .NET applications
Important
Due to inactivity and the lack of support for Spark .NET, this project has been archived. I recommend building Spark applications in a supported language rather than in .NET.
Why?
There is no documented or supported way to write tests for Spark .NET applications, and there are a number of foot guns along the way. This library aims to disarm them:
- spark-debug must run from the directory of the executing test assembly. This library starts spark-debug in that location and ensures that it is stopped after all the tests have completed.
- If spark-debug is already running, a second instance is not started, which is useful for CI.
- Without any tuning, spark-debug is not optimized for short-running tests, so this library:
  - Sets the log level to Error (the default is extremely chatty)
  - Disables the Spark UI
  - Stops Spark from spreading out and localizes work
  - Disables shuffling for Spark SQL
  - Enables all cores for the spark-submit job
- SparkSession is not thread-safe (it is backed by a single SparkContext), so this library ensures that all operations that need a SparkSession run sequentially.
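Some of the tuning above can be approximated with standard Spark settings. The sketch below, written in Python purely for illustration, builds a spark-submit command for .NET debug mode and launches it from the test assembly's directory. The `org.apache.spark.deploy.dotnet.DotnetRunner` class and the trailing `debug` argument follow the .NET for Apache Spark debugging instructions; the `--conf` values, the `local[*]` master, the jar path, and the names `build_debug_command` and `start_spark_debug` are assumptions about how each point might map onto Spark configuration, not SparkTest.NET's actual implementation. The log level and work localization are not covered here.

```python
import atexit
import shlex
import subprocess
from collections.abc import Mapping

# Assumed mapping of the tuning points onto standard Spark settings;
# SparkTest.NET's actual values may differ.
TEST_CONF = {
    "spark.ui.enabled": "false",              # do not start the Spark UI web server
    "spark.ui.showConsoleProgress": "false",  # no progress bars in the console output
    "spark.sql.shuffle.partitions": "1",      # one shuffle partition instead of the default 200
}


def build_debug_command(jar_path: str, conf: Mapping[str, str] = TEST_CONF) -> list[str]:
    """Build a spark-submit argv that starts the .NET backend in debug mode."""
    cmd = [
        "spark-submit",
        "--class", "org.apache.spark.deploy.dotnet.DotnetRunner",
        "--master", "local[*]",  # local mode using all available cores
    ]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    # spark-submit options come first, then the application jar and its "debug" argument
    cmd += [jar_path, "debug"]
    return cmd


def start_spark_debug(jar_path: str, test_assembly_dir: str) -> subprocess.Popen:
    """Launch spark-debug from the test assembly's directory and stop it at exit."""
    proc = subprocess.Popen(build_debug_command(jar_path), cwd=test_assembly_dir)
    # Terminate spark-debug when the test process exits
    atexit.register(proc.terminate)
    return proc


if __name__ == "__main__":
    # Hypothetical jar path; the real file name depends on your Spark and Microsoft.Spark versions
    print(shlex.join(build_debug_command("path/to/microsoft-spark.jar")))
```

Note that setting `spark.sql.shuffle.partitions` to 1 does not remove shuffles. It collapses each shuffle into a single partition, which avoids scheduling many tiny tasks on small test data.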
How?
The `SparkSessionFactory.UseSession` function ensures that a spark-debug process is running before returning a session, which then has exclusive access to Spark while it runs the user-provided operation.
The following diagram details the sequence of operations:
```mermaid
sequenceDiagram
    participant t as User Code
    participant a as UseSession
    participant b as Spark Debug
    t ->> a: Run code with SparkSession
    note over t, a: This is a singleton operation under a lock
    a ->> b: Create SparkSession
    alt Spark Debug not running
        b -->> a: No response
        a ->> b: Start spark-debug
    end
    b -->> a: Session is active and ready to use
    a -->> t: Run user code against SparkSession returning result
```
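The flow in the sequence diagram can be sketched as a lock-guarded factory. This Python sketch is a hypothetical analogue of `SparkSessionFactory.UseSession`, not its real implementation: `FakeBackend` stands in for spark-debug so the example runs without Spark, a `ConnectionError` from `create_session` plays the role of "No response", and caching the session between calls is an assumption of the sketch.

```python
import threading
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


class FakeBackend:
    """Hypothetical stand-in for spark-debug so the sketch runs without Spark."""

    def __init__(self) -> None:
        self.running = False
        self.starts = 0  # how many times the backend was launched

    def start(self) -> None:
        self.starts += 1
        self.running = True

    def create_session(self) -> str:
        if not self.running:
            # Models the "No response" branch in the diagram
            raise ConnectionError("spark-debug is not running")
        return "session"


class SessionFactory:
    """Hypothetical Python analogue of SparkSessionFactory.UseSession."""

    def __init__(self, backend: FakeBackend) -> None:
        self._backend = backend
        self._lock = threading.Lock()  # only one operation may hold the session
        self._session: Optional[Any] = None

    def use_session(self, operation: Callable[[Any], T]) -> T:
        with self._lock:
            if self._session is None:
                try:
                    # If spark-debug is already running, this succeeds and nothing is started
                    self._session = self._backend.create_session()
                except ConnectionError:
                    # spark-debug did not answer: start it, then connect again
                    self._backend.start()
                    self._session = self._backend.create_session()
            # Run the user code while still holding the lock
            return operation(self._session)


factory = SessionFactory(FakeBackend())
print(factory.use_session(lambda s: f"ran against {s}"))  # prints "ran against session"
```

Because the lock covers both session creation and the user operation, concurrent test threads queue up instead of using the session at the same time, which is the sequential guarantee described under Why?.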