Test DataFrames

To build a test DataFrame, first import the extensions:

using SparkTest.NET.Extensions;

Simple Usage

The CreateDataFrameFromData method creates a DataFrame from test data.

For example, a simple single-column numeric data set:


var df = s.CreateDataFrameFromData(
    new { Id = 1 },
    Enumerable.Range(2, 9).Select(i => new { Id = i }).ToArray()
);

Id
1
2
3
4
5
6
7
8
9
10

(top = 20)
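
Judging only from the examples in this document, the call takes the first row followed by the remaining rows, either inline or as an array; that reading is an inference, not a documented signature. Enumerable.Range takes a start value and a count, so Range(2, 9) yields 2 through 10, which is why the table ends at 10. A minimal standalone sketch, with no Spark dependency, of the rows being passed in:

```csharp
using System;
using System.Linq;

// Enumerable.Range(start, count): 9 values starting at 2, i.e. 2..10.
var rest = Enumerable.Range(2, 9).Select(i => new { Id = i }).ToArray();

// Prepend the first row, mirroring how the call above passes it separately.
// Both anonymous objects share one compiler-generated type, so Concat type-checks.
var rows = new[] { new { Id = 1 } }.Concat(rest).ToArray();

Console.WriteLine(string.Join(", ", rows.Select(r => r.Id)));
// prints "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
```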

Primitive types

All the primitive .NET types that Spark supports can be used:


var df = s.CreateDataFrameFromData(
    new
    {
        Byte = (byte)1,
        Short = (short)1,
        Int = 1,
        Long = 1L,
        Float = 1.0F,
        Double = 1.0,
        Decimal = (decimal)1.0,
        String = "a",
        Char = 'a',
        Bool = true,
        Date = DateTime.MinValue,
        SparkDate = new Date(DateTime.MinValue),
        DateTimeOffset = DateTimeOffset.MinValue,
        Timestamp = new Timestamp(DateTime.MinValue),
        Binary = new byte[] { 1 },
        Enum = LogLevel.None
    },
    new
    {
        Byte = (byte)2,
        Short = (short)2,
        Int = 2,
        Long = 2L,
        Float = 2.0F,
        Double = 2.0,
        Decimal = (decimal)2.0,
        String = "b",
        Char = 'b',
        Bool = false,
        Date = DateTime.MaxValue,
        SparkDate = new Date(DateTime.MaxValue),
        DateTimeOffset = DateTimeOffset.MaxValue,
        Timestamp = new Timestamp(DateTimeOffset.MaxValue.UtcDateTime),
        Binary = new byte[] { 1 },
        Enum = LogLevel.Critical
    }
);

Byte | Short | Int | Long | Float | Double | Decimal | String | Char | Bool | Date | SparkDate | DateTimeOffset | Timestamp | Binary | Enum
1 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | a | a | true | 0001-01-01 | 0001-01-01 | 0001-01-01 00:00:00 | 0001-01-01 00:00:00 | [01] | None
2 | 2 | 2 | 2 | 2.0 | 2.0 | 2 | b | b | false | 9999-12-31 | 9999-12-31 | 9999-12-31 23:59:59.999999 | 9999-12-31 23:59:59.999999 | [01] | Critical

(top = 20)

Collections (Spark Array)

Anything that implements IEnumerable<T> can be used to create Spark Array data:


var df = s.CreateDataFrameFromData(
    new { Array = new[] { 1, 2, 3 } },
    new { Array = new[] { 4, 5, 6 } }
);

Array
[1, 2, 3]
[4, 5, 6]

(top = 20)
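
The example above only shows arrays. As a rough illustration of what "implements IEnumerable<T>" covers on the .NET side, the standalone sketch below assigns an array, a List<int>, and a lazy LINQ query to the same interface type. It does not call the library, so whether each of these is handled identically by CreateDataFrameFromData rests on the statement above and is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each of these is assignable to IEnumerable<int>.
IEnumerable<int> fromArray = new[] { 1, 2, 3 };
IEnumerable<int> fromList = new List<int> { 1, 2, 3 };
IEnumerable<int> fromQuery = Enumerable.Range(1, 3); // lazily evaluated

// All three enumerate the same elements in the same order.
Console.WriteLine(fromArray.SequenceEqual(fromList) && fromList.SequenceEqual(fromQuery));
// prints "True"
```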

Dictionary (Spark Map)

Anything that implements IEnumerable<KeyValuePair<TKey, TValue>>, such as a Dictionary, can be used to create Spark Map data:


var df = s.CreateDataFrameFromData(
    new
    {
        Map = new Dictionary<string, int>
        {
            ["A"] = 1,
            ["B"] = 2,
            ["C"] = 3
        }
    },
    new
    {
        Map = new Dictionary<string, int>
        {
            ["D"] = 4,
            ["E"] = 5,
            ["F"] = 6
        }
    }
);

Map
{A -> 1, B -> 2, C -> 3}
{D -> 4, E -> 5, F -> 6}

(top = 20)
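
A Dictionary qualifies because it implements IEnumerable<KeyValuePair<TKey, TValue>>; a plain list of pairs has the same shape. The standalone sketch below shows both on the .NET side only. It does not call the library, so passing a list of pairs to CreateDataFrameFromData is an assumption based on the statement above.

```csharp
using System;
using System.Collections.Generic;

// Dictionary<TKey, TValue> implements IEnumerable<KeyValuePair<TKey, TValue>>.
IEnumerable<KeyValuePair<string, int>> fromDictionary = new Dictionary<string, int>
{
    ["A"] = 1,
    ["B"] = 2
};

// A list of key/value pairs satisfies the same interface.
IEnumerable<KeyValuePair<string, int>> fromList = new List<KeyValuePair<string, int>>
{
    new KeyValuePair<string, int>("A", 1),
    new KeyValuePair<string, int>("B", 2)
};

// List enumeration order is the insertion order.
foreach (var pair in fromList)
{
    Console.WriteLine($"{pair.Key} -> {pair.Value}");
}
```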

Classes (Spark Struct)

Any POCO, including the anonymous types used here, can be used to create Spark Struct data:


var df = s.CreateDataFrameFromData(
    new
    {
        CreatedDate = DateTimeOffset.MinValue,
        Person = new
        {
            Name = "Billy",
            Age = 12,
            Interests = new[]
            {
                new { Priority = 1, Name = "Football" },
                new { Priority = 3, Name = "Cars" }
            }
        }
    },
    new
    {
        CreatedDate = DateTimeOffset.MinValue,
        Person = new
        {
            Name = "James",
            Age = 11,
            Interests = new[]
            {
                new { Priority = 1, Name = "Video Games" },
                new { Priority = 2, Name = "TV" }
            }
        }
    }
);

CreatedDate | Person
0001-01-01 00:00:00 | {Billy, 12, [{1, Football}, {3, Cars}]}
0001-01-01 00:00:00 | {James, 11, [{1, Video Games}, {2, TV}]}

(top = 20)

Empty Frame

You can also create an empty DataFrame, which can be used to test columns:


var df = s.CreateEmptyFrame();