Test DataFrames
To build a test DataFrame, first import the extensions:
using SparkTest.NET.Extensions;
Simple Usage
The CreateDataFrameFromData extension method creates a DataFrame from test data.
For example, a simple single-column numeric data set with Ids 1 to 10:
var df = s.CreateDataFrameFromData(
new { Id = 1 },
Enumerable.Range(2, 9).Select(i => new { Id = i }).ToArray()
);
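To make the shape of that test data concrete, the following plain C# sketch (no Spark required) lists the rows the call above is given. It assumes, based only on the call shape shown here, that the first argument and the trailing array are combined in order; the variable names are illustrative.

```csharp
using System;
using System.Linq;

// First row, passed as the leading argument.
var first = new { Id = 1 };

// Remaining rows: Enumerable.Range(start, count) yields 9 values starting at 2, i.e. 2..10.
var rest = Enumerable.Range(2, 9).Select(i => new { Id = i }).ToArray();

// Assumption: rows are taken in the order given (first, then rest).
var ids = new[] { first }.Concat(rest).Select(row => row.Id).ToArray();

Console.WriteLine(string.Join(", ", ids)); // prints "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
```

Note that the second argument to Enumerable.Range is a count, not an upper bound, so the data set has ten rows in total.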
Primitive types
All the primitive .NET types that Spark supports can be used:
var df = s.CreateDataFrameFromData(
new
{
Byte = (byte)1,
Short = (short)1,
Int = 1,
Long = 1L,
Float = 1.0F,
Double = 1.0,
Decimal = (decimal)1.0,
String = "a",
Char = 'a',
Bool = true,
Date = DateTime.MinValue,
SparkDate = new Date(DateTime.MinValue),
DateTimeOffset = DateTimeOffset.MinValue,
Timestamp = new Timestamp(DateTime.MinValue),
Binary = new byte[] { 1 },
Enum = LogLevel.None
},
new
{
Byte = (byte)2,
Short = (short)2,
Int = 2,
Long = 2L,
Float = 2.0F,
Double = 2.0,
Decimal = (decimal)2.0,
String = "b",
Char = 'b',
Bool = false,
Date = DateTime.MaxValue,
SparkDate = new Date(DateTime.MaxValue),
DateTimeOffset = DateTimeOffset.MaxValue,
Timestamp = new Timestamp(DateTimeOffset.MaxValue.UtcDateTime),
Binary = new byte[] { 1 },
Enum = LogLevel.Critical
}
);
Byte | Short | Int | Long | Float | Double | Decimal | String | Char | Bool | Date | SparkDate | DateTimeOffset | Timestamp | Binary | Enum |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | a | a | true | 0001-01-01 | 0001-01-01 | 0001-01-01 00:00:00 | 0001-01-01 00:00:00 | [01] | None |
2 | 2 | 2 | 2 | 2.0 | 2.0 | 2 | b | b | false | 9999-12-31 | 9999-12-31 | 9999-12-31 23:59:59.999999 | 9999-12-31 23:59:59.999999 | [01] | Critical |
(top = 20)
Collections (Spark Array)
Anything that implements IEnumerable<T> can be used to create Spark Array data:
var df = s.CreateDataFrameFromData(
new { Array = new[] { 1, 2, 3 } },
new { Array = new[] { 4, 5, 6 } }
);
Dictionary (Spark Map)
Anything that implements IEnumerable<KeyValuePair<TKey, TValue>> can be used to create Spark Map data:
var df = s.CreateDataFrameFromData(
new
{
Map = new Dictionary<string, int>
{
["A"] = 1,
["B"] = 2,
["C"] = 3
}
},
new
{
Map = new Dictionary<string, int>
{
["D"] = 4,
["E"] = 5,
["F"] = 6
}
}
);
Classes (Spark Struct)
Any POCO can be used to create Spark Struct data:
var df = s.CreateDataFrameFromData(
new
{
CreatedDate = DateTimeOffset.MinValue,
Person = new
{
Name = "Billy",
Age = 12,
Interests = new[]
{
new { Priority = 1, Name = "Football" },
new { Priority = 3, Name = "Cars" }
}
}
},
new
{
CreatedDate = DateTimeOffset.MinValue,
Person = new
{
Name = "James",
Age = 11,
Interests = new[]
{
new { Priority = 1, Name = "Video Games" },
new { Priority = 2, Name = "TV" }
}
}
}
);
CreatedDate | Person |
---|---|
0001-01-01 00:00:00 | {Billy, 12, [{1, Football}, {3, Cars}]} |
0001-01-01 00:00:00 | {James, 11, [{1, Video Games}, {2, TV}]} |
(top = 20)
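The example above uses anonymous types; since any POCO is said to work, the same data could presumably be expressed with named classes. The sketch below is an illustration under assumptions, not the library's documented usage: the Interest, Person, Registration, and StructExample types are hypothetical, SparkSession and DataFrame are assumed to come from Microsoft.Spark.Sql, and CreateDataFrameFromData is assumed to accept a first item followed by further items of the same type, as the calls above suggest.

```csharp
using System;
using Microsoft.Spark.Sql;        // assumed source of SparkSession and DataFrame
using SparkTest.NET.Extensions;

// Hypothetical POCOs mirroring the anonymous types used above.
public class Interest
{
    public int Priority { get; set; }
    public string Name { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public Interest[] Interests { get; set; }
}

public class Registration
{
    public DateTimeOffset CreatedDate { get; set; }
    public Person Person { get; set; }
}

public static class StructExample
{
    // Assumption: named classes are handled the same way as anonymous types.
    public static DataFrame Build(SparkSession s) =>
        s.CreateDataFrameFromData(
            new Registration
            {
                CreatedDate = DateTimeOffset.MinValue,
                Person = new Person
                {
                    Name = "Billy",
                    Age = 12,
                    Interests = new[]
                    {
                        new Interest { Priority = 1, Name = "Football" },
                        new Interest { Priority = 3, Name = "Cars" }
                    }
                }
            },
            new Registration
            {
                CreatedDate = DateTimeOffset.MinValue,
                Person = new Person
                {
                    Name = "James",
                    Age = 11,
                    Interests = new[]
                    {
                        new Interest { Priority = 1, Name = "Video Games" },
                        new Interest { Priority = 2, Name = "TV" }
                    }
                }
            }
        );
}
```

Named classes can be convenient when several tests share the same row shape, whereas anonymous types keep one-off test data close to the test itself.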
Empty Frame
You can also create an empty DataFrame, which can be used to try out column expressions:
var df = s.CreateEmptyFrame();