Deedle Namespace |
Class | Description | |
---|---|---|
Addressing |
An `Address` value is used as an interface between vectors and indices. The index maps
keys of various types to an address, which is then used to get a value from the vector.
Here is a brief summary of what we assume (and don't assume) about addresses:
- Address is `int64` (although we might need to generalize this in the future)
- Different data sources can use different addressing schemes
(as long as both index and vector use the same scheme)
- Addresses don't have to be contiguous (e.g. if the source is partitioned, it
can use a 32-bit partition index + a 32-bit offset in the partition)
- In the in-memory representation, address is just index into an array
- In the BigDeedle representation, address is abstracted and comes with
`AddressOperations` that specifies how to use it (tests use linear
offset and partitioned representation)
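The partitioned scheme mentioned in the list above can be sketched in plain F#. This is only an illustration of packing two 32-bit numbers into one `int64`; the helper names `makeAddress`, `partitionOf` and `offsetOf` are hypothetical and are not part of Deedle.

```fsharp
// Hypothetical helpers (not Deedle API): pack a 32-bit partition index and
// a 32-bit offset into a single int64 address, and unpack them again.
let makeAddress (partition: int) (offset: int) : int64 =
    (int64 partition <<< 32) ||| int64 (uint32 offset)

let partitionOf (address: int64) : int = int (address >>> 32)
let offsetOf (address: int64) : int = int (address &&& 0xFFFFFFFFL)

let a = makeAddress 2 5   // partition 2, offset 5
let b = makeAddress 3 0   // first element of partition 3

printfn "%d -> partition %d, offset %d" a (partitionOf a) (offsetOf a)
printfn "%d -> partition %d, offset %d" b (partitionOf b) (offsetOf b)
```

Addresses built this way are ordered but not contiguous: the last address of one partition and the first address of the next are generally far apart.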
[category:Vectors and indices]
| |
Addressingaddress | ||
AddressingAddressModule | ||
AddressingLinearAddress |
Address operations that are used by the standard in-memory Deedle structures
(LinearIndex and ArrayVector). Here, address is a non-negative array offset.
| |
AddressingLinearAddressingScheme |
Represents a linear addressing scheme where the addresses are `0 .. <size>-1`.
| |
Aggregation |
A non-generic type that simplifies the construction of `Aggregation<K>` values
from C#. It provides methods for constructing different kinds of aggregation
strategies for ordered series.
[category:Parameters and results of various operations]
| |
AggregationK |
Represents a strategy for aggregating data in an ordered series into data segments.
To create a value of this type from C#, use the non-generic `Aggregation` type.
Data can be aggregated using floating windows or chunks of a specified size or
by specifying a condition on two keys (i.e. end a window/chunk when the condition
no longer holds).
[category:Parameters and results of various operations]
| |
AggregationKChunkSize |
Aggregate data into non-overlapping chunks of a specified size
and the provided handling of boundary elements.
| |
AggregationKChunkWhile |
Aggregate data into non-overlapping chunks where each chunk ends as soon
as the specified function returns `false` when called with the
first key and the current key as arguments.
| |
AggregationKTags | ||
AggregationKWindowSize |
Aggregate data into floating windows of a specified size
and the provided handling of boundary elements.
| |
AggregationKWindowWhile |
Aggregate data into floating windows where each window ends as soon
as the specified function returns `false` when called with the
first key and the current key as arguments.
| |
ColumnSeriesTRowKey, TColumnKey |
Represents a series of columns from a frame. The type inherits from a series of
series representing individual columns (`Series<'TColumnKey, ObjectSeries<'TRowKey>>`) but
hides slicing operations with new versions that return frames.
[category:Specialized frame and series types]
| |
DataSegment |
Provides helper functions and active patterns for working with `DataSegment` values
[category:Parameters and results of various operations]
| |
DataSegmentT |
Represents a segment of a series or sequence. The value is returned from
various functions that aggregate data into chunks or floating windows. The
`Complete` case represents a complete segment (e.g. of the specified size) and
`Incomplete` represents a segment at the boundary (e.g. smaller than the required
size).
## Example
For example (using the internal `Seq.windowedWithBounds` function):

    open Deedle.Internal
    Seq.windowedWithBounds 3 Boundary.AtBeginning [ 1; 2; 3; 4 ]
    [fsi: [| DataSegment(Incomplete, [| 1 |])         ]
    [fsi:    DataSegment(Incomplete, [| 1; 2 |])      ]
    [fsi:    DataSegment(Complete, [| 1; 2; 3 |])     ]
    [fsi:    DataSegment(Complete, [| 2; 3; 4 |]) |]  ]

If you do not need to distinguish the two cases, you can use the `Data` property
to get the array representing the segment data.
[category:Parameters and results of various operations]
| |
DataSegmentKind |
Represents a kind of `DataSegment<T>`. See that type for more information.
[category:Parameters and results of various operations]
| |
DataSegmentKindTags | ||
DelayedSeries |
This type exposes a single static method `DelayedSeries.Create` that can be used for
constructing data series (of type `Series<K, V>`) with lazily loaded data. You can
use this functionality to create a series that represents e.g. an entire price history
in a database, but only loads data that are actually needed. For more information
see the [lazy data loading tutorial](../lazysource.html).
## Example
Assuming we have a function `generate lo hi` that generates data in the specified
`DateTime` range, we can create lazy series as follows:

    let ls = DelayedSeries.Create(min, max, fun (lo, lob) (hi, hib) ->
      async {
        printfn "Query: %A - %A" (lo, lob) (hi, hib)
        return generate lo hi })

The arguments `min` and `max` specify the complete range of the series. The
function passed to `Create` is called with minimal and maximal required key
(`lo` and `hi`) and with two values that specify boundary behaviour.
[category:Specialized frame and series types]
| |
EnumerableExtensions |
Contains C#-friendly extension methods for various instances of `IEnumerable`
that can be used for creating `Series<'K, 'V>` from the `IEnumerable` value.
You can create an ordinal series from `IEnumerable<'T>` or an indexed series from
`IEnumerable<KeyValuePair<'K, 'V>>` or from `IEnumerable<KeyValuePair<'K, OptionalValue<'V>>>`.
[category:Frame and series operations]
| |
F# Frame extensions |
This module contains F# functions and extensions for working with frames. This
includes operations for creating frames such as the `frame` function, `=>` operator
and `Frame.ofRows`, `Frame.ofColumns` and `Frame.ofRowKeys` functions. The module
also provides additional F# extension methods including `ReadCsv`, `SaveCsv` and `PivotTable`.
## Frame construction
The functions and methods in this group can be used to create frames. If you are creating
a frame from a number of sample values, you can use `frame` and the `=>` operator (or the
`=?>` operator, which is useful if you have multiple series of distinct types):

    frame [ "Column 1" => series [ 1 => 1.0; 2 => 2.0 ]
            "Column 2" => series [ 3 => 3.0 ] ]

Aside from this, the various type extensions let you write `Frame.ofXyz` to construct frames
from data in various formats - `Frame.ofRows` and `Frame.ofColumns` create a frame from a series
or a sequence of rows or columns; `Frame.ofRecords` creates a frame from .NET objects using
Reflection and `Frame.ofRowKeys` creates an empty frame with the specified keys.
## Frame operations
The group contains two overloads of the F#-friendly version of the `PivotTable` method.
## Input and output
This group of extensions includes a number of overloads for the `ReadCsv` and `SaveCsv`
methods. The methods here are designed to be used from F# and so they are F#-style extensions
and they use F#-style optional arguments. In general, the overloads take either a path or
`TextReader`/`TextWriter`. Also note that `ReadCsv<'R>(path, indexCol, ...)` lets you specify
the column to be used as the index.
[category:Frame and series operations]
| |
F# Index extensions |
Defines non-generic `Index` type that provides functions for building indices
(hard-bound to `LinearIndexBuilder` type). In F#, the module is automatically opened
using `AutoOpen`. The methods are not designed for use from C#.
[category:Vectors and indices]
| |
F# Index extensionsIndex |
Type that provides a simple access to creating indices represented
using the built-in `LinearIndex` type.
| |
F# IndexBuilder implementation |
Set concrete IIndexBuilder implementation
[category:Vectors and indices]
| |
F# IndexBuilder implementationIndexBuilder | ||
F# Series extensions |
Contains extensions for creating values of type `Series<'K, 'V>` including
a type with functions such as `Series.ofValues` and the `series` function.
The module is automatically opened for all F# code that references `Deedle`.
[category:Core frame and series types]
| |
F# Series extensionsSeries | ||
F# Vector extensions |
Defines non-generic `Vector` type that provides functions for building vectors
(hard-bound to `ArrayVectorBuilder` type). In F#, the module is automatically opened
using `AutoOpen`. The methods are not designed for use from C#.
[category:Vectors and indices]
| |
F# Vector extensions (core) |
Module with extensions for generic vector type. Given `vec` of type `IVector<T>`,
the extension property `vec.DataSequence` returns all data of the vector converted
to the "least common denominator" data structure - `IEnumerable<T>`.
[category:Vectors and indices]
| |
F# Vector extensionsVector |
Type that provides a simple access to creating vectors represented
using the built-in `ArrayVector` type that stores the data in a
contiguous block of memory.
| |
F# VectorBuilder implementation |
Set concrete IVectorBuilder implementation
[category:Vectors and indices]
| |
F# VectorBuilder implementationVectorBuilder | ||
Frame |
Provides static methods for creating frames, reading frame data
from CSV files and databases (via `IDataReader`). The type also provides
global configuration for reflection-based expansion.
[category:Frame and series operations]
| |
FrameTRowKey, TColumnKey |
A frame is the key Deedle data structure (together with series). It represents a
data table (think spreadsheet or CSV file) with multiple rows and columns. The frame
consists of row index, column index and data. The indices are used for efficient
lookup when accessing data by the row key `'TRowKey` or by the column key
`'TColumnKey`. Deedle frames are optimized for the scenario when all values in a given
column are of the same type (but types of different columns can differ).
## Joining, zipping and appending
More info
[category:Core frame and series types]
| |
FrameBuilder |
Type that can be used for creating frames using the C# collection initializer syntax.
You can use `new FrameBuilder.Columns<...>` to create a new frame from columns or you
can use `new FrameBuilder.Rows<...>` to create a new frame from rows.
## Example
The following creates a new frame with columns `Foo` and `Bar`:

    var sampleFrame =
      new FrameBuilder.Columns<int, string> {
        { "Foo", new SeriesBuilder<int> { {1,11.1}, {2,22.4} }.Series },
        { "Bar", new SeriesBuilder<int> { {1,42.42} }.Series }
      }.Frame;

[category:Frame and series operations]
| |
FrameBuilderColumnsR, C | ||
FrameBuilderRowsR, C | ||
FrameData |
Represents the underlying (raw) data of the frame in a format that can
be used for exporting data frame to other formats etc. (DataTable, CSV, Excel)
[category:Core frame and series types]
| |
FrameExtensions |
Contains C# and F# extension methods for the `Frame<'R, 'C>` type. The members are
automatically available when you import the `Deedle` namespace. The type contains
object-oriented counterparts to most of the functionality from the `Frame` module.
## Data structure manipulation
Summary 1
## Input and output
Summary 2
## Missing values
Summary 3
[category:Frame and series operations]
| |
FrameModule |
The `Frame` module provides an F#-friendly API for working with data frames.
The module follows the usual design for collection-processing in F#, so the
functions work well with the pipelining operator (`|>`). For example, given
a frame with two columns representing prices, we can use `Frame.diff` and
numerical operators to calculate daily returns like this:

    let df = frame [ "MSFT" => prices1; "AAPL" => prices2 ]
    let past = df |> Frame.diff 1
    let rets = past / df * 100.0
    rets |> Stats.mean

Note that the `Stats.mean` operation is overloaded and works both on series
(returning a number) and on frames (returning a series).
The functions in this module are designed to be used from F#. For a C#-friendly
API, see the `FrameExtensions` type. For working with individual series, see the
`Series` module. The functions in the `Frame` module are grouped in a number of
categories and documented below.
Accessing frame data and lookup
-------------------------------
Functions in this category provide access to the values in the frame. You can
also add and remove columns from a frame (which both return a new value).
- `addCol`, `replaceCol` and `dropCol` can be used to create a new data frame
with a new column, by replacing an existing column with a new one, or by dropping
an existing column
- `cols` and `rows` return the columns or rows of a frame as a series containing
objects; `getCols` and `getRows` return a generic series and cast the values to
the type inferred from the context (columns or rows of incompatible types are skipped);
`getNumericCols` returns columns of a type convertible to `float` for convenience.
- You can get a specific row or column using `get[Col|Row]` or `lookup[Col|Row]` functions.
The `lookup` variant lets you specify lookup behavior for key matching (e.g. find the
nearest smaller key than the specified value). There are also `[try]get` and `[try]Lookup`
functions that return optional values and functions returning entire observations
(key together with the series).
- `sliceCols` and `sliceRows` return a sub-frame containing only the specified columns
or rows. Finally, `toArray2D` returns the frame data as a 2D array.
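As a minimal sketch of the column functions listed above, the following F# script adds, drops and reads columns of a small frame. The sample data and the names `df`, `withC`, `withoutA` and `b` are made up for illustration, and the script assumes the `Deedle` NuGet package is available.

```fsharp
#r "nuget: Deedle"
open Deedle

// A small sample frame with two float columns (illustrative data)
let df =
    frame [ "A" => series [ 1 => 1.0; 2 => 2.0 ]
            "B" => series [ 1 => 10.0; 2 => 20.0 ] ]

// addCol and dropCol return new frames; the original df is not modified
let withC = df |> Frame.addCol "C" (series [ 1 => 100.0; 2 => 200.0 ])
let withoutA = withC |> Frame.dropCol "A"

// getCol casts the column values to the type given by the annotation
let b : Series<int, float> = df |> Frame.getCol "B"

printfn "%d columns before, %d after addCol" df.ColumnCount withC.ColumnCount
printfn "B at key 2 = %f" b.[2]
```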
Grouping, windowing and chunking
--------------------------------
The basic grouping functions in this category can be used to group the rows of a
data frame by a specified projection or column to create a frame with hierarchical
index such as `Frame<'K1 * 'K2, 'C>`. The functions always aggregate rows, so if you
want to group columns, you need to use `Frame.transpose` first.
The function `groupRowsBy` groups rows by the value of a specified column. Use
`groupRowsBy[Int|Float|String...]` if you want to specify the type of the column in
an easier way than using type inference; `groupRowsUsing` groups rows using the
specified _projection function_ and `groupRowsByIndex` projects the grouping key just
from the row index.
More advanced functions include: `aggregateRowsBy` which groups the rows by a
specified sequence of columns and aggregates each group into a single value;
`pivotTable` implements the pivoting operation [as documented in the
tutorials](../frame.html#pivot).
The `stack` function turns the data frame into a single data frame
containing columns `Row`, `Column` and `Value` that hold the data of the original
frame; `unstack` can be used to turn this representation back into the original frame.
The simple windowing functions exposed for operations on an entire frame are
`window` and `windowInto`. For more complex windowing operations, you currently have
to use `mapRows` or `mapCols` and apply windowing on individual series.
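The basic grouping described in this category can be sketched as follows. The frame contents and the column names `Kind` and `Value` are illustrative assumptions; the type annotations only spell out the shapes described in the text (`Frame<'K1 * 'K2, 'C>` and a series of frames).

```fsharp
#r "nuget: Deedle"
open Deedle

// Columns of different types are combined with the =?> operator
let df =
    frame [ "Kind"  =?> series [ 1 => "x"; 2 => "y"; 3 => "x" ]
            "Value" =?> series [ 1 => 1.0; 2 => 2.0; 3 => 3.0 ] ]

// Group rows by the string column; the row key becomes (kind, original key)
let grouped : Frame<string * int, string> =
    df |> Frame.groupRowsByString "Kind"

// Split the hierarchical frame into one frame per group
let nested : Series<string, Frame<int, string>> =
    grouped |> Frame.nest

printfn "groups: %A" (List.ofSeq nested.Keys)
```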
Sorting and index manipulation
------------------------------
A frame is indexed by row keys and column keys. Both of these indices can be sorted
(by the keys). A frame that is sorted allows a number of additional operations (such
as lookup using the `Lookup.ExactOrSmaller` lookup behavior). The functions in this
category provide ways for manipulating the indices. It is expected that most operations
are done on rows and so more functions are available in a row-wise way. A frame can
always be transposed using `Frame.transpose`.
### Index operations
The existing row/column keys can be replaced by a sequence of new keys using the
`indexColsWith` and `indexRowsWith` functions. Row keys can also be replaced by
ordinal numbers using `indexRowsOrdinally`.
The function `indexRows` uses the specified column of the original frame as the
index. It removes the column from the resulting frame (to avoid this, use overloaded
`IndexRows` method). This function infers the type of row keys from the context, so it
is usually more convenient to use `indexRows[Date|String|Int|...]` functions. Finally,
if you want to calculate the index value based on multiple columns of the row, you
can use `indexRowsUsing`.
### Sorting frame rows
Frame rows can be sorted according to the value of a specified column using the
`sortRows` function; `sortRowsBy` takes a projection function which lets you
transform the value of a column (e.g. to project a part of the value).
The functions `sortRowsByKey` and `sortColsByKey` sort the rows or columns
using the default ordering on the key values. The result is a frame with ordered
index.
### Expanding columns
When the frame contains a series with complex .NET objects such as F# records or
C# classes, it can be useful to "expand" the column. This operation looks at the
type of the objects, gets all properties of the objects (recursively) and
generates multiple series representing the properties as columns.
The function `expandCols` expands the specified columns while `expandAllCols`
applies the expansion to all columns of the data frame.
Frame transformations
---------------------
Functions in this category perform standard transformations on data frames including
projections, filtering, taking some sub-frame of the frame, aggregating values
using scanning and so on.
Projection and filtering functions such as `[map|filter][Cols|Rows]` call the
specified function with the column or row key and an `ObjectSeries<'K>` representing
the column or row. You can use functions ending with `Values` (such as `mapRowValues`)
when you do not require the row key, but only the row series; `mapRowKeys` and
`mapColKeys` can be used to transform the keys.
You can use `reduceValues` to apply a custom reduction to values of columns. Other
aggregations are available in the `Stats` module. You can also get a row with the
greatest or smallest value of a given column using `[min|max]RowBy`.
The functions `take[Last]` and `skip[Last]` can be used to take a sub-frame of the
original source frame by skipping a specified number of rows. Note that this
does not require an ordered frame and it ignores the index - for index-based lookup
use slicing, such as `df.Rows.[lo .. hi]`, instead.
Finally the `shift` function can be used to obtain a frame with values shifted by
the specified offset. This can be used e.g. to get previous value for each key using
`Frame.shift 1 df`. The `diff` function calculates difference from previous value using
`df - (Frame.shift offs df)`.
Processing frames with exceptions
---------------------------------
The functions in this group can be used to write computations over frames that may fail.
They use the type `tryval<'T>` which is defined as a discriminated union:

    type tryval<'T> =
      | Success of 'T
      | Error of exn

Using `tryval<'T>` as a value in a data frame is not generally recommended, because
the type of values cannot be tracked in the type. For this reason, it is better to use
`tryval<'T>` with individual series. However, `tryValues` and `fillErrorsWith` functions
can be used to get values, or fill failed values inside an entire data frame.
The `tryMapRows` function is more useful. It can be used to write a transformation
that applies a computation (which may fail) to each row of a data frame. The resulting
series is of type `Series<'R, tryval<'T>>` and can be processed using the `Series` module
functions.
Missing values
--------------
This group of functions provides a way of working with missing values in a data frame.
The category provides the following functions that can be used to fill missing values:
* `fillMissingWith` fills missing values with a specified constant
* `fillMissingUsing` calls a specified function for every missing value
* `fillMissing` and its variants propagate values from previous/later keys
We use the terms _sparse_ and _dense_ to denote series that contain some missing values
or do not contain any missing values, respectively. The functions `denseCols` and
`denseRows` return a series that contains only dense columns or rows and all sparse
rows or columns are replaced with a missing value. The `dropSparseCols` and `dropSparseRows`
functions drop these missing values and return a frame with no missing values.
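A small sketch of filling and dropping missing values in a frame is shown below. It relies on the assumption that `nan` in a `float` series is treated as a missing value when the series is constructed; the sample data is made up.

```fsharp
#r "nuget: Deedle"
open Deedle

// Column A has a missing value at key 2 (nan is assumed to be read as missing)
let df =
    frame [ "A" => series [ 1 => 1.0; 2 => nan; 3 => 3.0 ]
            "B" => series [ 1 => 10.0; 2 => 20.0; 3 => 30.0 ] ]

// Replace every missing float value with a constant
let filled = df |> Frame.fillMissingWith 0.0

// Keep only rows that have a value in every column
let dense = df |> Frame.dropSparseRows

printfn "rows: %d original, %d dense" df.RowCount dense.RowCount
```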
Joining, merging and zipping
----------------------------
The simplest way to join two frames is to use the `join` operation which can be used to
perform left, right, outer or inner join of two frames. When the row keys of the frames do
not match exactly, you can use `joinAlign` which takes an additional parameter that specifies
how to find matching key in left/right join (e.g. by taking the nearest smaller available key).
Frames that do not contain overlapping values can be combined using `merge` (when combining
just two frames) or using `mergeAll` (for larger number of frames). The latter is optimized
to work well for a large number of data frames.
Finally, frames with overlapping values can be combined using `zip`. It takes a function
that is used to combine the overlapping values. A `zipAlign` function provides a variant
with more flexible row key matching (as in `joinAlign`).
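The difference between join kinds can be sketched with two small frames whose row keys only partly overlap. The frames `left` and `right` and their contents are illustrative assumptions.

```fsharp
#r "nuget: Deedle"
open Deedle

let left  = frame [ "L" => series [ 1 => 1.0; 2 => 2.0 ] ]
let right = frame [ "R" => series [ 2 => 20.0; 3 => 30.0 ] ]

// Inner join keeps only row keys present in both frames (key 2)
let inner = Frame.join JoinKind.Inner left right

// Outer join keeps the union of row keys (1, 2, 3); absent cells are missing
let outer = Frame.join JoinKind.Outer left right

printfn "inner rows: %d, outer rows: %d" inner.RowCount outer.RowCount
```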
Hierarchical index operations
-----------------------------
A data frame has a hierarchical row index if the row index is formed by a tuple, such as
`Frame<'R1 * 'R2, 'C>`. Frames of this kind are returned, for example, by the grouping
functions such as `Frame.groupRowsBy`. The functions in this category provide ways for
working with data frames that have hierarchical row keys.
The functions `applyLevel` and `reduceLevel` can be used to reduce values according to
one of the levels. The `applyLevel` function takes a reduction of type `Series<'K, 'T> -> 'T`
while `reduceLevel` reduces individual values using a function of type `'T -> 'T -> 'T`.
The functions `nest` and `unnest` can be used to convert between frames with
hierarchical indices (`Frame<'K1 * 'K2, 'C>`) and series of frames that represent
individual groups (`Series<'K1, Frame<'K2, 'C>>`). The `nestBy` function can be
used to perform a group-by operation and return the result as a series of frames.
[category:Frame and series operations]
| |
FrameUtils |
[omit]
Module with helper functions and operations that are needed by Frame<R, C>, but
are easier to write in a separate type (having them inside generic type
can confuse the type inference in various ways).
| |
Index |
Type that provides access to creating indices (represented as `LinearIndex` values)
[category:Vectors and indices]
| |
KeyValue |
A type with extension method for `KeyValuePair<'K, 'V>` that makes
it possible to create values using just `KeyValue.Create`.
[category:Primitive types and values]
| |
MissingValueException |
Thrown when a value at the specified index does not exist in the data frame or series.
This exception is thrown only when the key is defined, but the value is not available;
in other situations, `KeyNotFoundException` is thrown.
[category:Primitive types and values]
| |
MultiKeyExtensions |
F#-friendly functions for creating multi-level keys and lookups
[category:Parameters and results of various operations]
| |
ObjectSeriesK |
Represents a series containing boxed values. This type is inherited from `Series<'K, obj>`
and it adds additional operations for accessing values with unboxing. This includes operations
such as `os.GetAs<'T>`, `os.TryGetAs<'T>` and `os.TryAs<'T>` which (attempt to) convert
values to the specified type `'T`.
[category:Specialized frame and series types]
| |
OptionalValue |
Non-generic type that makes it easier to create `OptionalValue<T>` values
from C# by benefiting from type inference for generic method invocations.
[category:Primitive types and values]
| |
OptionalValueExtensions |
Extension methods for working with optional values from C#. These make
it easier to provide default values and convert optional values to
`Nullable` (when the contained value is value type)
[category:Primitive types and values]
| |
OptionalValueModule |
Provides various helper functions for using the `OptionalValue<T>` type from F#
(The functions are similar to those in the standard `Option` module).
[category:Primitive types and values]
| |
Pair |
Module with helper functions for extracting values from hierarchical tuples
[category:Primitive types and values]
| |
RangeRestriction |
Provides additional operations for working with the `RangeRestriction<'TAddress>` type
| |
RangeRestrictionTAddress |
Specifies a sub-range within index that can be accessed via slicing
(see the `GetAddressRange` method). For in-memory data structures, accessing
range via known addresses is typically sufficient, but for virtual Big Deedle
sources, `Start` and `End` let us avoid fully evaluating addresses.
`Custom` range can be used for optimizations.
| |
RangeRestrictionTAddressCustom |
Custom range, which is a sequence of indices, or other representation of it
| |
RangeRestrictionTAddressEnd |
Range referring to the specified number of elements from the end
| |
RangeRestrictionTAddressFixed |
Range specified as a pair of (inclusive) lower and upper addresses
| |
RangeRestrictionTAddressStart |
Range referring to the specified number of elements from the start
| |
RangeRestrictionTAddressTags | ||
RowSeriesTRowKey, TColumnKey |
Represents a series of rows from a frame. The type inherits from a series of
series representing individual rows (`Series<'TRowKey, ObjectSeries<'TColumnKey>>`) but
hides slicing operations with new versions that return frames.
[category:Specialized frame and series types]
| |
SeriesK, V |
The type `Series<K, V>` represents a data series consisting of values `V` indexed by
keys `K`. The keys of a series may or may not be ordered.
[category:Core frame and series types]
| |
SeriesBuilderK |
A simple class that inherits from `SeriesBuilder<'K, obj>` and can be
used instead of writing `SeriesBuilder<'K, obj>` with two type arguments.
[category:Specialized frame and series types]
| |
SeriesBuilderK, V |
The type can be used for creating series using mutation. You can add
items using `Add` and get the resulting series using the `Series` property.
## Using from C#
The type supports the C# collection initializer syntax:

    var s = new SeriesBuilder<string, double>
      { { "A", 1.0 }, { "B", 2.0 }, { "C", 3.0 } }.Series;

The type also supports the C# `dynamic` keyword:

    dynamic sb = new SeriesBuilder<string, object>();
    sb.ID = 1;
    sb.Value = 3.4;

[category:Specialized frame and series types]
| |
SeriesExtensions |
The type implements C# and F# extension methods for the `Series<'K, 'V>` type.
The members are automatically available when you import the `Deedle` namespace.
The type contains object-oriented counterparts to most of the functionality
from the `Series` module.
[category:Frame and series operations]
| |
SeriesModule |
The `Series` module provides an F#-friendly API for working with data and time series.
The API follows the usual design for collection-processing in F#, so the functions work
well with the pipelining (`|>`) operator. For example, given a series with ages,
we can use `Series.filterValues` to filter outliers and then `Stats.mean` to calculate
the mean:

    ages
    |> Series.filterValues (fun v -> v > 0.0 && v < 120.0)
    |> Stats.mean

The module provides a comprehensive set of functions for working with series. The same
API is also exposed using C#-friendly extension methods. In C#, the above snippet could
be written as:

    [lang=csharp]
    ages
      .Where(kvp => kvp.Value > 0.0 && kvp.Value < 120.0)
      .Mean()

For more information about similar frame-manipulation functions, see the `Frame` module.
For more information about C#-friendly extensions, see `SeriesExtensions`. The functions
in the `Series` module are grouped in a number of categories and documented below.
Accessing series data and lookup
--------------------------------
Functions in this category provide access to the values in the series.
- The term _observation_ is used for a key value pair in the series.
- When working with a sorted series, it is possible to perform lookup using
keys that are not present in the series - you can specify to search for the
previous or next available value using _lookup behavior_.
- Functions such as `get` and `getAll` have their counterparts `lookup` and
`lookupAll` that let you specify lookup behavior.
- For most of the functions that may fail, there is a `try[Foo]` variant that
returns `None` instead of failing.
- Functions with a name ending with `At` perform lookup based on the absolute
integer offset (and ignore the keys of the series)
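The lookup variants described in the list above can be sketched on a small sorted series. The series `s` and its contents are illustrative assumptions.

```fsharp
#r "nuget: Deedle"
open Deedle

let s = series [ 10 => "a"; 20 => "b"; 30 => "c" ]

// Exact lookup by key
let exact = s |> Series.get 20

// The try variant returns None for a key that is not present
let absent = s |> Series.tryGet 25

// Lookup behavior: use the nearest smaller key when 25 is not present
let nearest = s |> Series.lookup 25 Lookup.ExactOrSmaller

// The At variant ignores keys and uses the integer offset
let second = s |> Series.getAt 1

printfn "%s %A %s %s" exact absent nearest second
```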
Series transformations
----------------------
Functions in this category perform standard transformations on series including
projections, filtering, taking some sub-series of the series, aggregating values
using scanning and so on.
Projection and filtering functions generally skip over missing values, but there
are variants `filterAll` and `mapAll` that let you handle missing values explicitly.
Keys can be transformed using `mapKeys`. When you do not need to consider the keys,
and only care about values, use `filterValues` and `mapValues` (which is also aliased
as the `$` operator).
Series supports a standard set of folding functions including `reduce` and `fold` (to
reduce series values into a single value) as well as the `scan[All]` function, which
can be used to fold values of a series into a series of intermediate folding results.
The functions `take[Last]` and `skip[Last]` can be used to take a sub-series of the
original source series by skipping a specified number of elements. Note that this
does not require an ordered series and it ignores the index - for index-based lookup
use slicing, such as `series.[lo .. hi]`, instead.
Finally the `shift` function can be used to obtain a series with values shifted by
the specified offset. This can be used e.g. to get previous value for each key using
`Series.shift 1 ts`. The `diff` function calculates difference from previous value using
`ts - (Series.shift offs ts)`.
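A minimal sketch of the projection, `shift`, `diff` and `take` functions on a three-element series follows. The series `ts` is made-up sample data.

```fsharp
#r "nuget: Deedle"
open Deedle

let ts = series [ 1 => 10.0; 2 => 12.0; 3 => 15.0 ]

// Project values only; keys are left unchanged
let doubled = ts |> Series.mapValues (fun v -> v * 2.0)

// Each key gets the value of the previous key; the first key is dropped
let shifted = ts |> Series.shift 1

// Difference from the previous value, i.e. ts - (Series.shift 1 ts)
let diffs = ts |> Series.diff 1

// Offset-based sub-series: the first two observations, regardless of keys
let firstTwo = ts |> Series.take 2

printfn "diff at key 2 = %f, at key 3 = %f" diffs.[2] diffs.[3]
```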
Processing series with exceptions
---------------------------------
The functions in this group can be used to write computations over series that may fail.
They use the type `tryval<'T>` which is defined as a discriminated union:

    type tryval<'T> =
      | Success of 'T
      | Error of exn

The function `tryMap` lets you create `Series<'K, tryval<'T>>` by mapping over values
of an original series. You can then extract values using `tryValues`, which throws
`AggregateException` if there were any errors. Functions `tryErrors` and `trySuccesses`
give series containing only errors and successes. You can fill failed values with
a constant using `fillErrorsWith`.
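The workflow described above might look as follows when parsing strings that may be malformed. This is a sketch under the assumption that the function passed to `tryMap` receives both the key and the value; the input data is made up.

```fsharp
#r "nuget: Deedle"
open System
open System.Globalization
open Deedle

let inputs = series [ 1 => "1.5"; 2 => "oops"; 3 => "3.0" ]

// Parsing "oops" throws; tryMap captures the exception as an Error value
let parsed =
    inputs
    |> Series.tryMap (fun (_: int) (v: string) ->
        Double.Parse(v, CultureInfo.InvariantCulture))

// Only the failed observations, as a series of exceptions
let errors = parsed |> Series.tryErrors

// Replace failures with a constant to get a plain series of floats
let values = parsed |> Series.fillErrorsWith 0.0

printfn "%d error(s); value at key 2 = %f" errors.KeyCount values.[2]
```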
Hierarchical index operations
-----------------------------
When the key of a series is tuple, the elements of the tuple can be treated
as multiple levels of an index. For example `Series<'K1 * 'K2, 'V>` has two
levels with keys of types `'K1` and `'K2` respectively.
The functions in this category provide a way for aggregating values in the
series at one of the levels. For example, given a series `input` indexed by
two-element tuple, you can calculate mean for different first-level values as
follows:

    input |> Series.applyLevel fst Stats.mean

Note that the `Stats` module provides helpers for typical statistical operations,
so the above could be written just as `input |> Stats.levelMean fst`.
Grouping, windowing and chunking
--------------------------------
This category includes functions that group data from a series in some way. Two key
concepts here are _window_ and _chunk_. Window refers to (overlapping) sliding windows
over the input series while chunk refers to non-overlapping blocks of the series.
The boundary behavior can be specified using the `Boundary` flags. The value
`Skip` means that boundaries (incomplete windows or chunks) should be skipped. The values
`AtBeginning` and `AtEnding` can be used to define at which side the boundary should be
returned (or skipped). For chunking, `AtBeginning ||| Skip` makes sense and it means that
the incomplete chunk at the beginning should be skipped (aligning the last chunk with the end).
The behavior may be specified in a number of ways (which is reflected in the name):
- `dist` - using an absolute distance between the keys
- `while` - using a condition on the first and last key
- `size` - by specifying the absolute size of the window/chunk
The functions ending with `Into` take a function to be applied to the window/chunk.
The functions `window`, `windowInto` and `chunk`, `chunkInto` are simplified versions
that take a size. There is also a `pairwise` function for sliding windows of size two.
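The difference between windows, chunks and boundary handling can be sketched like this. The example assumes an F# script with the Deedle NuGet package and uses a small hypothetical series of five values.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical ordered series with keys 1..5 and values 1.0..5.0.
let s = series [ for i in 1 .. 5 -> i => float i ]

// Overlapping windows of size 3; incomplete windows are skipped.
let windowMeans = s |> Series.windowInto 3 Stats.mean

// Non-overlapping chunks of size 2.
let chunkMeans = s |> Series.chunkInto 2 Stats.mean

// Sliding window of size two, returned as tuples.
let pairs = s |> Series.pairwise

// Explicit boundary: also produce the incomplete windows at the beginning.
let padded =
  s |> Series.windowSizeInto (3, Boundary.AtBeginning)
         (fun (seg: DataSegment<Series<int, float>>) -> Stats.mean seg.Data)
```

With `Boundary.AtBeginning`, the function receives a `DataSegment` for every key, including the shorter windows at the start of the series.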
Missing values
--------------
This group of functions provides a way of working with missing values in a series.
The `dropMissing` function drops all keys for which there are no values in the series.
The `withMissingFrom` function lets you copy missing values from another series.
The remaining functions provide different mechanisms for filling the missing values.
* `fillMissingWith` fills missing values with a specified constant
* `fillMissingUsing` calls a specified function for every missing value
* `fillMissing` and variants propagate values from previous/later keys
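The following sketch shows these functions side by side. It assumes an F# script with the Deedle NuGet package and relies on `nan` being treated as a missing value when the series is constructed; the data is hypothetical.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical series where the value for key 2 is missing.
let s = series [ 1 => 1.0; 2 => nan; 3 => 3.0 ]

let dropped = s |> Series.dropMissing                     // keys 1 and 3 only
let zeroFilled = s |> Series.fillMissingWith 0.0          // constant fill
let forward = s |> Series.fillMissing Direction.Forward   // copy previous value
let computed = s |> Series.fillMissingUsing (fun k -> float k * 10.0)
```

`Direction.Forward` propagates the last known value to later keys, so key 2 receives the value of key 1.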
Sorting and index manipulation
------------------------------
A series that is sorted by keys allows a number of additional operations (such as lookup
using the `Lookup.ExactOrSmaller` lookup behavior). However, it is also possible to sort
a series based on its values, although the functions for manipulating series do not
guarantee that the order will be preserved.
To sort series by keys, use `sortByKey`. Other sorting functions let you sort the series
using a specified comparer function (`sortWith`), using a projection function (`sortBy`)
and using the default comparison (`sort`).
In addition, you can replace the keys of a series with other keys using `indexWith`
or with integers using `indexOrdinally`. To pick and reorder series values to match
a list of keys, use `realign`.
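A short sketch of the sorting and re-indexing functions, assuming an F# script with the Deedle NuGet package; the series `s` is hypothetical.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical unordered series.
let s = series [ 3 => "c"; 1 => "a"; 2 => "b" ]

let byKey = s |> Series.sortByKey         // keys in ascending order
let byValue = s |> Series.sort            // values in default comparison order
let ordinal = s |> Series.indexOrdinally  // keys replaced by 0, 1, 2

// Pick keys 2 and 4; key 4 is not in the source, so its value is missing.
let realigned = byKey |> Series.realign [ 2; 4 ]
```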
Sampling, resampling and advanced lookup
----------------------------------------
Given a (typically) time series, sampling or resampling makes it possible to
get a time series with representative values at a lower or uniform frequency.
We use the following terminology:
- `lookup` and `sample` functions find values at specified keys; if a key is not
available, they can look for the value associated with the nearest smaller or
the nearest greater key.
- `resample` functions aggregate values into chunks based
on a specified collection of keys (e.g. explicitly provided times), or
based on some relation between keys (e.g. date times having the same date).
- `resampleUniform` is similar to resampling, but we specify keys by
providing functions that generate a uniform sequence of keys (e.g. days);
the operation also fills values for days that have no corresponding
observations in the input sequence.
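The lookup part of this terminology can be illustrated with a small sketch. It assumes an F# script with the Deedle NuGet package and an ordered series; the `prices` data is hypothetical, and the resampling functions are not shown.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical ordered series; there is no observation for key 25.
let prices = series [ 10 => 1.0; 20 => 2.0; 30 => 3.0 ]

// Nearest smaller key (20) or nearest greater key (30).
let atOrBefore = prices |> Series.lookup 25 Lookup.ExactOrSmaller
let atOrAfter = prices |> Series.lookup 25 Lookup.ExactOrGreater

// Exact lookup of a missing key returns None instead of failing.
let exact = prices |> Series.tryLookup 25 Lookup.Exact
```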
Joining, merging and zipping
----------------------------
Given two series, there are two ways to combine the values. If the keys in the series
are not overlapping (or you want to throw away values from one or the other series),
then you can use `merge` or `mergeUsing`. To merge more than 2 series efficiently, use
the `mergeAll` function, which has been optimized for large number of series.
If you want to align two series, you can use the _zipping_ operation. This aligns
two series based on their keys and gives you tuples of values. The default behavior
(`zip`) uses outer join and exact matching. For ordered series, you can specify
other forms of key lookups (e.g. find the greatest smaller key) using `zipAlign`.
Functions ending with `Into` are generally easier to use as they call a specified
function to turn the tuple (of possibly missing values) into a new value.
For more complicated behaviors, it is often convenient to use joins on frames instead
of working with series. Create two frames with single columns and then use the join
operation. The result will be a frame with two columns (which is easier to use than
series of tuples).
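As a rough sketch of merging and zipping, assuming an F# script with the Deedle NuGet package (the three series are hypothetical):

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical series: a and b have disjoint keys, a and c overlap on key 2.
let a = series [ 1 => 1.0; 2 => 2.0 ]
let b = series [ 3 => 3.0; 4 => 4.0 ]
let c = series [ 2 => 20.0; 3 => 30.0 ]

// Keys do not overlap, so a plain merge is enough.
let merged = Series.merge a b

// Keys overlap; keep the value from the right-hand series.
let preferRight = Series.mergeUsing UnionBehavior.PreferRight a c

// Outer-join alignment into tuples of optional values.
let zipped = Series.zip a c

// Combine values for keys present in both series.
let sums = Series.zipInto (fun (x: float) (y: float) -> x + y) a c
```

For key 2, `preferRight` contains the value from `c` and `sums` contains the sum of both values.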
[category:Frame and series operations]
| |
SeriesModuleImplementation |
[omit]
Module that contains an implementation of sampling for `sampleTime` and
`sampleTimeInto`. For technical reasons (`inline`) this is public.
| |
SeriesStatsExtensions |
The type implements C# and F# extension methods that add numerical operations
to Deedle series. With a few exceptions, the methods are only available for
series containing floating-point values, that is `Series<'K, float>`.
[category:Frame and series operations]
| |
Stats |
The `Stats` type contains functions for fast calculation of statistics over
series and frames as well as over a moving and an expanding window in a series.
The resulting series has the same keys as the input series. When there are
no values, or missing values, different functions behave in different ways.
Statistics (e.g. mean) return missing value when any value is missing, while min/max
functions return the minimal/maximal element (skipping over missing values).
## Series statistics
Functions such as `count`, `mean`, `kurt` etc. return the
statistics calculated over all values of a series. The calculation skips
over missing values (or `nan` values), so for example `mean` returns the
average of all _present_ values.
## Frame statistics
The standard functions are exposed as static members and are
overloaded. This means that they can be applied to both `Series<'K, float>` and
to `Frame<'R, 'C>`. When applied to data frame, the functions apply the
statistical calculation to all numerical columns of the frame.
## Moving windows
Moving window means that the window has a fixed size and moves over the series.
In this case, the result of the statistic is always attached to the last key
of the window. The function names are prefixed with `moving`.
## Expanding windows
Expanding window means that the window starts as a single-element window
and expands as it moves over the series. In this case, the statistic is calculated
for all values up to the current key. This means that the result is attached
to the key at the end of the window. The function names are prefixed
with `expanding`.
## Multi-level statistics
For a series with multi-level (hierarchical) index, the
functions prefixed with `level` provide a way to apply statistical operation on
a single level of the index. (For example you can sum values along the `'K1` keys
in a series `Series<'K1 * 'K2, float>` and get `Series<'K1, float>` as the result.)
## Remarks
The windowing functions in the `Stats` type support calculations over fixed-size
windows specified by the size of the window. If you need more complex windowing
behavior (such as windows based on the distance between keys), different handling
of boundaries, or chunking (calculation over adjacent chunks), you can use chunking and
windowing functions from the `Series` module such as `Series.windowSizeInto` or
`Series.chunkSizeInto`.
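The series, moving and expanding variants can be compared in a short sketch. It assumes an F# script with the Deedle NuGet package; the input series are hypothetical.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical series of four values.
let s = series [ 1 => 1.0; 2 => 2.0; 3 => 3.0; 4 => 4.0 ]

// Statistic over the whole series.
let overall = Stats.mean s

// Statistic over a moving window of two values, attached to the last key.
let moving = Stats.movingMean 2 s

// Statistic over all values up to the current key.
let expanding = Stats.expandingMean s

// Series statistics skip missing values (nan is treated as missing).
let withGap = Stats.mean (series [ 1 => 1.0; 2 => nan; 3 => 3.0 ])
```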
[category:Frame and series operations]
| |
TryValueT |
Represents a value or an exception. This type is used by functions such as
`Series.tryMap` and `Frame.tryMap` to capture the result of a lambda function,
which may be either a value or an exception. The type is a discriminated union,
so it can be processed using F# pattern matching, or using the `Value`, `HasValue`
and `Exception` properties.
[category:Primitive types and values]
| |
TryValueTError | ||
TryValueTSuccess | ||
TryValueTTags |
Structure | Description | |
---|---|---|
OptionalValueT |
Value type that represents a potentially missing value. This is similar to
`System.Nullable<T>`, but does not restrict the contained value to be a value
type, so it can be used for storing values of any types. When obtained from
`DataFrame<R, C>` or `Series<K, T>`, the `Value` will never be `Double.NaN` or `null`
(but this is not, in general, checked when constructing the value).
The type is only used in the C#-friendly API. F# operations generally expose the
standard F# `option<T>` type instead. However, the `OptionalValue` module
contains helper functions for using this type from F# as well as `Missing` and
`Present` active patterns.
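A minimal sketch of using the active patterns from F#, assuming an F# script with the Deedle NuGet package; the `describe` helper and the series `s` are hypothetical.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical helper that formats an optional value using the active patterns.
let describe (v: OptionalValue<float>) =
  match v with
  | OptionalValue.Present x -> sprintf "value %g" x
  | OptionalValue.Missing -> "missing"

// TryGet is part of the C#-friendly API and returns OptionalValue<float>.
let s = series [ 1 => 1.5 ]
let present = describe (s.TryGet 1)
let missing = describe (s.TryGet 2)
```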
[category:Primitive types and values]
|
Interface | Description | |
---|---|---|
AddressingIAddressingScheme |
An empty interface that is used as a marker for "addressing schemes". As discussed
above, Deedle can use different addressing schemes. We need to make sure that the index
and vector share the scheme - this is done by attaching `IAddressingScheme` to each
index or vector and checking that they match. Implementations must support equality!
| |
AddressingIAddressOperations |
Various implementations can use different schemes for working with addresses
(for example, an address can be just a global offset, or it can be a pair of `int32` values
that store the partition and the offset in a partition). This interface represents a specific
address range and abstracts operations that BigDeedle needs to perform on addresses
(within the specified range).
| |
ICustomLookupK |
Represents a special lookup. This can be used to support hierarchical or duplicate keys
in an index. A key type `K` can come with an associated `ICustomLookup<K>` to provide
customized pattern matching (equality testing).
[category:Parameters and results of various operations]
| |
IFrame |
An empty interface that is implemented by `Frame<'R, 'C>`. The purpose of the
interface is to allow writing code that works on arbitrary data frames (you
need to provide an implementation of the `IFrameOperation<'V>` which contains
a generic method `Invoke` that will be called with the typed data frame).
[category:Specialized frame and series types]
| |
IFrameOperationV |
Represents an operation that can be invoked on `Frame<'R, 'C>`. The operation
is generic in the type of row and column keys.
[category:Specialized frame and series types]
| |
IRangeRestrictionTAddress |
A sequence of indices together with the total number. Use `RangeRestriction.ofSeq` to
create one from a sequence. This can be implemented by concrete vector/index
builders to allow further optimizations (e.g. when the underlying source directly
supports range operations).
For example, if your source has an optimised way for getting every 10th address, you
can create your own `IRangeRestriction` and then check for it in `LookupRange` and
use optimised implementation rather than actually iterating over the sequence of indices.
| |
ISeriesK |
Represents an untyped series with keys of type `K` and values of some unknown type.
(This type should not generally be used directly, but it can be used when you need
to write code that works on a sequence of series of heterogeneous types.)
[category:Core frame and series types]
| |
IVector |
Represents an (untyped) vector that stores some values and provides access
to the values via a generic address. This type should be only used directly when
extending the DataFrame library and adding a new way of storing or loading data.
To allow invocation via Reflection, the vector exposes type of elements as `System.Type`.
[category:Vectors and indices]
| |
IVectorT |
A generic, typed vector. Represents mapping from addresses to values of type `T`.
The vector provides a minimal interface that is required by series and can be
implemented in a number of ways to provide vector backed by database or an
alternative representation of data.
[category:Vectors and indices]
| |
IVectorLocation |
Represents a location in a vector. In general, we always know the address, but
sometimes (BigDeedle) it is hard to get the offset (requires some data lookups),
so we use this interface to delay the calculation of the Offset (which is mainly
needed in one of the `series.Select` overloads)
[category:Vectors and indices]
| |
VectorCallSiteR |
Represents a generic function `\forall.'T.(IVector<'T> -> 'R)`. The function can be
generically invoked on an argument of type `IVector` using `IVector.Invoke`.
[category:Vectors and indices]
|
Enumeration | Description | |
---|---|---|
Boundary |
Represents boundary behaviour for operations such as floating window. The type
specifies whether incomplete windows (of smaller than required length) should be
produced at the beginning (`AtBeginning`) or at the end (`AtEnding`) or
skipped (`Skip`). For chunking, combinations are allowed too - to skip incomplete
chunk at the beginning, use `Boundary.Skip ||| Boundary.AtBeginning`.
[category:Parameters and results of various operations]
| |
ConversionKind |
Represents different kinds of type conversions that can be used by Deedle internally.
This is used, for example, when converting `ObjectSeries<'K>` to `Series<'K, 'T>` -
The conversion kind can be specified as an argument to allow certain conversions.
[category:Parameters and results of various operations]
| |
Direction |
Specifies the direction in which to look when performing operations such as
`Series.Pairwise`.
## Example
    let abc =
      [ 1 => "a"; 2 => "b"; 3 => "c" ]
      |> Series.ofObservations
    // Using 'Forward' the key of the first element is used
    abc.Pairwise(direction=Direction.Forward)
    [fsi:[ 1 => ("a", "b"); 2 => ("b", "c") ]]
    // Using 'Backward' the key of the second element is used
    abc.Pairwise(direction=Direction.Backward)
    [fsi:[ 2 => ("a", "b"); 3 => ("b", "c") ]]
[category:Parameters and results of various operations]
| |
JoinKind |
This enumeration specifies joining behavior for `Join` method provided
by `Series` and `Frame`. Outer join unions the keys (and may introduce
missing values), inner join takes the intersection of keys; left and
right joins take the keys of the first or the second series/frame.
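The effect of the different join kinds on row keys can be sketched as follows. The example assumes an F# script with the Deedle NuGet package; the two single-column frames are hypothetical.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical frames with one column each; row keys overlap only on 2.
let f1 = frame [ "A" => series [ 1 => 1.0; 2 => 2.0 ] ]
let f2 = frame [ "B" => series [ 2 => 20.0; 3 => 30.0 ] ]

let outer = f1.Join(f2, JoinKind.Outer)  // union of row keys: 1, 2, 3
let inner = f1.Join(f2, JoinKind.Inner)  // intersection of row keys: 2
let left = f1.Join(f2, JoinKind.Left)    // row keys of f1: 1, 2
```

In the outer and left joins, cells without a corresponding observation are missing values.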
[category:Parameters and results of various operations]
| |
Lookup |
Represents different behaviors of key lookup in series. For unordered series,
the only available option is `Lookup.Exact` which finds the exact key - methods
fail or return missing value if the key is not available in the index. For ordered
series `Lookup.Greater` finds the first greater key (e.g. later date) with
a value. `Lookup.Smaller` searches for the first smaller key. The options
`Lookup.ExactOrGreater` and `Lookup.ExactOrSmaller` find the exact key (if it is
present) and otherwise search for the nearest larger or smaller key, respectively.
[category:Parameters and results of various operations]
| |
UnionBehavior |
This enumeration specifies the behavior of `Union` operation on series when there are
overlapping keys in two series that are being unioned. The options include preferring values
from the left/right series or throwing an exception when both values are available.
[category:Parameters and results of various operations]
|