Data types
Generic
Package datamaestro.data
- XPM Config datamaestro.data.Base(*, id)
Submit type: datamaestro.data.Base
Base object for all data types
- id: str
The unique dataset ID
- XPM Config datamaestro.data.Generic(*, id)
Submit type: datamaestro.data.Generic
Generic dataset
This allows setting any value, but should only be used as a placeholder.
- id: str
The unique dataset ID
- XPM Config datamaestro.data.File(*, id, path)
Submit type: datamaestro.data.File
A data file
- id: str
The unique dataset ID
- path: Path
The path of the file
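The `(*, id)` and `(*, id, path)` signatures use Python's keyword-only notation: every parameter after the bare `*` must be passed by name. The plain-Python sketch below mimics that shape. The classes are illustrative stand-ins, not datamaestro's implementation (the real classes are XPM configurations), and the sketch assumes, as the repeated `id` entry suggests, that `File` extends `Base`. The dataset ID and path are hypothetical.

```python
from pathlib import Path


class Base:
    """Illustrative stand-in for datamaestro.data.Base, NOT the real class."""

    def __init__(self, *, id: str) -> None:
        # The bare `*` makes every following parameter keyword-only
        self.id = id  # the unique dataset ID


class File(Base):
    """Illustrative stand-in for datamaestro.data.File, NOT the real class."""

    def __init__(self, *, id: str, path: Path) -> None:
        super().__init__(id=id)  # `id` is handled by the shared base
        self.path = path  # the path of the file


# Hypothetical dataset ID and path, for illustration only
train = File(id="org.example.train", path=Path("train.csv"))

try:
    File("org.example.train", Path("train.csv"))  # positional call
    positional_rejected = False
except TypeError:
    positional_rejected = True  # keyword-only parameters refuse positional values
```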
CSV data
Package datamaestro.data.csv
- XPM Config datamaestro.data.csv.Generic(*, id, path, delimiter, ignore, names_row)
Submit type: datamaestro.data.csv.Generic
A generic CSV file
- id: str
The unique dataset ID
- path: Path
The path of the file
- delimiter: str = ","
- ignore: int = 0
- names_row: int = -1
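The documentation gives no descriptions for `delimiter`, `ignore` and `names_row`. The sketch below shows one plausible reading using the standard `csv` module. It is an assumption, not datamaestro's behaviour: `ignore` is taken as the number of leading lines to skip, and `names_row` as the 0-based index (counted after the skipped lines) of the row holding column names, with -1 meaning no header. `read_generic_csv` is a hypothetical helper.

```python
import csv
from typing import List, Optional, Tuple


def read_generic_csv(
    text: str, delimiter: str = ",", ignore: int = 0, names_row: int = -1
) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Hypothetical reader mirroring the documented parameters.

    ASSUMPTIONS (not confirmed by the documentation):
    - `ignore` is the number of leading lines to skip;
    - `names_row` is the 0-based index, counted after the skipped lines,
      of the header row; -1 means the file has no header.
    """
    lines = text.splitlines()[ignore:]
    rows = list(csv.reader(lines, delimiter=delimiter))
    names = None
    if names_row >= 0:
        names = rows[names_row]
        rows = rows[names_row + 1 :]  # data starts after the header row
    return names, rows


sample = "# exported data\nx;y;label\n1;2;a\n3;4;b\n"
names, rows = read_generic_csv(sample, delimiter=";", ignore=1, names_row=0)
print(names)  # ['x', 'y', 'label']
print(rows)   # [['1', '2', 'a'], ['3', '4', 'b']]
```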
- XPM Config datamaestro.data.csv.Matrix(*, id, path, delimiter, ignore, names_row, target, size_row)
Submit type: datamaestro.data.csv.Matrix
A numerical dataset
- id: str
The unique dataset ID
- path: Path
The path of the file
- delimiter: str = ","
- ignore: int = 0
- names_row: int = -1
- target: str
- size_row: int = -1
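`Matrix` adds a `target` parameter to the generic CSV settings. Its meaning is not documented here; the sketch below assumes the file has a header row and that `target` names the column to predict, which is split off from the numeric feature columns. `size_row` is not modelled because its meaning is not given. `split_target` is a hypothetical helper, not part of datamaestro.

```python
import csv
from typing import List, Tuple


def split_target(
    text: str, target: str, delimiter: str = ","
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Hypothetical helper: separate the `target` column from the features.

    ASSUMPTIONS: the first row holds column names and `target` is one of
    them; every other cell is numeric. `size_row` is ignored.
    """
    rows = list(csv.reader(text.splitlines(), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    t = header.index(target)  # raises ValueError if the column is missing
    features = [name for i, name in enumerate(header) if i != t]
    X = [[float(v) for i, v in enumerate(row) if i != t] for row in body]
    y = [float(row[t]) for row in body]
    return features, X, y


features, X, y = split_target("a,b,y\n1,2,0\n3,4,1\n", target="y")
print(features)  # ['a', 'b']
print(X)         # [[1.0, 2.0], [3.0, 4.0]]
print(y)         # [0.0, 1.0]
```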
Machine Learning
Package datamaestro.data.ml
- XPM Config datamaestro.data.ml.Supervised(*, id, train, validation, test)
Submit type: datamaestro.data.ml.Supervised
- id: str
The unique dataset ID
- train: datamaestro.data.Base
The training dataset
- validation: datamaestro.data.Base
The validation dataset (optional)
- test: datamaestro.data.Base
The test dataset (optional)
- XPM Config datamaestro.data.ml.FolderBased(*, id, classes, path)
Submit type: datamaestro.data.ml.FolderBased
Classification dataset where the folder structure defines the classes
- id: str
The unique dataset ID
- classes: any
- path: Path
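A folder-based classification dataset is usually laid out with one subdirectory per class. The sketch below scans such a layout with `pathlib`; it assumes each immediate subdirectory of `path` is a class named after the directory and that every regular file inside it is one example. `scan_folder_dataset` is a hypothetical helper, not the datamaestro API.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def scan_folder_dataset(root: Path) -> Dict[str, List[Path]]:
    """Hypothetical helper. ASSUMPTION: each immediate subdirectory of
    `root` is one class, and every regular file inside it is an example."""
    classes: Dict[str, List[Path]] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        classes[class_dir.name] = sorted(
            f for f in class_dir.iterdir() if f.is_file()
        )
    return classes


# Build a tiny throwaway layout: root/cat/{a,b}.png and root/dog/c.png
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cls, name in [("cat", "a.png"), ("cat", "b.png"), ("dog", "c.png")]:
        (root / cls).mkdir(exist_ok=True)
        (root / cls / name).touch()
    dataset = scan_folder_dataset(root)
    counts = {cls: len(files) for cls, files in dataset.items()}
    print(counts)  # {'cat': 2, 'dog': 1}
```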
Tensor
Package datamaestro.data.tensor
- XPM Config datamaestro.data.tensor.IDX(*, id, path)
Submit type: datamaestro.data.tensor.IDX
IDX File format
The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.
The basic format is:
  - magic number
  - size in dimension 0
  - size in dimension 1
  - size in dimension 2
  - ...
  - size in dimension N
  - data
The magic number is an integer (MSB first). The first 2 bytes are always 0.
The third byte codes the type of the data:
  - 0x08: unsigned byte
  - 0x09: signed byte
  - 0x0B: short (2 bytes)
  - 0x0C: int (4 bytes)
  - 0x0D: float (4 bytes)
  - 0x0E: double (8 bytes)
The fourth byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices, and so on.
The sizes in each dimension are 4-byte integers (MSB first, i.e. big-endian, like in most non-Intel processors).
The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
- id: str
The unique dataset ID
- path: Path
The path of the file
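The IDX layout described above is simple enough to decode with the standard `struct` module. The sketch below reads the magic number, the dimension sizes and the flat data, and includes a matching writer so the round trip can be checked. It assumes multi-byte data values are stored MSB first like the header, which the description does not state explicitly. `parse_idx` and `write_idx` are hypothetical helpers, not datamaestro functions.

```python
import struct
from typing import Dict, List, Sequence, Tuple, Union

Number = Union[int, float]

# IDX type byte -> struct format character (standard sizes with ">" prefix)
IDX_TYPES: Dict[int, str] = {
    0x08: "B",  # unsigned byte
    0x09: "b",  # signed byte
    0x0B: "h",  # short (2 bytes)
    0x0C: "i",  # int (4 bytes)
    0x0D: "f",  # float (4 bytes)
    0x0E: "d",  # double (8 bytes)
}


def parse_idx(data: bytes) -> Tuple[List[int], List[Number]]:
    """Decode raw IDX bytes into (shape, values in C order)."""
    if data[0] != 0 or data[1] != 0:
        raise ValueError("the first two bytes of an IDX file must be 0")
    fmt = IDX_TYPES[data[2]]  # third byte: data type
    ndim = data[3]            # fourth byte: number of dimensions
    # Dimension sizes: 4-byte unsigned integers, MSB first
    shape = list(struct.unpack(f">{ndim}I", data[4 : 4 + 4 * ndim]))
    count = 1
    for size in shape:
        count *= size
    offset = 4 + 4 * ndim
    end = offset + count * struct.calcsize(f">{fmt}")
    # ASSUMPTION: multi-byte values are also stored MSB first
    values = list(struct.unpack(f">{count}{fmt}", data[offset:end]))
    return shape, values


def write_idx(
    shape: Sequence[int], values: Sequence[Number], type_code: int = 0x08
) -> bytes:
    """Encode values (C order) as IDX bytes; the inverse of parse_idx."""
    fmt = IDX_TYPES[type_code]
    header = bytes([0, 0, type_code, len(shape)])
    sizes = struct.pack(f">{len(shape)}I", *shape)
    return header + sizes + struct.pack(f">{len(values)}{fmt}", *values)


# A 2x3 unsigned-byte matrix round trip
shape, values = parse_idx(write_idx([2, 3], [1, 2, 3, 4, 5, 6]))
# C order: element [i][j] sits at flat index i * 3 + j
element_1_2 = values[1 * shape[1] + 2]
print(shape, values, element_1_2)  # [2, 3] [1, 2, 3, 4, 5, 6] 6
```

Keeping the values flat and exposing the shape separately mirrors the file layout directly; a library such as NumPy could reshape the flat list afterwards.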