Data types

Generic

Package datamaestro.data

XPM Config datamaestro.data.Base(*, id)

Submit type: datamaestro.data.Base

Base object for all data types

id: str: The unique dataset ID

XPM Config datamaestro.data.Generic(*, id)

Submit type: datamaestro.data.Generic

Generic dataset

This type allows any value to be set, but it should only be used as a placeholder.

id: str: The unique dataset ID

XPM Config datamaestro.data.File(*, id, path)

Submit type: datamaestro.data.File

A data file

id: str: The unique dataset ID

path: Path: The path of the file

CSV data

Package datamaestro.data.csv

XPM Config datamaestro.data.csv.Generic(*, id, path, delimiter, ignore, names_row)

Submit type: datamaestro.data.csv.Generic

A generic CSV file

id: str: The unique dataset ID

path: Path: The path of the file

delimiter: str = ,

ignore: int = 0

names_row: int = -1

XPM Config datamaestro.data.csv.Matrix(*, id, path, delimiter, ignore, names_row, target, size_row)

Submit type: datamaestro.data.csv.Matrix

A numerical dataset

id: str: The unique dataset ID

path: Path: The path of the file

delimiter: str = ,

ignore: int = 0

names_row: int = -1

target: str

size_row: int = -1

Machine Learning

Package datamaestro.data.ml

XPM Config datamaestro.data.ml.Supervised(*, id, test, validation, train)

Submit type: datamaestro.data.ml.Supervised

id: str: The unique dataset ID

test: datamaestro.data.Base: The test dataset

validation: datamaestro.data.Base: The validation dataset

train: datamaestro.data.Base: The training dataset

XPM Config datamaestro.data.ml.FolderBased(*, id, classes, path)

Submit type: datamaestro.data.ml.FolderBased

Classification dataset in which each class is stored in its own folder

id: str: The unique dataset ID

classes: any

path: Path

Tensor

Package datamaestro.data.tensor

XPM Config datamaestro.data.tensor.IDX(*, id, path)

Submit type: datamaestro.data.tensor.IDX

IDX File format

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.

The basic format is:

magic number
size in dimension 0
size in dimension 1
size in dimension 2
...
size in dimension N
data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte encodes the type of the data:

0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)

The fourth byte encodes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices, and so on.
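
The magic number layout described above can be decoded with a few lines of standard-library Python. This is only an illustrative sketch: `IDX_TYPES` and `decode_magic` are hypothetical names and are not part of datamaestro.

```python
import struct

# Type codes carried by the third byte of the magic number,
# mapped to (description, size in bytes).
IDX_TYPES = {
    0x08: ("unsigned byte", 1),
    0x09: ("signed byte", 1),
    0x0B: ("short", 2),
    0x0C: ("int", 4),
    0x0D: ("float", 4),
    0x0E: ("double", 8),
}


def decode_magic(header: bytes):
    """Decode the 4-byte magic number into (type code, number of dimensions)."""
    # ">HBB": big-endian; the two zero bytes are read as one unsigned short,
    # followed by the type byte and the dimension-count byte.
    zeros, type_code, ndim = struct.unpack(">HBB", header[:4])
    if zeros != 0:
        raise ValueError("not an IDX header: the first two bytes must be 0")
    if type_code not in IDX_TYPES:
        raise ValueError(f"unknown IDX type code: {type_code:#04x}")
    return type_code, ndim


# Magic number 0x00000803: unsigned bytes, 3 dimensions
type_code, ndim = decode_magic(bytes([0x00, 0x00, 0x08, 0x03]))
print(IDX_TYPES[type_code][0], ndim)
```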

The sizes in each dimension are 4-byte integers (MSB first, i.e. big-endian, as on most non-Intel processors).

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
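
As a minimal sketch of the whole layout, the following reads an in-memory IDX buffer and looks up one element through its row-major offset. The names `read_idx`, `flat_offset` and `STRUCT_CODES` are hypothetical and not part of datamaestro. Two assumptions are made beyond the description above: the dimension sizes are treated as unsigned, and multi-byte data values are read as big-endian like the header.

```python
import struct

# struct format character for each IDX type code (third byte of the magic number)
STRUCT_CODES = {0x08: "B", 0x09: "b", 0x0B: "h", 0x0C: "i", 0x0D: "f", 0x0E: "d"}


def read_idx(buffer: bytes):
    """Parse an IDX byte buffer into (shape, flat list of values)."""
    zeros, type_code, ndim = struct.unpack_from(">HBB", buffer, 0)
    if zeros != 0:
        raise ValueError("not an IDX buffer: the first two bytes must be 0")
    # One 4-byte big-endian integer per dimension, right after the magic number
    # (assumption: sizes are unsigned).
    shape = struct.unpack_from(f">{ndim}I", buffer, 4)
    count = 1
    for size in shape:
        count *= size
    # The data follows the sizes (assumption: values are big-endian too).
    fmt = f">{count}{STRUCT_CODES[type_code]}"
    values = struct.unpack_from(fmt, buffer, 4 + 4 * ndim)
    return shape, list(values)


def flat_offset(index, shape):
    """Row-major (C order) offset: the last index changes the fastest."""
    offset = 0
    for i, size in zip(index, shape):
        offset = offset * size + i
    return offset


# Build a 2x3 matrix of unsigned bytes in memory and read it back
raw = bytes([0, 0, 0x08, 2]) + struct.pack(">2I", 2, 3) + bytes([1, 2, 3, 4, 5, 6])
shape, values = read_idx(raw)
print(shape)                               # (2, 3)
print(values[flat_offset((1, 0), shape)])  # 4
```

For real files, loading the data into an array library would usually be preferable to a Python list; the list is used here only to keep the example dependency-free.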

id: str: The unique dataset ID

path: Path: The path of the file