Records

Records are a flexible way to compose information coming from various sources. For instance, your processing chain can first produce records that contain only an ID. Later, you can retrieve the item content and add it to the record. Further along in the processing, you may want to add some transformation of the item content.

Records support this kind of incremental enrichment by holding a set of items. Record types form a lattice of types, so checking that some item types are present in a record is easy.
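The enrichment scenario described above can be sketched without the library. This is only an illustration under stated assumptions: a record is modelled as a plain dict keyed by item type, and `IDItem`, `TextItem`, `TokensItem` and `add_items` are hypothetical names that are not part of datamaestro.

```python
from dataclasses import dataclass


# Hypothetical item classes, for illustration only
@dataclass(frozen=True)
class IDItem:
    id: str


@dataclass(frozen=True)
class TextItem:
    text: str


@dataclass(frozen=True)
class TokensItem:
    tokens: tuple


def add_items(record: dict, *items) -> dict:
    """Return a new record (a dict keyed by item type) with the extra items."""
    return {**record, **{type(item): item for item in items}}


# Step 1: the chain starts with records that only contain an ID
record = add_items({}, IDItem("doc-1"))

# Step 2: the content is retrieved later and added to the record
record = add_items(record, TextItem("hello world"))

# Step 3: a transformation of the content is added further down the chain
record = add_items(record, TokensItem(tuple(record[TextItem].text.split())))

print([item_type.__name__ for item_type in record])
```

Each step returns a new dict rather than mutating the previous one, so earlier stages of the chain keep seeing the record they produced.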

Working with record types

Record types form a lattice of types that can be used to check record properties beforehand:

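from datamaestro.record import record_type

# AItem, BItem and B1Item are Item subclasses defined elsewhere
ABRecord = record_type(AItem, BItem)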
AB1Record = record_type(AItem, B1Item)

# Hierarchy-based check
assert ABRecord.contains(AB1Record)

# Checks for specific types
assert ABRecord.has(AItem, BItem)
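One way to picture these two checks is as subclass tests over tuples of item types. The sketch below is a self-contained illustration and not the library's implementation: the stand-in classes, the `has` and `contains` helpers, and the assumption that `B1Item` derives from `BItem` are all hypothetical.

```python
class Item:
    """Stand-in for an item base class (hypothetical, for this sketch only)."""


class AItem(Item):
    pass


class BItem(Item):
    pass


class B1Item(BItem):  # assumption: B1Item refines BItem
    pass


def has(item_types, *required):
    """True if every required type is matched by one of item_types or a subclass."""
    return all(any(issubclass(t, r) for t in item_types) for r in required)


def contains(general, specific):
    """True if `specific` provides everything that `general` asks for."""
    return has(specific, *general)


ab = (AItem, BItem)
ab1 = (AItem, B1Item)

print(contains(ab, ab1))      # True: B1Item can stand in for BItem
print(contains(ab1, ab))      # False: BItem is not a B1Item
print(has(ab, AItem, BItem))  # True
```

Under this reading the more general record type sits above the more specific one, which is what makes the check usable before any record exists.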

Validating

To ensure that a record satisfies the required properties, one can use record types:

ABRecord = record_type(AItem, BItem)

# OK
ABRecord(AItem(1), BItem(2))

# Fails: A1Item is not AItem
ABRecord(A1Item(1), BItem(2))

# Fails: AItem is not present
ABRecord(BItem(2))
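The three cases above can be mimicked with a small stand-alone check. This sketch follows the comments in the example literally, so an item only matches when its class is exactly the declared one. The real library's matching rule may differ, and `validate` and the item classes here are hypothetical stand-ins.

```python
class AItem:
    def __init__(self, value):
        self.value = value


class A1Item(AItem):
    pass


class BItem:
    def __init__(self, value):
        self.value = value


def validate(required_types, *items):
    """Return the items keyed by type, or raise if a required type is missing."""
    by_type = {type(item): item for item in items}
    for required in required_types:
        # Exact-type lookup: an A1Item does not satisfy a request for AItem here
        if required not in by_type:
            raise ValueError(f"{required.__name__} is not present")
    return by_type


ab = (AItem, BItem)

# OK: both declared item types are present
validate(ab, AItem(1), BItem(2))

# The two failing cases are collected instead of aborting the script
rejected = []
for items in [(A1Item(1), BItem(2)), (BItem(2),)]:
    try:
        validate(ab, *items)
    except ValueError as error:
        rejected.append(str(error))

print(rejected)
```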

When updating a record, it is also possible to validate the result against a target record type:

from datamaestro.record import Record, record_type

A1BRecord = record_type(A1Item, BItem)
record = Record(AItem(1), BItem(2))

# Update the A/B record into an A1/B one
record.update(A1Item(1, 2), target=A1BRecord)

API

class datamaestro.record.Item

Base class for all item types

class datamaestro.record.RecordType(*item_types: Type[T])
__call__(*items: T)

Call self as a function.

sub(*item_types: Type[T])

Returns a new record type based on self and new item types

validate(record: Record)

Creates and validates a new record of this type

class datamaestro.record.Record(*items: Dict[Type[T], T] | T, override=False)

Associate types with entries

A record is a composition of items; each item base class is unique.

get(key: Type[T]) -> T | None

Get a given item or None if it does not exist

has(key: Type[T]) -> bool

Returns True if the record has the given item type

update(*items: T, target: RecordType | None = None) -> Record

Update some items
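To make the statement "each item base class is unique" concrete, here is one plausible reading of it as a self-contained sketch. It is not the library's code: `ToyRecord`, `base_of` and the stand-in item classes are hypothetical, and the rule used (items are stored under their most generic ancestor below `Item`) is an assumption.

```python
class Item:
    """Stand-in base class for this sketch."""


class AItem(Item):
    pass


class A1Item(AItem):
    pass


class BItem(Item):
    pass


def base_of(item_type):
    """Most generic ancestor of item_type that is a strict subclass of Item."""
    base = item_type
    for cls in item_type.__mro__:
        if issubclass(cls, Item) and cls is not Item:
            base = cls
    return base


class ToyRecord:
    def __init__(self, *items):
        # One slot per base class: an A1Item and an AItem share the AItem slot
        self.items = {base_of(type(item)): item for item in items}

    def get(self, key):
        item = self.items.get(base_of(key))
        return item if isinstance(item, key) else None

    def has(self, key):
        return self.get(key) is not None

    def update(self, *items):
        # Returns a new record; items with the same base class are replaced
        new = ToyRecord()
        new.items = {**self.items, **{base_of(type(i)): i for i in items}}
        return new


record = ToyRecord(AItem(), BItem())
updated = record.update(A1Item())

print(len(updated.items))    # 2: the A1Item replaced the AItem
print(updated.has(A1Item))   # True
print(record.has(A1Item))    # False: the original record is unchanged
```

Keying on the base class is what lets an update swap an item for a more specific variant without ending up with two competing items of the same family.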

datamaestro.record.record_type(*item_types: Type[T])

Returns a new record type