Command Line Interface
Datamaestro provides a command line interface for searching, downloading, and managing datasets.
Global Options
datamaestro [OPTIONS] COMMAND [ARGS]...
| Option | Description |
|---|---|
| | Suppress informational messages |
| | Enable debug logging |
| | Show full traceback on errors |
| | Set the data directory (default: ) |
| | Keep downloaded archive files after extraction |
| | Remote hostname for distributed operations |
| | Python path on remote host (default: ) |
Commands
search
Search for datasets matching given criteria.
datamaestro search [SEARCHTERMS]...
Search Syntax:
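| Prefix | Description | Example |
|---|---|---|
| (none) | Match dataset ID (regex) | mnist |
| tag: | Match tags (regex) | tag:classification |
| task: | Match tasks (regex) | task:classification |
| repo: | Match repository ID (regex) | repo:image |
| type: | Match data type identifier | type:datamaestro.data.ml.Supervised |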
Multiple terms are combined with AND logic.
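To make the AND semantics concrete, here is a minimal Python sketch of how prefixed terms could be evaluated against a dataset. It is an illustration only, not datamaestro's implementation: the `Dataset` record, `term_matches`, and `search` are hypothetical names, regexes are assumed to match anywhere in the value (`re.search`), and the `type:` prefix is left out.

```python
import re
from dataclasses import dataclass, field


# Hypothetical stand-in for a dataset record; not datamaestro's own class.
@dataclass
class Dataset:
    id: str
    repository: str
    tags: set = field(default_factory=set)
    tasks: set = field(default_factory=set)


def term_matches(term: str, ds: Dataset) -> bool:
    """Check a single search term against one dataset."""
    prefix, sep, pattern = term.partition(":")
    if not sep:
        # No prefix: the whole term is a regex on the dataset ID.
        return re.search(term, ds.id) is not None
    if prefix == "tag":
        return any(re.search(pattern, t) for t in ds.tags)
    if prefix == "task":
        return any(re.search(pattern, t) for t in ds.tasks)
    if prefix == "repo":
        return re.search(pattern, ds.repository) is not None
    raise ValueError(f"prefix not handled in this sketch: {prefix!r}")


def search(terms, datasets):
    """Keep only the datasets that satisfy every term (AND logic)."""
    return [ds for ds in datasets if all(term_matches(t, ds) for t in terms)]


# Made-up catalogue entries used only to exercise the sketch.
datasets = [
    Dataset("com.lecun.mnist", "image",
            {"benchmark", "classification"}, {"image-classification"}),
    Dataset("org.example.reviews", "text",
            {"classification"}, {"sentiment-analysis"}),
]

print([ds.id for ds in search(["repo:image", "task:classification"], datasets)])
# prints ['com.lecun.mnist']
```

Each extra term can only narrow the result set, which is why combining a broad term such as `tag:classification` with `repo:image` is a quick way to filter a long listing.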
Examples:
# Find all MNIST-related datasets
datamaestro search mnist
# Find classification datasets
datamaestro search tag:classification
# Find image classification datasets in the image repository
datamaestro search repo:image task:classification
# Find datasets with specific type
datamaestro search type:datamaestro.data.ml.Supervised
info
Display detailed information about a dataset.
datamaestro info DATASET
Example:
$ datamaestro info com.lecun.mnist
com.lecun.mnist
http://yann.lecun.com/exdb/mnist/
Types (ids): datamaestro_image.data.ImageClassification
Types (class): datamaestro_image.data.ImageClassification
Tags: benchmark, classification
Tasks: image-classification
The MNIST database of handwritten digits...
download
Download dataset resources without preparing the data structure.
datamaestro download DATASET
Example:
datamaestro download com.lecun.mnist
prepare
Download and prepare a dataset, returning JSON with paths and metadata.
datamaestro prepare [OPTIONS] DATASETID
| Option | Description |
|---|---|
| | Output format (default: ) |
| | Skip downloading, use existing files |
Example:
$ datamaestro prepare com.lecun.mnist
{
"train": {
"images": {"path": "/home/user/datamaestro/data/..."},
"labels": {"path": "/home/user/datamaestro/data/..."}
},
...
}
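Because the output is JSON, a script can consume it directly. The sketch below walks a nested structure shaped like the example above and collects every `path` entry. It assumes the command prints only JSON to standard output. The helper name `collect_paths` and the sample paths are hypothetical, and in real use the text would come from capturing the command's output (for instance with `subprocess.run(..., capture_output=True, text=True)`).

```python
import json


def collect_paths(node, prefix=""):
    """Recursively gather every "path" value, keyed by its dotted location."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "path" and isinstance(value, str):
                found[prefix] = value
            else:
                child = f"{prefix}.{key}" if prefix else key
                found.update(collect_paths(value, child))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            found.update(collect_paths(value, f"{prefix}[{i}]"))
    return found


# Hypothetical sample shaped like the output above; the paths are placeholders.
sample = """
{"train": {"images": {"path": "/data/example/train-images"},
           "labels": {"path": "/data/example/train-labels"}}}
"""

paths = collect_paths(json.loads(sample))
for name, path in paths.items():
    print(name, "->", path)
```

Walking the structure generically avoids hard-coding keys such as `train` or `images`, which differ from one dataset to another.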
repositories
List all available dataset repositories.
datamaestro repositories
Example:
$ datamaestro repositories
image: Image datasets
text: NLP and information retrieval datasets
ml: Machine learning datasets
version
Display the datamaestro version.
datamaestro version
orphans
List orphan directories (downloaded data not associated with any dataset).
datamaestro orphans [OPTIONS]
| Option | Description |
|---|---|
| | Show disk usage for each orphan |
Useful for cleaning up disk space after dataset definitions change.
create-dataset
Create a new dataset definition file from a template.
datamaestro create-dataset REPOSITORY_ID DATASET_ID
Arguments:
REPOSITORY_ID: Target repository (e.g., image, text)
DATASET_ID: Dataset identifier (e.g., com.example.mydataset) or URL
Example:
# Create from qualified ID
datamaestro create-dataset image com.example.mydataset
# Create from URL (ID is derived automatically)
datamaestro create-dataset text http://example.com/datasets/mydata
datafolders
Manage external data folders for datasets that reference pre-existing data.
datafolders list
List configured data folders.
datamaestro datafolders list
datafolders set
Set an external data folder path.
datamaestro datafolders set KEY PATH
Example:
# Configure a folder for large datasets
datamaestro datafolders set large_data /mnt/storage/datasets
# List configured folders
datamaestro datafolders list
large_data /mnt/storage/datasets
Environment Variables
| Variable | Description | Default |
|---|---|---|
| | Base directory for data storage | |
Exit Codes
| Code | Description |
|---|---|
| 0 | Success |
| 1 | Error (download failed, dataset not found, etc.) |
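When the CLI is called from a script, the exit code is the simplest signal to branch on. The following Python sketch wraps a command and turns its exit code into a success flag. The helper `run_and_check` is a hypothetical name, and the demonstration uses stand-in Python one-liners rather than datamaestro itself, so the block runs without the tool installed.

```python
import subprocess
import sys


def run_and_check(command):
    """Run a command and return (success, output_or_error_text)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return True, result.stdout
    # Any non-zero code is treated as a failure; the table above only lists 1.
    return False, result.stderr or f"exit code {result.returncode}"


# Stand-in commands that exit with 0 and 1 respectively.
ok, _ = run_and_check([sys.executable, "-c", "raise SystemExit(0)"])
failed, _ = run_and_check([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, failed)
# prints True False
```

With datamaestro on the `PATH`, the same helper could be called as `run_and_check(["datamaestro", "download", "com.lecun.mnist"])`, using the command form shown in the download section.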