Docs

Train from registry snapshots and publish to Hugging Face.

The workflow mirrors the repo structure: import or add tools under `registry/` with a valid `parent_id`, build the generated snapshot, train from those artifacts, sync local release metadata, and then publish downloadable checkpoints to the OpenToolEmbeddings Hugging Face organization.

1. Import

Materialize the OSS baseline

Convert the current OSS tool JSON and synthetic examples into `registry/tools/*` manifests that the site and builder can read.

python3 scripts/import_oss_registry.py

2. Build

Compile the registry

Validate manifests and write `tools.json`, `models.json`, `hierarchy.json`, and the flat JSONL dataset used for training and the site.

python3 scripts/build_registry.py
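
As an illustration of the kind of check that validating a `parent_id` implies, the sketch below flags manifests whose parent does not exist. The manifest shape (dicts with `id` and `parent_id`, where `None` marks a root) and the function name `find_invalid_parents` are assumptions made for this example, not the actual schema or logic of `scripts/build_registry.py`.

```python
from typing import Optional


def find_invalid_parents(manifests: list[dict]) -> list[str]:
    """Return ids of manifests whose parent_id matches no known id.

    Assumed shape (hypothetical, not the real schema): each manifest is a
    dict with "id" and an optional "parent_id"; None marks a root tool.
    """
    known_ids = {m["id"] for m in manifests}
    invalid = []
    for m in manifests:
        parent: Optional[str] = m.get("parent_id")
        if parent is not None and parent not in known_ids:
            invalid.append(m["id"])
    return invalid


# Made-up sample data: "orphan" points at a parent that is not in the registry.
manifests = [
    {"id": "search", "parent_id": None},
    {"id": "web_search", "parent_id": "search"},
    {"id": "orphan", "parent_id": "missing"},
]
print(find_invalid_parents(manifests))
```

A builder could refuse to write the snapshot when this list is non-empty, so that training never sees a broken hierarchy.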

3. Easy Path

Train every variant

The wrapper script reads the generated registry snapshot and trains both the normal and hierarchical variants for every supported loss.

bash scripts/train_registry_embedding_spaces.sh
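
To make the size of that training grid concrete, here is a minimal sketch that enumerates one run per variant and loss. The loss names are placeholders (the supported losses are not listed on this page), and `training_runs` is a hypothetical helper, not part of the wrapper script.

```python
from itertools import product

VARIANTS = ["normal", "hierarchical"]  # the two variants named in the docs
LOSSES = ["loss_a", "loss_b"]  # placeholders; real loss names are not given here


def training_runs(variants: list[str], losses: list[str]) -> list[str]:
    """Build one run label per (variant, loss) combination."""
    return [f"{variant}-{loss}" for variant, loss in product(variants, losses)]


runs = training_runs(VARIANTS, LOSSES)
print(len(runs), "runs:", runs)
```

With two variants, the number of checkpoints produced grows as twice the number of supported losses.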

4. Release

Sync local release metadata

Compute checksums, summarize metrics, and generate model cards plus a publish manifest for the current checkpoint set.

python3 scripts/sync_model_releases.py
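
The checksum part of that step can be sketched with the standard library alone. This is an assumption-laden example rather than the code in `scripts/sync_model_releases.py`: SHA-256 as the hash, the `*.pt` file pattern, and the helper names are all chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_directory(root: Path, pattern: str = "*.pt") -> dict[str, str]:
    """Map each matching file name directly under root to its SHA-256 hex digest."""
    return {p.name: sha256_of_file(p) for p in sorted(root.glob(pattern))}


# Demo on a throwaway directory; "demo.pt" stands in for a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo.pt").write_bytes(b"abc")
    checksums = checksum_directory(Path(tmp))
print(checksums)
```

Recording the digest next to each download lets users confirm that the file they fetched matches the checkpoint that was released.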

Release Process

Publishing a downloadable model

python3 scripts/publish_huggingface_models.py

Step 01

Run the release sync (`python3 scripts/sync_model_releases.py`) to compute checksums, summarize metrics, and generate model cards from local checkpoints.

Step 02

Publish the prepared bundle to the OpenToolEmbeddings Hugging Face organization once the `HF_TOKEN` environment variable is set and the `huggingface_hub` package is installed.

Step 03

Re-run the registry builder (`python3 scripts/build_registry.py`) so the downloads page exposes the published URL and status.
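
Because Step 02 depends on `HF_TOKEN` and `huggingface_hub`, a small preflight check before publishing can save a failed run. The sketch below uses only the standard library; the function name `publish_prerequisites_missing` is hypothetical, and treating `HF_TOKEN` as an environment variable is an assumption about how the publish script reads it.

```python
import importlib.util
import os
from typing import Mapping, Optional


def publish_prerequisites_missing(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List the publish prerequisites that are absent in this environment."""
    env = os.environ if env is None else env
    missing = []
    if not env.get("HF_TOKEN"):  # unset or empty both count as missing
        missing.append("HF_TOKEN environment variable")
    if importlib.util.find_spec("huggingface_hub") is None:
        missing.append("huggingface_hub package")
    return missing


problems = publish_prerequisites_missing()
if problems:
    print("Not ready to publish; missing: " + ", ".join(problems))
else:
    print("Ready to run scripts/publish_huggingface_models.py")
```

The check only confirms that the token is present, not that it has write access to the organization.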