| title | OSMO Workflows |
|---|---|
| description | NVIDIA OSMO workflow templates for distributed robotics training |
| author | Edge AI Team |
| ms.date | 2025-12-04 |
| ms.topic | reference |
NVIDIA OSMO workflow templates for distributed Isaac Lab training on Azure Kubernetes Service.
| Template | Purpose | Submission Script |
|---|---|---|
| train.yaml | Distributed training (base64 inline) | scripts/submit-osmo-training.sh |
| train-dataset.yaml | Distributed training (dataset upload) | scripts/submit-osmo-dataset-training.sh |
| Aspect | train.yaml | train-dataset.yaml |
|---|---|---|
| Payload | Base64-encoded archive | Dataset folder upload |
| Size limit | ~1 MB | Unlimited |
| Versioning | None | Automatic |
| Reusability | Per-run | Across runs |
| Setup | None | Bucket configured |
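The comparison above suggests a simple rule: small payloads fit the inline template, larger ones need the dataset template. A minimal sketch of that rule follows. `choose_template` is a hypothetical helper (not part of the repository), and the 1 MiB threshold is an assumption standing in for the approximate limit in the table.

```shell
# Hypothetical helper: pick a workflow template from the payload size in bytes.
# The 1 MiB threshold is an assumption based on the approximate limit above.
choose_template() {
  size_bytes=$1
  limit_bytes=$((1024 * 1024))
  if [ "$size_bytes" -le "$limit_bytes" ]; then
    echo "train.yaml"
  else
    echo "train-dataset.yaml"
  fi
}

# Example with an illustrative path: measure a compressed archive, then choose.
# size=$(tar -czf - src/training | wc -c)
# choose_template "$size"
```

Base64 encoding inflates an archive by roughly one third, so a payload close to the threshold may be safer with train-dataset.yaml.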
Submits Isaac Lab distributed training through OSMO's workflow orchestration engine.
- Multi-GPU distributed training coordination
- KAI Scheduler / Volcano integration
- Automatic checkpointing and recovery
- OSMO UI monitoring dashboard
Parameters are passed as key=value pairs through the submission script:
| Parameter | Description |
|---|---|
| azure_subscription_id | Azure subscription ID |
| azure_resource_group | Resource group name |
| azure_workspace_name | ML workspace name |
| task | Isaac Lab task name |
| num_envs | Parallel environments |
| max_iterations | Training iterations |
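Because these parameters travel as key=value pairs, it can help to check them before submission. The sketch below is one way to do that. `validate_param` is a hypothetical helper, and the rules it applies (non-empty strings, non-negative integers for num_envs and max_iterations) are assumptions rather than behavior confirmed by the submission script.

```shell
# Hypothetical helper: check one key=value pair against the documented parameters.
validate_param() {
  # Reject input that has no '=' separator.
  case "$1" in
    *=*) ;;
    *) echo "not a key=value pair: $1" >&2; return 1 ;;
  esac
  key=${1%%=*}
  value=${1#*=}
  case "$key" in
    azure_subscription_id|azure_resource_group|azure_workspace_name|task)
      # String parameters: assumed to require a non-empty value.
      [ -n "$value" ] ;;
    num_envs|max_iterations)
      # Numeric parameters: assumed to be non-negative integers.
      case "$value" in
        ''|*[!0-9]*) echo "not an integer: $1" >&2; return 1 ;;
      esac ;;
    *)
      echo "unknown parameter: $key" >&2; return 1 ;;
  esac
}

# Illustrative values only; the task name is an example, not a repository default.
validate_param "task=Isaac-Velocity-Rough-Anymal-C-v0" && echo "task ok"
validate_param "num_envs=2048" && echo "num_envs ok"
```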
# Default configuration from Terraform outputs
./scripts/submit-osmo-training.sh
# Override parameters
./scripts/submit-osmo-training.sh \
--azure-subscription-id "your-subscription-id" \
--azure-resource-group "rg-custom"
Submits Isaac Lab training using OSMO dataset folder injection instead of base64-encoded archives.
- Dataset versioning and reusability
- No payload size limits
- Training folder mounted at /data/<dataset_name>/training
- All features from train.yaml
| Parameter | Default | Description |
|---|---|---|
| dataset_bucket | training | OSMO bucket for training code |
| dataset_name | training-code | Dataset name in bucket |
| training_localpath | (required) | Local path to src/training relative to workflow |
# Default configuration
./scripts/submit-osmo-dataset-training.sh
# Custom dataset bucket
./scripts/submit-osmo-dataset-training.sh \
--dataset-bucket custom-bucket \
--dataset-name my-training-code
| Variable | Description |
|---|---|
| AZURE_SUBSCRIPTION_ID | Azure subscription ID |
| AZURE_RESOURCE_GROUP | Resource group name |
| WORKFLOW_TEMPLATE | Path to workflow template |
| OSMO_CONFIG_DIR | OSMO configuration directory |
| OSMO_DATASET_BUCKET | Dataset bucket name (default: training) |
| OSMO_DATASET_NAME | Dataset name (default: training-code) |
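The documented defaults for the dataset variables can be expressed with standard shell parameter expansion. The sketch below shows the pattern. `resolve_dataset_settings` is a hypothetical helper, and whether the submission scripts resolve the variables this way is an assumption.

```shell
# Hypothetical helper: resolve dataset settings, falling back to the documented defaults.
resolve_dataset_settings() {
  bucket=${OSMO_DATASET_BUCKET:-training}
  name=${OSMO_DATASET_NAME:-training-code}
  printf 'bucket=%s name=%s\n' "$bucket" "$name"
}

# Prints the effective settings for the current environment.
resolve_dataset_settings
```

Setting OSMO_DATASET_BUCKET in the environment before calling the helper overrides the bucket while leaving the dataset name at its default.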
- OSMO control plane deployed (03-deploy-osmo-control-plane.sh)
- OSMO backend operator installed (04-deploy-osmo-backend.sh)
- Storage configured for checkpoints
- OSMO CLI installed and authenticated (see Accessing OSMO)
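Part of this checklist can be verified locally before submitting a workflow. The sketch below only checks that required commands are on PATH; it does not confirm that the control plane or backend operator is deployed, or that the CLI is authenticated. `missing_tools` is a hypothetical helper.

```shell
# Hypothetical helper: print the names of any given commands not found on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space before printing; prints an empty line if nothing is missing.
  printf '%s\n' "${missing# }"
}

# Example: check the command-line tools this README relies on.
# missing_tools osmo kubectl
```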
OSMO services are deployed to the osmo-control-plane namespace. Access method depends on your network configuration.
When connected to VPN, OSMO is accessible via the internal load balancer:
| Service | URL |
|---|---|
| UI Dashboard | http://10.0.5.7 |
| API Service | http://10.0.5.7/api |
osmo login http://10.0.5.7 --method=dev --username=testuser
osmo info
Note
Verify the internal load balancer IP with: kubectl get svc -n azureml azureml-nginx-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
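Since the load balancer IP can differ between deployments, the two URLs in the table above can be derived from the IP instead of being hard-coded. A minimal sketch follows. `osmo_urls` is a hypothetical helper, and its input check is deliberately loose: it only rejects characters that cannot appear in a dotted IPv4 address.

```shell
# Hypothetical helper: derive the OSMO UI and API URLs from a load balancer IP.
osmo_urls() {
  ip=$1
  # Loose check only: reject empty input or characters outside digits and dots.
  case "$ip" in
    ''|*[!0-9.]*) echo "invalid IPv4 address: $ip" >&2; return 1 ;;
  esac
  printf 'ui=http://%s\n' "$ip"
  printf 'api=http://%s/api\n' "$ip"
}

# With cluster access, the IP can come from the kubectl command in the note above:
# ip=$(kubectl get svc -n azureml azureml-nginx-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
osmo_urls "10.0.5.7"
```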
If should_enable_private_aks_cluster = false and you are not using VPN:
| Service | Port-Forward Command | Local URL |
|---|---|---|
| UI Dashboard | kubectl port-forward svc/osmo-ui 3000:80 -n osmo-control-plane | http://localhost:3000 |
| API Service | kubectl port-forward svc/osmo-service 9000:80 -n osmo-control-plane | http://localhost:9000 |
| Router | kubectl port-forward svc/osmo-router 8080:80 -n osmo-control-plane | http://localhost:8080 |
# Start port-forward in background (or separate terminal)
kubectl port-forward svc/osmo-service 9000:80 -n osmo-control-plane &
# Login to OSMO (dev mode for local access)
osmo login http://localhost:9000 --method=dev --username=testuser
# Verify connection
osmo info
osmo backend list
Note
When accessing OSMO through port-forwarding, osmo workflow exec and osmo workflow port-forward commands are not supported. These require the router service to be accessible via ingress.
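The service-to-port mapping in the port-forward table can be captured in a small lookup so the commands are not retyped by hand. The sketch below only prints the command and does not run kubectl. `port_forward_cmd` and its short service aliases (ui, api, router) are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: print the port-forward command for a service alias.
# Service names, ports, and namespace are taken from the table above.
port_forward_cmd() {
  case "$1" in
    ui)     svc=osmo-ui;      port=3000 ;;
    api)    svc=osmo-service; port=9000 ;;
    router) svc=osmo-router;  port=8080 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
  printf 'kubectl port-forward svc/%s %s:80 -n osmo-control-plane\n' "$svc" "$port"
}

# Print the command for the API service.
port_forward_cmd api
```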
Access the OSMO UI dashboard:
- VPN: Open http://10.0.5.7 in your browser
- Port-forward: Run kubectl port-forward svc/osmo-ui 3000:80 -n osmo-control-plane then open http://localhost:3000