
feat: extend datasets commands visibility, labels, filter support #632

Open
georgi-seqera wants to merge 6 commits into master from feat/extend-datasets-commands

georgi-seqera commented May 22, 2026

Summary

Extends the datasets command group with three new subcommands (hide, show, and labels) and improves datasets list with server-side filtering, pagination, a labels column, and control over whether hidden datasets are listed.


datasets hide / datasets show

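Hide one or more datasets, or make hidden datasets visible again (show), in a single command. Datasets can be referenced by ID (-i), by name (-n), or a mix of both, and both flags can be repeated.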

# Hide by ID (multiple at once)
tw datasets hide -w <workspace> -i <id1> -i <id2>

# Hide by name
tw datasets hide -w <workspace> -n my-dataset

# Un-hide by name
tw datasets show -w <workspace> -n my-dataset

# Mixed -i / -n
tw datasets show -w <workspace> -i <id1> -n other-dataset
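When the IDs come from a script rather than being typed by hand, the repeated `-i` flags can be assembled from an array. This is a minimal Bash sketch; `hide_datasets` and the `TW_BIN` override (defaulting to `tw`, so `TW_BIN=echo` gives a dry run) are hypothetical and not part of the CLI:

```shell
# Hypothetical helper (not part of tw): hide all given dataset IDs with one
# `tw datasets hide` call by repeating the -i flag.
# TW_BIN is an assumed override for dry runs; it defaults to the real `tw`.
hide_datasets() {
  local workspace="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "hide_datasets: no dataset IDs given" >&2
    return 1
  fi
  local args=(datasets hide -w "$workspace")
  local id
  for id in "$@"; do
    args+=(-i "$id")   # one -i per dataset ID
  done
  "${TW_BIN:-tw}" "${args[@]}"
}
```

For example, `TW_BIN=echo hide_datasets my-ws 111 222` prints `datasets hide -w my-ws -i 111 -i 222`; drop `TW_BIN=echo` to run the real command. Building an array rather than a string keeps IDs intact even if they contain unusual characters.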

datasets labels

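Manage labels on a single dataset. By default the given comma-separated labels replace all existing ones; --operations append adds them and --operations delete removes them.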

# Replace all labels
tw datasets labels -w <workspace> -i <id> "production,validated"

# Add a label without removing existing ones
tw datasets labels -w <workspace> -i <id> --operations append "nightly"

# Remove a specific label
tw datasets labels -w <workspace> -i <id> --operations delete "nightly"
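The three operations can be modelled on a comma-separated label list. This is an illustrative Bash sketch of the semantics described above, not the CLI's or Platform's implementation; `apply_label_op` is hypothetical, and it assumes that `append` skips labels that are already present:

```shell
# Illustrative model only (not tw or Platform code) of how set, append and
# delete change a comma-separated label list.
# Assumption: append does not add a label that is already present.
apply_label_op() {
  local op="$1" current="$2" given="$3"
  local IFS=','          # split label lists on commas in the loops below
  local result="" label
  case "$op" in
    set)
      result="$given"    # replace everything
      ;;
    append)
      result="$current"
      for label in $given; do
        case ",$result," in
          *",$label,"*) ;;                         # already present, skip
          *) result="${result:+$result,}$label" ;;
        esac
      done
      ;;
    delete)
      for label in $current; do
        case ",$given," in
          *",$label,"*) ;;                         # listed for deletion, drop
          *) result="${result:+$result,}$label" ;;
        esac
      done
      ;;
    *)
      echo "apply_label_op: unknown operation: $op" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$result"
}
```

Replaying the three examples above: `apply_label_op append production,validated nightly` prints `production,validated,nightly`, and `apply_label_op delete production,validated,nightly nightly` prints `production,validated` again.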

datasets list improvements

| New flag | Description |
|---|---|
| --show-hidden | Include hidden datasets in results |
| --filter <expr> | Server-side search (by name and other supported keys) |
| -l | Show the labels column |
| --max / --page | Pagination |

tw datasets list -w <workspace> --show-hidden --filter "my-dataset" -l --max 20
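A minimal sketch of how `--page` and `--max` are assumed to select a window of results: pages numbered from 1, each holding `--max` items, so page N starts at zero-based offset (N - 1) * max. This mapping is an assumption for illustration, not taken from the CLI source, and `page_offset` is hypothetical:

```shell
# Hypothetical helper: zero-based offset of the first item on a page,
# assuming 1-based pages of `max` items each.
page_offset() {
  local page="$1" max="$2" v
  for v in "$page" "$max"; do
    case "$v" in
      ''|*[!0-9]*) echo "page_offset: expected positive integers" >&2; return 1 ;;
    esac
  done
  if [ "$page" -lt 1 ] || [ "$max" -lt 1 ]; then
    echo "page_offset: page and max must be at least 1" >&2
    return 1
  fi
  # 10# forces base 10 so values such as 08 are not read as octal
  echo $(( (10#$page - 1) * 10#$max ))
}
```

Under this assumption, `--max 20 --page 3` would cover items 40 to 59.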

The runs list --filter description has also been updated to document the full server-side search syntax (datasetId:<id>, status:<status>, after:<date>, etc.).


V2 API migration

All datasets API calls have been migrated from the legacy /workspaces/{id}/datasets/... path to the /datasets/...?workspaceId=... V2 endpoints. This covers list, add, view, view versions, update, url, download, and delete. Name-based lookups (-n) also pass visibility=all so hidden datasets can be resolved by name in all commands (e.g. show, view, delete).
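The shape of the change can be sketched for the list call: the workspace moves out of the path and into a `workspaceId` query parameter. This assumes the V2 list path is exactly `/datasets` (the description elides the full paths), and `v2_datasets_url` is a hypothetical helper; the optional `visibility=all` mirrors what name-based lookups are said to send:

```shell
# Hypothetical helper: build the V2 dataset list URL described above.
#   $1  API endpoint, e.g. the value of TW_API_ENDPOINT
#   $2  numeric workspace ID
#   $3  "true" to append visibility=all (include hidden datasets)
v2_datasets_url() {
  local endpoint="${1%/}" workspace_id="$2" include_hidden="${3:-false}"
  # Legacy form, for comparison: $endpoint/workspaces/$workspace_id/datasets
  local url="$endpoint/datasets?workspaceId=$workspace_id"
  if [ "$include_hidden" = "true" ]; then
    url="$url&visibility=all"
  fi
  printf '%s\n' "$url"
}
```

It could be used for a manual check such as `curl -s -H "Authorization: Bearer $TW_ACCESS_TOKEN" "$(v2_datasets_url "$TW_API_ENDPOINT" <workspace-id> true)"`.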


Regression testing — full command examples

Prerequisites

export TW_ACCESS_TOKEN="<your-token>"
export TW_API_ENDPOINT="https://<your-platform-host>/api"  # omit to use Seqera Cloud
export WORKSPACE="<workspace-id-or-name>"
export PIPELINE="<pipeline-name-or-url>"
export COMPUTE_ENV="<compute-env-name>"  # optional

Create a test samplesheet:

cat > /tmp/test-samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
SAMPLE1_PE,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R2.fastq.gz
SAMPLE2_PE,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R2.fastq.gz
EOF

1. datasets list

# Baseline
tw datasets list -w "$WORKSPACE"

# Server-side filter
tw datasets list -w "$WORKSPACE" --filter "test"

# JSON output
tw -o json datasets list -w "$WORKSPACE" --filter "test"

# Pagination
tw datasets list -w "$WORKSPACE" --max 5 --page 1

# Labels column
tw datasets list -w "$WORKSPACE" -l

2. Create test datasets

DS1_ID=$(tw -o json datasets add /tmp/test-samplesheet.csv \
    -w "$WORKSPACE" -n "tw-cli-test-ds1" -d "CLI test dataset 1" --header \
    | jq -r '.datasetId')
echo "DS1_ID=$DS1_ID"

DS2_ID=$(tw -o json datasets add /tmp/test-samplesheet.csv \
    -w "$WORKSPACE" -n "tw-cli-test-ds2" -d "CLI test dataset 2" --header \
    | jq -r '.datasetId')
echo "DS2_ID=$DS2_ID"
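If `tw datasets add` fails, `jq -r` prints nothing, or prints the literal string `null` when `.datasetId` is missing, and every later step then runs against a bad ID. A hypothetical guard (the name `require_id` is illustrative) stops the run early:

```shell
# Hypothetical guard: fail if a captured ID is empty or the string "null"
# (what `jq -r` prints when the requested key is absent).
require_id() {
  local name="$1" value="$2"
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "error: $name is empty; check the tw output above" >&2
    return 1
  fi
}
```

For example, `require_id DS1_ID "$DS1_ID" || exit 1` right after each `echo`.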

3. View and inspect

# View by ID and by name
tw datasets view -w "$WORKSPACE" -i "$DS1_ID"
tw datasets view -w "$WORKSPACE" -n "tw-cli-test-ds1"

# List all versions
tw datasets view -w "$WORKSPACE" -i "$DS1_ID" versions

# Get URL of latest version
DS_URL=$(tw -o json datasets url -w "$WORKSPACE" -i "$DS1_ID" | jq -r '.datasetUrl')
echo "DS_URL=$DS_URL"

# Get URL of a specific version
tw datasets url -w "$WORKSPACE" -i "$DS1_ID" --dataset-version 1

# Download latest version
tw datasets download -w "$WORKSPACE" -i "$DS1_ID"

# Download a specific version
tw datasets download -w "$WORKSPACE" -i "$DS1_ID" --dataset-version 1

4. Update datasets

# Rename and update description
tw datasets update -w "$WORKSPACE" -i "$DS1_ID" \
    --new-name "tw-cli-test-ds1-renamed" -d "Updated description"

# Verify
tw datasets view -w "$WORKSPACE" -i "$DS1_ID"

# Upload a new version (rename back at the same time)
tw datasets update -w "$WORKSPACE" -i "$DS1_ID" \
    --new-name "tw-cli-test-ds1" -f /tmp/test-samplesheet.csv --header

# Verify second version was created
tw datasets view -w "$WORKSPACE" -i "$DS1_ID" versions

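# Overwrite (delete + recreate) via add --overwrite.
# The dataset is recreated, so capture its new ID and refresh DS_URL for later steps.
DS1_ID=$(tw -o json datasets add /tmp/test-samplesheet.csv \
    -w "$WORKSPACE" -n "tw-cli-test-ds1" -d "Overwritten" --header --overwrite \
    | jq -r '.datasetId')
echo "DS1_ID=$DS1_ID"
DS_URL=$(tw -o json datasets url -w "$WORKSPACE" -i "$DS1_ID" | jq -r '.datasetUrl')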

5. Hide datasets

# Hide one by ID
tw datasets hide -w "$WORKSPACE" -i "$DS1_ID"

# Hide two at once by ID
tw datasets hide -w "$WORKSPACE" -i "$DS1_ID" -i "$DS2_ID"

# Hide by name
tw datasets hide -w "$WORKSPACE" -n "tw-cli-test-ds1"

# Hide multiple by name
tw datasets hide -w "$WORKSPACE" -n "tw-cli-test-ds1" -n "tw-cli-test-ds2"

# Verify: hidden datasets not visible by default
tw datasets list -w "$WORKSPACE" --filter "tw-cli-test"

# Verify: visible with --show-hidden
tw datasets list -w "$WORKSPACE" --show-hidden --filter "tw-cli-test"
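The two verification commands above rely on reading the output by eye. A hypothetical Bash helper can turn them into pass/fail checks by grepping the plain-text listing; `dataset_listed` and the `TW_BIN` override (defaulting to `tw`) are assumptions, and `WORKSPACE` is the variable exported in the prerequisites:

```shell
# Hypothetical check (not part of tw): succeed only if NAME appears in the
# plain-text output of `tw datasets list -w "$WORKSPACE"` plus any extra flags.
# TW_BIN is an assumed override for testing; it defaults to the real `tw`.
dataset_listed() {
  local name="$1"
  shift
  # Fixed-string match: a name that is a prefix of another dataset's name
  # (e.g. tw-cli-test-ds1 vs tw-cli-test-ds1-renamed) also counts as a hit.
  "${TW_BIN:-tw}" datasets list -w "$WORKSPACE" "$@" | grep -qF -- "$name"
}
```

After the hide steps, both `! dataset_listed tw-cli-test-ds1 --filter tw-cli-test` and `dataset_listed tw-cli-test-ds1 --show-hidden --filter tw-cli-test` should succeed.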

6. Show (un-hide) datasets

# Show by ID
tw datasets show -w "$WORKSPACE" -i "$DS1_ID"

# Show by name (works even when dataset is hidden)
tw datasets show -w "$WORKSPACE" -n "tw-cli-test-ds2"

# Show multiple — mix -i and -n
tw datasets show -w "$WORKSPACE" -i "$DS1_ID" -n "tw-cli-test-ds2"

# Verify both visible again
tw datasets list -w "$WORKSPACE" --filter "tw-cli-test"

7. Labels

# Set (replaces all existing labels)
tw datasets labels -w "$WORKSPACE" -i "$DS1_ID" "test,integration"

# Append
tw datasets labels -w "$WORKSPACE" -i "$DS1_ID" --operations append "nightly"

# Delete one label
tw datasets labels -w "$WORKSPACE" -i "$DS1_ID" --operations delete "nightly"

# Verify in list
tw datasets list -w "$WORKSPACE" -l --filter "tw-cli-test"

8. Pipeline runs

# Launch without a dataset
tw launch "$PIPELINE" -w "$WORKSPACE" -n "tw-cli-no-dataset-run" --stub-run

# Write a params file using the dataset URL captured in DS_URL
cat > /tmp/tw-params.json <<EOF
{ "input": "$DS_URL" }
EOF

# Launch with dataset as input
tw launch "$PIPELINE" -w "$WORKSPACE" \
    -n "tw-cli-with-dataset-run" --params-file /tmp/tw-params.json --stub-run

# Launch and wait for completion
tw launch "$PIPELINE" -w "$WORKSPACE" \
    -n "tw-cli-with-dataset-run-wait" --params-file /tmp/tw-params.json \
    --stub-run --wait SUCCEEDED

9. runs list — filter by dataset

tw runs list -w "$WORKSPACE" --max 10
tw runs list -w "$WORKSPACE" --filter "tw-cli"
tw runs list -w "$WORKSPACE" --filter "datasetId:$DS1_ID"
tw runs list -w "$WORKSPACE" --filter "status:SUCCEEDED"
tw runs list -w "$WORKSPACE" --filter "datasetId:$DS1_ID after:2026-01-01"
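In the last example the filter combines terms as space-separated `key:value` pairs. A hypothetical Bash helper (`run_filter` is illustrative, not part of the CLI) assembles such a string from pairs, which keeps quoting consistent when values come from variables:

```shell
# Hypothetical helper: join key/value arguments into "k1:v1 k2:v2 ...".
run_filter() {
  local out=""
  while [ "$#" -ge 2 ]; do
    out="${out:+$out }$1:$2"
    shift 2
  done
  if [ "$#" -ne 0 ]; then
    echo "run_filter: expected key/value pairs" >&2
    return 1
  fi
  printf '%s\n' "$out"
}
```

`tw runs list -w "$WORKSPACE" --filter "$(run_filter datasetId "$DS1_ID" after 2026-01-01)"` reproduces the last command above.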

10. Cleanup

tw datasets delete -w "$WORKSPACE" -i "$DS1_ID"
tw datasets delete -w "$WORKSPACE" -i "$DS2_ID"

# List and cancel any remaining test runs
tw runs list -w "$WORKSPACE" --filter "runName:tw-cli"
# tw runs cancel -w "$WORKSPACE" -i <run-id>
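Cleanup can also be wrapped so that one failed deletion does not skip the rest, and so the temporary files from the prerequisites and step 8 are removed too. A minimal sketch; `cleanup_test_datasets` and the `TW_BIN` override (defaulting to `tw`) are hypothetical:

```shell
# Hypothetical cleanup helper: delete each dataset ID given, keep going if one
# deletion fails, remove the temp files created earlier, and return non-zero
# if any deletion failed.
# TW_BIN is an assumed override for dry runs; it defaults to the real `tw`.
cleanup_test_datasets() {
  local workspace="$1" failed=0 id
  shift
  for id in "$@"; do
    "${TW_BIN:-tw}" datasets delete -w "$workspace" -i "$id" || failed=1
  done
  rm -f /tmp/test-samplesheet.csv /tmp/tw-params.json
  return "$failed"
}
```

For example, `cleanup_test_datasets "$WORKSPACE" "$DS1_ID" "$DS2_ID"` replaces the two delete commands above.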
