
[2.8] Add NVFlare CLI tutorial #4639

Merged
chesterxgchen merged 3 commits into NVIDIA:2.8 from chesterxgchen:tutorial on May 19, 2026


@chesterxgchen
Collaborator

Summary

  • Replace the old job CLI notebook with an end-to-end NVFlare CLI tutorial covering config, system, study, and job workflows.
  • Update hello-pt so the tutorial can export a runnable synthetic-data job with log streaming enabled.
  • Normalize job stats behavior when a job is no longer running and add focused coverage.

Validation

  • pytest tests/unit_test/tool/job/job_stats_test.py
  • jq empty examples/tutorials/nvflare_cli.ipynb
  • verified source notebook has no outputs or execution counts

@greptile-apps
Contributor

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR replaces the old job-template-focused job_cli.ipynb with a new end-to-end nvflare_cli.ipynb tutorial covering config, system, study, recipe discovery, job export/submit/monitor/abort, and result download. It also adds synthetic-data and log-streaming support to hello-pt, normalises JobNotRunning handling in cmd_job_stats, and adds focused unit tests for both changes.

  • nvflare_cli.ipynb: Complete POC workflow demonstrated in order; JSON extraction throughout uses defensive .get() with explicit RuntimeError raises; the stats cell correctly handles both the new JOB_NOT_RUNNING error code and the older INTERNAL_ERROR + "is not running" message for backward compatibility with older servers.
  • hello-pt (client.py, job.py): Adds --synthetic_data, --train_size, --test_size, --num_workers, and --enable_log_streaming flags; the --export/--export-dir flags used by the notebook are consumed at import time by nvflare.recipe.spec._consume_recipe_args() and never reach argparse in job.py.
  • job_cli.py: cmd_job_stats now catches JobNotRunning and emits a structured JOB_NOT_RUNNING error with a human-readable detail message, matching the pattern already used in cmd_job_abort.
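The stats-cell fallback described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cell: the envelope keys `status`, `error_code`, and `message` are assumptions about the JSON shape, and `is_job_not_running` is a hypothetical helper name.

```python
import json


def is_job_not_running(response: dict) -> bool:
    """Return True when a stats response says the job is no longer running.

    Assumed envelope keys: "status", "error_code", "message".
    """
    if response.get("status") == "ok":
        return False
    error_code = response.get("error_code")
    message = response.get("message") or ""
    # Newer servers: dedicated structured error code.
    if error_code == "JOB_NOT_RUNNING":
        return True
    # Older servers: generic INTERNAL_ERROR whose message mentions "is not running".
    return error_code == "INTERNAL_ERROR" and "is not running" in message


new_style = json.loads('{"status": "error", "error_code": "JOB_NOT_RUNNING", "message": "job is not running"}')
old_style = json.loads('{"status": "error", "error_code": "INTERNAL_ERROR", "message": "job abc is not running"}')
unrelated = json.loads('{"status": "error", "error_code": "INTERNAL_ERROR", "message": "connection refused"}')
running = json.loads('{"status": "ok", "data": {}}')

for name, resp in [("new", new_style), ("old", old_style), ("unrelated", unrelated), ("running", running)]:
    print(name, is_job_not_running(resp))
```

Treating only these two shapes as "not running" keeps genuine internal errors fatal, so a tutorial cell can print a non-fatal timing message in one case and still fail loudly in the other.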

Confidence Score: 5/5

Safe to merge; changes are additive, well-tested, and consistent with existing codebase patterns.

The logic changes are straightforward: a new structured error handler in cmd_job_stats, a ValueError guard replacing a silent division-by-zero in evaluate(), and synthetic-data flags in hello-pt. All three are covered by the accompanying unit tests. The notebook is tutorial content with no runtime logic that could affect production code paths.
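For readers who want to see the shape of the `evaluate()` guard, here is a framework-free sketch. The real `client.py` uses PyTorch; the `predict` callable and the list-of-batches `data_loader` below are stand-ins chosen so the example runs without extra dependencies.

```python
def evaluate(predict, data_loader):
    """Compute accuracy over batches of (inputs, labels).

    `predict` and the plain-list `data_loader` are illustrative stand-ins
    for a model and a PyTorch DataLoader.
    """
    correct = 0
    total = 0
    for inputs, labels in data_loader:
        predictions = predict(inputs)
        correct += sum(int(p == y) for p, y in zip(predictions, labels))
        total += len(labels)
    # Without this guard an empty loader would reach `correct / total`
    # with total == 0 and fail with an unhelpful ZeroDivisionError.
    if total == 0:
        raise ValueError("evaluation data_loader is empty; cannot compute accuracy")
    return correct / total


# Identity "model": predictions equal inputs.
print(evaluate(lambda xs: xs, [([1, 0], [1, 1])]))

try:
    evaluate(lambda xs: xs, [])
except ValueError as err:
    print("rejected:", err)
```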

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| examples/tutorials/nvflare_cli.ipynb | New end-to-end CLI tutorial notebook covering config, system, study, and job workflows. JSON extraction in the abort cell now uses the same defensive .get() pattern as the submit cell. |
| nvflare/tool/job/job_cli.py | Adds JobNotRunning handler to cmd_job_stats and enriches the abort handler's detail message. The return after output_error() is dead code (output_error calls sys.exit), consistent with the existing pattern in the file. |
| examples/hello-world/hello-pt/client.py | Adds --synthetic_data/--train_size/--test_size/--num_workers flags and raises ValueError when data_loader is empty, replacing the silent division-by-zero. |
| examples/hello-world/hello-pt/job.py | Mirrors new client.py flags and wires --enable_log_streaming to recipe.enable_log_streaming(). The --export/--export-dir args used in the notebook are stripped from sys.argv by nvflare.recipe.spec at import time and never seen by argparse here. |
| tests/unit_test/examples/hello_pt_client_test.py | New test that loads client.py via importlib and verifies the ValueError guard. Only catches RuntimeError during exec_module; an ImportError in nvflare.client would propagate unhandled, but that reflects a real environment problem rather than a test logic bug. |
| tests/unit_test/tool/job/job_stats_test.py | Adds test_stats_job_not_running_exits_1 verifying the new JobNotRunning handler; asserts error_code, exit_code, job_id presence, and detail text. |
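The note about `--export`/`--export-dir` never reaching argparse may be easier to follow with a small sketch. The function below is a hypothetical stand-in, not the actual `nvflare.recipe.spec._consume_recipe_args()` implementation; it only illustrates the general idea of removing recognized flags from an argument list before a script's own parser runs.

```python
import argparse


def consume_export_args(argv):
    """Hypothetical stand-in: pull --export/--export-dir out of argv.

    Returns (export, export_dir, remaining_args).
    """
    export = False
    export_dir = None
    remaining = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--export":
            export = True
        elif arg == "--export-dir" and i + 1 < len(argv):
            export_dir = argv[i + 1]
            i += 1  # skip the flag's value as well
        else:
            remaining.append(arg)
        i += 1
    return export, export_dir, remaining


argv = ["--export", "--export-dir", "/tmp/job_export", "--synthetic_data", "--num_rounds", "1"]
export, export_dir, remaining = consume_export_args(argv)

# The script's own parser only declares its own flags and never sees the export ones.
parser = argparse.ArgumentParser()
parser.add_argument("--synthetic_data", action="store_true")
parser.add_argument("--num_rounds", type=int, default=2)
args = parser.parse_args(remaining)

print(export, export_dir, remaining, args.num_rounds)
```

In the real module the stripping reportedly happens against `sys.argv` at import time; passing an explicit list here just keeps the sketch side-effect free.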

Sequence Diagram

sequenceDiagram
    participant NB as Notebook Cell
    participant CLI as nvflare CLI
    participant srv as FL Server
    participant job as job.py

    NB->>job: python job.py --export --export-dir ... --enable_log_streaming --synthetic_data
    Note over job: spec.py strips --export/--export-dir at import time
    job->>job: define_parser() sees remaining args only
    job->>job: recipe.execute(env) export mode
    job-->>NB: SystemExit(0), job folder written

    NB->>CLI: nvflare job submit -j JOB_FOLDER --submit-token TOKEN
    CLI-->>NB: JSON status ok, job_id returned

    NB->>CLI: nvflare job stats JOB_ID --site all
    alt job still running
        CLI-->>NB: JSON status ok, stats data
    else job already completed
        CLI-->>NB: JSON error JOB_NOT_RUNNING
        NB->>NB: non-fatal timing message, continue
    end

    NB->>CLI: nvflare job wait JOB_ID --timeout 600
    CLI->>srv: poll until terminal
    srv-->>CLI: FINISHED_OK
    CLI-->>NB: JSON status ok

    NB->>CLI: nvflare job download JOB_ID --output-dir RESULT_DIR
    CLI-->>NB: artifacts written

    NB->>job: python job.py --export ... --num_rounds 20
    job-->>NB: SystemExit(0), folder overwritten
    NB->>CLI: nvflare job submit abort token
    CLI-->>NB: ABORT_JOB_ID returned
    NB->>CLI: nvflare job abort ABORT_JOB_ID --force
    CLI->>srv: abort running job
    NB->>CLI: nvflare job wait ABORT_JOB_ID
    CLI-->>NB: terminal ABORTED
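The CLI calls in the diagram can be collected into small command builders, which is one way a notebook might keep its invocations consistent. The sketch below only constructs argument lists and prints them; it does not run `nvflare`. The flags are copied from the diagram, while the helper names and the sample job ID, token, and paths are hypothetical.

```python
import shlex


def submit_cmd(job_folder, submit_token):
    return ["nvflare", "job", "submit", "-j", job_folder, "--submit-token", submit_token]


def stats_cmd(job_id):
    return ["nvflare", "job", "stats", job_id, "--site", "all"]


def wait_cmd(job_id, timeout=600):
    return ["nvflare", "job", "wait", job_id, "--timeout", str(timeout)]


def download_cmd(job_id, output_dir):
    return ["nvflare", "job", "download", job_id, "--output-dir", output_dir]


def abort_cmd(job_id):
    return ["nvflare", "job", "abort", job_id, "--force"]


# Placeholder values for illustration only; a real job ID comes from the submit response.
job_id = "JOB_ID"
commands = [
    submit_cmd("JOB_FOLDER", "TOKEN"),
    stats_cmd(job_id),
    wait_cmd(job_id),
    download_cmd(job_id, "RESULT_DIR"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Building lists rather than shell strings means each one can be passed straight to `subprocess.run` without quoting concerns, should a caller choose to execute them.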

Reviews (3): Last reviewed commit: "[2.8] Use recipe export args in hello-pt..."

Comment thread examples/hello-world/hello-pt/client.py
Comment thread examples/tutorials/nvflare_cli.ipynb
@chesterxgchen
Collaborator Author

Addressed the latest review comments in commit d023456:

  • Guarded abort-job submit parsing in nvflare_cli.ipynb so failed submits raise the JSON response instead of KeyError.
  • Added a comment explaining the stats-cell INTERNAL_ERROR fallback for older server responses.
  • Added JOB_NOT_RUNNING details for job stats and job abort.
  • Added an explicit ValueError for empty hello-pt evaluation dataloaders, plus regression coverage.
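The guarded submit parsing mentioned in the first bullet follows a pattern along these lines. This is a sketch under assumptions: the `status` and `data.job_id` keys are a guess at the response envelope, and `extract_job_id` is a hypothetical helper, not code from the notebook.

```python
import json


def extract_job_id(response: dict) -> str:
    """Return the job ID from a submit response, or raise with the full payload.

    Assumed envelope: {"status": "ok", "data": {"job_id": "..."}}.
    """
    data = response.get("data")
    job_id = data.get("job_id") if isinstance(data, dict) else None
    if response.get("status") != "ok" or not job_id:
        # Surface the whole response so the failure is diagnosable,
        # instead of a bare KeyError from response["data"]["job_id"].
        raise RuntimeError(f"job submit failed: {json.dumps(response)}")
    return job_id


print(extract_job_id({"status": "ok", "data": {"job_id": "abc123"}}))

try:
    extract_job_id({"status": "error", "error_code": "SOME_ERROR"})
except RuntimeError as err:
    print(err)
```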

Verification:

  • python3 -m pytest tests/unit_test/tool/job/job_stats_test.py tests/unit_test/tool/job/job_abort_test.py tests/unit_test/examples/hello_pt_client_test.py -q: 18 passed
  • git diff --check
  • python3 -m black --check nvflare/tool/job/job_cli.py tests/unit_test/tool/job/job_stats_test.py tests/unit_test/examples/hello_pt_client_test.py examples/hello-world/hello-pt/client.py
  • python3 -m isort --check-only nvflare/tool/job/job_cli.py tests/unit_test/tool/job/job_stats_test.py tests/unit_test/examples/hello_pt_client_test.py examples/hello-world/hello-pt/client.py
  • python3 -m flake8 nvflare/tool/job/job_cli.py tests/unit_test/tool/job/job_stats_test.py tests/unit_test/examples/hello_pt_client_test.py examples/hello-world/hello-pt/client.py
  • ./runtest.sh -s

@chesterxgchen
Collaborator Author

/build

@chesterxgchen
Collaborator Author

Follow-up pushed in commit 5a66aee:

  • Removed the hello-pt-specific --export_config flag and switched the tutorial to the standard Recipe API --export --export-dir flow.
  • Updated the hello-pt README with the standard export command and tutorial-style export example.
  • Updated docs/examples/hello_pt_job_api.rst to use the current job.py/client.py/model.py flow and exported config paths.

Verification:

  • python3 -m pytest tests/unit_test/tool/job/job_stats_test.py tests/unit_test/tool/job/job_abort_test.py tests/unit_test/examples/hello_pt_client_test.py -q: 18 passed
  • python3 job.py --export --export-dir /tmp/nvflare_cli_export_check_69132 --enable_log_streaming --synthetic_data --train_size 4 --test_size 4 --num_rounds 1 --epochs 1 --batch_size 2 --num_workers 0
  • git diff --check
  • targeted black/isort/flake8
  • ./runtest.sh -s

@chesterxgchen
Collaborator Author

/build

Comment thread docs/examples/hello_pt_job_api.rst
Comment thread docs/examples/hello_pt_job_api.rst
Collaborator

@nvidianz nvidianz left a comment

The args were already inconsistent; that was not caused by this PR.

@chesterxgchen chesterxgchen merged commit 3a5e70f into NVIDIA:2.8 May 19, 2026
26 checks passed
@chesterxgchen chesterxgchen deleted the tutorial branch May 19, 2026 16:30
chesterxgchen added a commit to chesterxgchen/NVFlare that referenced this pull request May 19, 2026
chesterxgchen added a commit that referenced this pull request May 19, 2026
## What this does

Cherry-picks the two recent 2.8 PRs onto main:

- #4630: 2.8.0 release notes
- #4639: NVFlare CLI tutorial replacement and related hello-pt/job stats
updates

## Verification

- python3 -m pytest tests/unit_test/tool/job/job_stats_test.py
tests/unit_test/tool/job/job_abort_test.py
tests/unit_test/examples/hello_pt_client_test.py -q: 18 passed
- notebook no outputs/execution counts check
- git diff --check upstream/main..HEAD
- targeted black/isort/flake8
- ./runtest.sh -s

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>