Merged
2 changes: 1 addition & 1 deletion docs/example_applications_algorithms.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ Can be run from :github_nvflare_link:`hello_world notebook <examples/hello-world
* :github_nvflare_link:`Intro to the FL Simulator <examples/tutorials/flare_simulator.ipynb>` - Shows how to use the :ref:`fl_simulator` to run a local simulation of an NVFLARE deployment to test and debug an application without provisioning a real FL project.
* :github_nvflare_link:`Hello FLARE API <examples/tutorials/flare_api.ipynb>` - Goes through the different commands of the :ref:`flare_api` to show the syntax and usage of each.
* :github_nvflare_link:`NVFLARE in POC Mode <examples/tutorials/setup_poc.ipynb>` - Shows how to use :ref:`POC mode <poc_command>` to test the features of a full FLARE deployment on a single machine.
* :github_nvflare_link:`Job CLI Tutorial <examples/tutorials/job_cli.ipynb>` - Walks through the different commands of the Job CLI and showcases syntax and example usages.
* :github_nvflare_link:`NVFlare CLI Tutorial <examples/tutorials/nvflare_cli.ipynb>` - Walks through the current ``nvflare`` command groups for local setup, recipes, jobs, systems, studies, provisioning, and deployment.
* :github_nvflare_link:`Job Recipe <examples/tutorials/job_recipe.ipynb>` - Introduces Job Recipes to simplify federated learning job creation and execution with a high-level API.
* :github_nvflare_link:`FLARE Logging <examples/tutorials/logging.ipynb>` - Covers how to configure logging in FLARE for different use cases and modes.

60 changes: 41 additions & 19 deletions docs/examples/hello_pt_job_api.rst
Expand Up @@ -56,27 +56,46 @@ To run this example:

.. code-block:: shell

$ python fedavg_script_runner_pt.py
$ python job.py

The script will create an NVFlare job in /tmp/nvflare/jobs/job_config/hello-pt_cifar10_fedavg
and run it using the FL Simulator.
The script creates an NVFlare job recipe and runs it using the FL Simulator.

To export the job folder for submission to a running FL system, use the standard Recipe API export flags:

.. code-block:: shell

$ python job.py --export --export-dir /tmp/nvflare/jobs/job_config

The exported job is written to ``/tmp/nvflare/jobs/job_config/hello-pt``.
You can combine the export flags with example-specific options, for example:

.. code-block:: shell

$ python job.py --export --export-dir /tmp/nvflare/jobs/job_config \
--enable_log_streaming --synthetic_data --train_size 2048 --test_size 256 \
--num_rounds 2 --epochs 1 --batch_size 64 --num_workers 0

NVIDIA FLARE Job API
--------------------

The ``fedavg_script_runner_pt.py`` script for this hello-pt example is very similar to the ``fedavg_script_runner_hello-numpy.py`` script
for the :doc:`Hello NumPy <hello_numpy>` exercise. Other than changes to the names of the job and client script, the only difference
is a line to define the initial global model for the server:
The ``job.py`` script for this hello-pt example defines a :class:`FedAvgRecipe<nvflare.app_opt.pt.recipes.fedavg.FedAvgRecipe>`.
The recipe combines the PyTorch model, client training script, and simulator/export behavior:

.. code-block:: python

# Define the initial global model and send to server
job.to(SimpleNetwork(), "server")
recipe = FedAvgRecipe(
name="hello-pt",
min_clients=n_clients,
num_rounds=num_rounds,
model=SimpleNetwork(),
train_script="client.py",
train_args=train_args,
)


NVIDIA FLARE Client Training Script
------------------------------------
The training script for this example, ``hello-pt_cifar10_fl.py``, is the main script that will be run on the clients. It contains the PyTorch specific
The training script for this example, ``client.py``, is the main script that runs on the clients. It contains the PyTorch-specific
logic for training.

Neural Network
Expand All @@ -90,7 +109,7 @@ Let's see the simplified CIFAR10 model used in this example:
- :github_nvflare_link:`model.py <examples/hello-world/hello-pt/model.py>`

This ``SimpleNetwork`` class is your convolutional neural network to train with the CIFAR10 dataset.
This is not related to NVIDIA FLARE, so we implement it in a file called ``simple_network.py``.
This is not related to NVIDIA FLARE, so we implement it in a file called ``model.py``.

Dataset & Setup
^^^^^^^^^^^^^^^^
Expand All @@ -101,7 +120,7 @@ the dataset we will be using on each client.
Additionally, you need to set up the optimizer, loss function and transform to process the data.
You can think of all of this code as part of your local training loop, as every deep learning training has a similar setup.

In the ``hello-pt_cifar10_fl.py`` script, we take care of all of this setup before the ``flare.init()``.
In the ``client.py`` script, all of this setup happens before the call to ``flare.init()``.

Local Train
^^^^^^^^^^^
Expand Down Expand Up @@ -137,7 +156,7 @@ Now with the network and dataset setup, let's also implement the local training
flare.send(output_model)


The code above is simplified from the ``hello-pt_cifar10_fl.py`` script to focus on the three essential methods of the NVFlare's Client API to
The code above is simplified from the ``client.py`` script to focus on the three essential methods of NVFlare's Client API that
achieve the training workflow:

- ``init()``: Initializes the NVFlare Client API environment.
Expand All @@ -148,9 +167,9 @@ NVIDIA FLARE Server & Application
---------------------------------
In this example, the server runs :class:`FedAvg<nvflare.app_common.workflows.fedavg.FedAvg>` with the default settings.

If you export the job with the :func:`export<nvflare.job_config.api.FedJob.export>` function, you will see the
If you export the job with ``python job.py --export --export-dir <job_folder>``, you will see the
configurations for the server and each client. The server configuration is ``config_fed_server.json`` in the config folder
in app_server:
in the exported app folder:

.. code-block:: json

Expand All @@ -161,6 +180,7 @@ in app_server:
"id": "controller",
"path": "nvflare.app_common.workflows.fedavg.FedAvg",
"args": {
"aggregation_weights": {},
"num_clients": 2,
"num_rounds": 2
}
Expand All @@ -185,6 +205,7 @@ in app_server:
"path": "nvflare.app_opt.tracking.tb.tb_receiver.TBAnalyticsReceiver",
"args": {
"events": [
"analytix_log_stats",
"fed.analytix_log_stats"
]
}
Expand All @@ -194,13 +215,13 @@ in app_server:
"path": "nvflare.app_opt.pt.file_model_persistor.PTFileModelPersistor",
"args": {
"model": {
"path": "src.simple_network.SimpleNetwork",
"path": "model.SimpleNetwork",
"args": {}
}
}
},
{
"id": "model_locator",
"id": "locator",
"path": "nvflare.app_opt.pt.file_model_locator.PTFileModelLocator",
"args": {
"pt_persistor_id": "persistor"
Expand All @@ -213,8 +234,8 @@ in app_server:

This is automatically created by the Job API. The server application configuration leverages NVIDIA FLARE built-in components.

Note that ``persistor`` points to ``PTFileModelPersistor``. This is automatically configured when the model SimpleNetwork is added
to the server with the :func:`to<nvflare.job_config.api.FedJob.to>` function. The Job API detects that the model is a PyTorch model
Note that ``persistor`` points to ``PTFileModelPersistor``. This is automatically configured from the
``SimpleNetwork`` model supplied to the recipe. The Job API detects that the model is a PyTorch model
and automatically configures :class:`PTFileModelPersistor<nvflare.app_opt.pt.file_model_persistor.PTFileModelPersistor>`
and :class:`PTFileModelLocator<nvflare.app_opt.pt.file_model_locator.PTFileModelLocator>`.

Expand All @@ -236,7 +257,8 @@ The client configuration is ``config_fed_client.json`` in the config folder of e
"executor": {
"path": "nvflare.app_opt.pt.in_process_client_api_executor.PTInProcessClientAPIExecutor",
"args": {
"task_script_path": "src/hello-pt_cifar10_fl.py"
"task_script_path": "client.py",
"task_script_args": "--batch_size 16 --epochs 2 --num_workers 2"
}
}
}
2 changes: 1 addition & 1 deletion docs/examples/tutorial_notebooks.rst
Expand Up @@ -9,6 +9,6 @@ Tutorial notebooks on GitHub:
- :github_nvflare_link:`FL Simulator Notebook (GitHub) <examples/tutorials/flare_simulator.ipynb>`
- :github_nvflare_link:`Hello FLARE API Notebook (GitHub) <examples/tutorials/flare_api.ipynb>`
- :github_nvflare_link:`NVFLARE POC Mode in detail Notebook (GitHub) <examples/tutorials/setup_poc.ipynb>`
- :github_nvflare_link:`Job CLI Notebook (GitHub) <examples/tutorials/job_cli.ipynb>`
- :github_nvflare_link:`NVFlare CLI Notebook (GitHub) <examples/tutorials/nvflare_cli.ipynb>`
- :github_nvflare_link:`Job Recipe Notebook (GitHub) <examples/tutorials/job_recipe.ipynb>`
- :github_nvflare_link:`FLARE Logging Notebook (GitHub) <examples/tutorials/logging.ipynb>`
2 changes: 1 addition & 1 deletion docs/release_notes/flare_240.rst
Expand Up @@ -64,7 +64,7 @@ Furthermore, the Job CLI also offers users a convenient method for submitting jo
``nvflare job list_templates|create|submit|show_variables``

Also explore the continuously growing :github_nvflare_link:`Job Template directory <job_templates>` we have created for commonly used configurations.
For more in-depth information on Job Templates and the Job CLI, refer to the :ref:`job_cli` documentation and :github_nvflare_link:`tutorials <examples/tutorials/job_cli.ipynb>`.
For more in-depth information on Job Templates and the Job CLI, refer to the :ref:`job_cli` documentation and :github_nvflare_link:`CLI tutorials <examples/tutorials/nvflare_cli.ipynb>`.

ModelLearner
~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion docs/release_notes/flare_280.rst
Expand Up @@ -76,7 +76,7 @@ For details, see :ref:`nvflare_cli`, :ref:`job_cli`, :ref:`system_command`,
:ref:`config_command`, and :ref:`recipe_command`.

For a hands-on CLI workflow, see the
:github_nvflare_link:`Job CLI tutorial <examples/tutorials/job_cli.ipynb>`.
:github_nvflare_link:`NVFlare CLI tutorial <examples/tutorials/nvflare_cli.ipynb>`.

Deployment and Provisioning
===========================
2 changes: 1 addition & 1 deletion docs/tutorials.rst
Expand Up @@ -25,7 +25,7 @@ Feature Tutorials
- `Simulator CLI & Python API <https://github.com/NVIDIA/NVFlare/tree/main/examples/tutorials/flare_simulator.ipynb>`_
- `FLARE Python API: Job Submission & Monitoring <https://github.com/NVIDIA/NVFlare/tree/main/examples/tutorials/flare_api.ipynb>`_
- `Logging: Configuration & Customization <https://github.com/NVIDIA/NVFlare/tree/main/examples/tutorials/logging.ipynb>`_
- `Job CLI: Job Submission & Templates <https://github.com/NVIDIA/NVFlare/tree/main/examples/tutorials/job_cli.ipynb>`_
- `NVFlare CLI: Setup, Jobs, Systems, and Deployment <https://github.com/NVIDIA/NVFlare/tree/main/examples/tutorials/nvflare_cli.ipynb>`_
- `Job Recipe: Simplified job creation <https://github.com/NVIDIA/NVFlare/tree/main/examples/tutorials/job_recipe.ipynb>`_

Self-Paced Learning
2 changes: 1 addition & 1 deletion examples/README.md
Expand Up @@ -89,7 +89,7 @@ When you open a notebook, select the kernel `nvflare_example` using the dropdown
| [Intro to the FL Simulator](./tutorials/flare_simulator.ipynb) | Shows how to use the FLARE Simulator to run a local simulation. |
| [Hello FLARE API](./tutorials/flare_api.ipynb) | Goes through the different commands of the FLARE API. |
| [NVFLARE in POC Mode](./tutorials/setup_poc.ipynb) | Shows how to use POC mode. |
| [Job CLI](./tutorials/job_cli.ipynb) | Walks through the different commands of the Job CLI. |
| [NVFlare CLI](./tutorials/nvflare_cli.ipynb) | Walks through the current `nvflare` command groups for local setup, recipes, jobs, systems, studies, provisioning, and deployment. |
| [Job Recipe](./tutorials/job_recipe.ipynb) | Introduces Job Recipes to simplify federated learning job creation and execution with a high-level API. |
| [Logging Tutorial](./tutorials/logging.ipynb) | Shows how to use the logging configuration for different modules. |

22 changes: 21 additions & 1 deletion examples/hello-world/hello-pt/README.md
Expand Up @@ -42,6 +42,12 @@ You can download the CIFAR10 dataset from the Internet via torchvision’s datas
You can split the datasets for different clients, so that each client has its own dataset.
Here, for simplicity's sake, we will be using the same dataset on each client.

For quick smoke tests or offline environments, the job can use synthetic CIFAR-shaped data:

```
python job.py --synthetic_data --train_size 128 --test_size 64 --num_rounds 2 --epochs 1
```
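
As a rough sketch of how these flags reach the client script: `job.py` in this example builds a `train_args` string and appends the synthetic-data options only when `--synthetic_data` is set. The helper name `build_train_args` below is hypothetical and exists only for illustration; the string format and default values are assumed to mirror this example's `job.py`.

```python
def build_train_args(
    batch_size=16,
    epochs=2,
    num_workers=2,
    synthetic_data=False,
    train_size=50000,
    test_size=10000,
):
    """Assemble the argument string forwarded to client.py (hypothetical helper)."""
    # These three options are always forwarded.
    train_args = f"--batch_size {batch_size} --epochs {epochs} --num_workers {num_workers}"
    # The dataset sizes are only forwarded together with --synthetic_data.
    if synthetic_data:
        train_args += f" --synthetic_data --train_size {train_size} --test_size {test_size}"
    return train_args


# Arguments matching the smoke-test command above (batch size and workers left at defaults).
smoke_test_args = build_train_args(epochs=1, synthetic_data=True, train_size=128, test_size=64)
# Arguments for a default run on the real CIFAR10 dataset.
default_args = build_train_args()

print(smoke_test_args)
print(default_args)
```

With the defaults, no size options are forwarded, so the client falls back to the full CIFAR10 train and test sets.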

## Model
In PyTorch, neural networks are implemented by defining a class (e.g., SimpleNetwork) that extends `nn.Module`.
The network’s architecture is set up in the __init__ method, while the forward method determines how input data flows
Expand Down Expand Up @@ -130,7 +136,7 @@ recipe = FedAvgRecipe(
num_rounds=num_rounds,
model=SimpleNetwork(),
train_script="client.py",
train_args=f"--batch_size {batch_size}",
train_args=f"--batch_size {batch_size} --epochs {epochs}",
)

env = SimEnv(num_clients=n_clients, num_threads=n_clients)
Expand Down Expand Up @@ -178,6 +184,20 @@ The cross-site evaluation results can be viewed with:
cat /tmp/nvflare/simulation/hello-pt/server/simulate_job/cross_site_val/cross_val_results.json
```

To export the job folder for submission to a running FL system, use the standard Recipe API export flags:

```
python job.py --export --export-dir /tmp/nvflare/jobs/job_config
```

The exported job is written to `/tmp/nvflare/jobs/job_config/hello-pt`. You can combine the export flags with the example-specific arguments, for example:

```
python job.py --export --export-dir /tmp/nvflare/jobs/job_config \
--enable_log_streaming --synthetic_data --train_size 2048 --test_size 256 \
--num_rounds 2 --epochs 1 --batch_size 64 --num_workers 0
```
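
The training-related options in the command above are forwarded to `client.py` as a single argument string. The sketch below illustrates how such a string could be parsed on the client side. `parse_client_args` is a hypothetical helper, and the options and defaults are assumed to match the parser in this example's `client.py`. Options such as `--num_rounds` and `--enable_log_streaming` are consumed by `job.py` itself and are not part of the forwarded string.

```python
import argparse
import shlex


def parse_client_args(arg_string):
    """Parse a forwarded train_args string (hypothetical helper for illustration)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--num_workers", type=int, default=2)
    parser.add_argument("--synthetic_data", action="store_true")
    parser.add_argument("--train_size", type=int, default=50000)
    parser.add_argument("--test_size", type=int, default=10000)
    # shlex.split tokenizes the string the way a shell would.
    return parser.parse_args(shlex.split(arg_string))


# The string that the export command above would be expected to forward.
forwarded = "--batch_size 64 --epochs 1 --num_workers 0 --synthetic_data --train_size 2048 --test_size 256"
args = parse_client_args(forwarded)
print(args.batch_size, args.epochs, args.num_workers, args.synthetic_data)

# An empty string falls back to the defaults, which select the full CIFAR10 dataset.
defaults = parse_client_args("")
print(defaults.synthetic_data, defaults.train_size, defaults.test_size)
```

Setting `--num_workers 0` keeps data loading in the main process, which can be convenient for quick tests.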

> **Note:** Depending on the number of clients, you might run into errors if several clients try to download the data at the same time. It is suggested to pre-download the data to avoid such errors.

## Notebook
31 changes: 25 additions & 6 deletions examples/hello-world/hello-pt/client.py
Expand Up @@ -45,14 +45,21 @@ def evaluate(net, data_loader, device):
total += labels.size(0)
correct += (predicted == labels).sum().item()

print(f"Accuracy of the network on the 10000 test images: {100 * correct // total} %")
return 100 * correct // total
if total == 0:
raise ValueError("Evaluation data_loader produced no samples; check data preparation and --test_size.")
accuracy = 100 * correct // total
print(f"Accuracy of the network on {total} test images: {accuracy} %")
return accuracy


def main():
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=2)
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--num_workers", type=int, default=2)
parser.add_argument("--synthetic_data", action="store_true")
parser.add_argument("--train_size", type=int, default=50000)
parser.add_argument("--test_size", type=int, default=10000)
args = parser.parse_args()
batch_size = args.batch_size
epochs = args.epochs
Expand All @@ -70,11 +77,23 @@ def main():
)

# Load datasets
train_set = torchvision.datasets.CIFAR10(root=DATASET_PATH, train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=args.batch_size, shuffle=True, num_workers=2)
if args.synthetic_data:
train_set = torchvision.datasets.FakeData(
size=args.train_size, image_size=(3, 32, 32), num_classes=10, transform=transform
)
test_set = torchvision.datasets.FakeData(
size=args.test_size, image_size=(3, 32, 32), num_classes=10, transform=transform
)
else:
train_set = torchvision.datasets.CIFAR10(root=DATASET_PATH, train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root=DATASET_PATH, train=False, download=True, transform=transform)

test_set = torchvision.datasets.CIFAR10(root=DATASET_PATH, train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=args.batch_size, shuffle=False, num_workers=2)
train_loader = torch.utils.data.DataLoader(
train_set, batch_size=batch_size, shuffle=True, num_workers=args.num_workers
)
test_loader = torch.utils.data.DataLoader(
test_set, batch_size=batch_size, shuffle=False, num_workers=args.num_workers
)

# (3) initializes NVFlare client API
flare.init()
16 changes: 14 additions & 2 deletions examples/hello-world/hello-pt/job.py
Expand Up @@ -29,8 +29,14 @@ def define_parser():
parser.add_argument("--n_clients", type=int, default=2)
parser.add_argument("--num_rounds", type=int, default=2)
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--epochs", type=int, default=2)
parser.add_argument("--num_workers", type=int, default=2)
parser.add_argument("--synthetic_data", action="store_true")
parser.add_argument("--train_size", type=int, default=50000)
parser.add_argument("--test_size", type=int, default=10000)
parser.add_argument("--train_script", type=str, default="client.py")
parser.add_argument("--cross_site_eval", action="store_true")
parser.add_argument("--enable_log_streaming", action=argparse.BooleanOptionalAction, default=False)
parser.add_argument(
"--launch_external_process",
action="store_true",
Expand All @@ -52,6 +58,10 @@ def main():
n_clients = args.n_clients
num_rounds = args.num_rounds
batch_size = args.batch_size
epochs = args.epochs
train_args = f"--batch_size {batch_size} --epochs {epochs} --num_workers {args.num_workers}"
if args.synthetic_data:
train_args += f" --synthetic_data --train_size {args.train_size} --test_size {args.test_size}"

recipe = FedAvgRecipe(
name="hello-pt",
Expand All @@ -62,7 +72,7 @@ def main():
# Alternative: model={"class_path": "model.SimpleNetwork", "args": {}},
# For pre-trained weights: initial_ckpt="/server/path/to/pretrained.pt",
train_script=args.train_script,
train_args=f"--batch_size {batch_size}",
train_args=train_args,
launch_external_process=args.launch_external_process,
client_memory_gc_rounds=args.client_memory_gc_rounds,
)
Expand All @@ -71,7 +81,9 @@ def main():
if args.cross_site_eval:
add_cross_site_evaluation(recipe)

# Run FL simulation
if args.enable_log_streaming:
recipe.enable_log_streaming()

env = SimEnv(num_clients=n_clients)
run = recipe.execute(env)
print()