Commit d641fb3

Updated 'Up and Running with Composer' (#619)

1 parent 776eedf commit d641fb3
2 files changed

Lines changed: 63 additions & 108 deletions


notebooks/custom_method_tutorial.ipynb

Lines changed: 0 additions & 1 deletion
@@ -33,7 +33,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Installing from a branch until main is updated\n",
 "!pip install mosaicml \n",
 "!pip install matplotlib"
 ]

notebooks/up_and_running_with_composer.ipynb

mode change 100755 → 100644
Lines changed: 63 additions & 107 deletions
@@ -29,7 +29,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!pip install git+https://github.com/mosaicml/composer.git@dev"
+"!pip install mosaicml "
 ]
 },
 {
@@ -43,6 +43,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"## Imports\n",
+"\n",
 "In this section we'll set up our workspace. We'll import the necessary packages and set up our dataset and trainer. First, the imports:"
 ]
 },
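
The import cell itself sits outside this hunk. A minimal sketch of what the notebook appears to need, inferred only from the cells visible in this diff (time for the perf_counter timing cells, torch/torchvision for the dataloaders, composer for everything else):

# Sketch of the assumed import cell (not part of this diff)
import time              # used by the time.perf_counter() timing cells

import torch.utils.data  # PyTorch DataLoader
import torchvision       # Torchvision CIFAR10 dataset

import composer          # composer.models / composer.optim / composer.algorithms / composer.trainer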
@@ -66,6 +68,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"## Dataset & Dataloader\n",
+"\n",
 "Next, we instantiate our CIFAR10 dataset and dataloader. Composer has its own CIFAR10 dataset and dataloaders, but this walkthrough focuses on how to use Composer's algorithms, so we'll stick with the Torchvision CIFAR10 and PyTorch dataloader for the sake of familiarity."
 ]
 },
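
The dataset/dataloader cell is also outside this hunk. A rough sketch of what the prose describes, assuming a plain ToTensor transform and a placeholder data directory; the batch size of 2048 is inferred from the warmup comment ("50k samples * 1 batch/2048 samples") in the code this commit removes:

# Assumed dataset/dataloader cell: torchvision CIFAR10 + PyTorch dataloaders
data_directory = "./data"  # placeholder download location
batch_size = 2048          # inferred from the removed warmup comment

# Bare ToTensor transform; the real cell likely adds normalization/augmentation
transform = torchvision.transforms.ToTensor()

train_dataset = torchvision.datasets.CIFAR10(
    data_directory, train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(
    data_directory, train=False, download=True, transform=transform)

train_dataloader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False)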
@@ -96,6 +100,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"## Model\n",
+"\n",
 "Next, we create our model. We're using Composer's built-in ResNet56. To use your own custom model, please see the [custom models tutorial](https://docs.mosaicml.com/en/v0.3.1/tutorials/adding_models_datasets.html#models)."
 ]
 },
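
The model cell is not shown in this diff either. A sketch, assuming the built-in CIFAR10 ResNet56 is exposed roughly as below; the exact class name varies across Composer releases, so treat it as a guess and check composer.models in the installed version:

# Assumed model cell; CIFAR10_ResNet56 is a guess at the class name
model = composer.models.CIFAR10_ResNet56(num_classes=10)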
@@ -113,6 +119,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"## Optimizer and Scheduler\n",
+"\n",
 "The trainer will handle stepping the optimizer for us, but first we need to create the optimizer and LR scheduler. We're using [MosaicML's SGD with decoupled weight decay](https://arxiv.org/abs/1711.05101):"
 ]
 },
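
The optimizer cell does not appear in this hunk, but its contents show up verbatim further down in this diff, so it can be reassembled exactly:

# Optimizer cell, reassembled from lines later in this diff
optimizer = composer.optim.DecoupledSGDW(
    model.parameters(),
    lr=0.05,
    momentum=0.9,
    weight_decay=2.0e-3
)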
@@ -134,52 +142,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We'll assume this is being run on Colab, which means training for hundreds of epochs would take a very long time. Instead we'll train our baseline model for three epochs. The first epoch will be linear warmup. We achieve this by instantiating a `WarmUpLRHparams` class."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"## Having to manually compute the number of iterations is ungainly\n",
-"\n",
-"warmup = composer.optim.WarmUpLR(\n",
-" optimizer, # Optimizer\n",
-" warmup_iters=25, # Number of iterations to warmup over. 50k samples * 1 batch/2048 samples\n",
-" warmup_method=\"linear\", # Linear warmup\n",
-" warmup_factor=1e-4, # Initial LR = LR * warmup_factor\n",
-" interval=\"step\", # Update LR with stepwise granularity for superior results\n",
-")"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"We'll also use cosine decay for the next two epochs:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"## Seems like there's no way to have stepwise resolution for cosine decay without using CosineAnnealingLRHparams :(\n",
-"\n",
-"# decay = torch.optim.lr_scheduler.CosineAnnealingLR(\n",
-"# optimizer,\n",
-"# T_max=50 # 2 epochs == 50 steps\n",
-"# )"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"An LR schedule is provided to the trainer as a list of SchedulerHparams objects. In this case we just have one, but creating a more complex LR schedule is as simple as collecting the SchedulerHparams objects in a list."
+"We'll assume this is being run on Colab, which means training for hundreds of epochs would take a very long time. Instead we'll train our baseline model for three epochs. The first epoch will be linear warmup, followed by two epochs of constant LR. We achieve this by instantiating a `LinearWithWarmupLRHparams` class and calling its `initialize_object()` method."
 ]
 },
 {
@@ -188,7 +151,12 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"lr_schedule = [warmup] # Complex LR schedules achieved by [warmup_scheduler_object, decay_scheduler_object, ...]"
+"lr_scheduler_hparams = composer.optim.LinearWithWarmupLRHparams(\n",
+" warmup_time=\"1ep\", # Warm up over 1 epoch\n",
+" start_factor=1.0, # Flat LR schedule achieved by having start_factor == end_factor\n",
+" end_factor=1.0\n",
+")\n",
+"lr_scheduler = lr_scheduler_hparams.initialize_object() # This gets us the actual LR scheduler"
 ]
 },
 {
@@ -211,8 +179,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"## Need in-memory logging to store and plot results\n",
-"\n",
 "train_epochs = \"3ep\" # Train for 3 epochs because we're assuming Colab environment and hardware\n",
 "device = \"gpu\" # Train on the GPU\n",
 "\n",
@@ -222,13 +188,9 @@
 " eval_dataloader=test_dataloader,\n",
 " max_duration=train_epochs,\n",
 " optimizers=optimizer,\n",
-" schedulers=lr_schedule,\n",
+" schedulers=lr_scheduler,\n",
 " device=device\n",
-")\n",
-"\n",
-"\n",
-"\n",
-"# Would be nice to be able to plot something here"
+")"
 ]
 },
 {
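
Putting the two trainer hunks above together, the full baseline trainer cell after this commit reads as follows (reassembled purely from the context lines and additions shown; nothing here is new):

train_epochs = "3ep"  # Train for 3 epochs because we're assuming Colab environment and hardware
device = "gpu"        # Train on the GPU

trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=test_dataloader,
    max_duration=train_epochs,
    optimizers=optimizer,
    schedulers=lr_scheduler,
    device=device
)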
@@ -241,11 +203,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"tags": [
-"trainer_fit"
-]
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "start_time = time.perf_counter()\n",
@@ -321,7 +279,7 @@
 "source": [
 "prog_resize = composer.algorithms.ProgressiveResizing(\n",
 " initial_scale=.6, # Size of images at the beginning of training = .6 * default image size\n",
-" finetune_fraction=0.33 # Train on default size images for 0.5 of total training time.\n",
+" finetune_fraction=0.34 # Train on default size images for 0.34 of total training time.\n",
 ")"
 ]
 },
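
Later hunks reference colout, blurpool, and an algorithms list, but those cells are outside this diff. A sketch of how the three speedup methods are presumably collected before being handed to the trainer; the no-argument ColOut/BlurPool constructors are an assumption, so check composer.algorithms in the installed release:

# Assumed construction of the other two speedup methods (cells not in this diff)
colout = composer.algorithms.ColOut()      # randomly drops rows/columns of pixels
blurpool = composer.algorithms.BlurPool()  # anti-aliased pooling/downsampling
algorithms = [colout, blurpool, prog_resize]  # passed later as algorithms=algorithms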
@@ -345,7 +303,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now let's instantiate our model, optimizer, scheduler, and trainer again."
+"Now let's instantiate our model, optimizer, and trainer again. No need to instantiate our scheduler again because it's stateless!"
 ]
 },
 {
@@ -363,23 +321,13 @@
 " weight_decay=2.0e-3\n",
 " )\n",
 "\n",
-"warmup = composer.optim.WarmUpLR(\n",
-" optimizer, # Optimizer\n",
-" warmup_iters=25, # Number of iterations to warmup over. 50k samples * 1 batch/2048 samples\n",
-" warmup_method=\"linear\", # Linear warmup\n",
-" warmup_factor=1e-4, # Initial LR = LR * warmup_factor\n",
-" interval=\"step\", # Update LR with stepwise granularity for superior results\n",
-")\n",
-"\n",
-"lr_schedule = [warmup]\n",
-"\n",
 "trainer = composer.trainer.Trainer(\n",
 " model=model,\n",
 " train_dataloader=train_dataloader,\n",
 " eval_dataloader=test_dataloader,\n",
 " max_duration=train_epochs,\n",
 " optimizers=optimizer,\n",
-" schedulers=lr_schedule,\n",
+" schedulers=lr_scheduler,\n",
 " device=device,\n",
 " algorithms=algorithms # Adding algorithms this time\n",
 ")"
@@ -395,17 +343,14 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"tags": [
-"trainer_fit"
-]
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "start_time = time.perf_counter()\n",
 "trainer.fit()\n",
 "end_time = time.perf_counter()\n",
-"print(f\"It took {end_time - start_time:0.4f} seconds to train\")"
+"three_epochs_accelerated_time = end_time - start_time\n",
+"print(f\"It took {three_epochs_accelerated_time:0.4f} seconds to train\")"
 ]
 },
 {
@@ -414,7 +359,7 @@
 "source": [
 "Again, the runtime will vary based on the instance, but we found that it took about 0.43x-0.75x as long to train (a 1.3x-2.3x speedup, which corresponds to 90-400 seconds) relative to the baseline recipe without augmentations. We also found that validation accuracy was similar for the algorithm-enhanced and baseline recipes.\n",
 "\n",
-"Because ColOut and Progressive Resizing increase data throughput (i.e. more samples per second), we can train for more iterations in the same amount of wall clock time. Let's train for another epoch!"
+"Because ColOut and Progressive Resizing increase data throughput (i.e. more samples per second), we can train for more iterations in the same amount of wall clock time. Let's train our model for one additional epoch!"
 ]
 },
 {
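
A quick check on the numbers quoted above: "0.43x-0.75x as long" and "1.3x-2.3x speedup" are the same claim, since speedup is the reciprocal of the time ratio:

# Speedup is baseline time divided by accelerated time;
# time ratios of 0.75 and 0.43 invert to roughly 1.3x and 2.3x
for time_ratio in (0.75, 0.43):
    print(f"{time_ratio}x as long -> {1 / time_ratio:.1f}x speedup")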
@@ -430,49 +375,60 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We'll need to set up our optimizer, scheduler, and trainer again:"
+"Resuming training means we'll need to use a flat LR schedule:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"tags": [
-"trainer_fit"
-]
-},
+"metadata": {},
+"outputs": [],
+"source": [
+"lr_scheduler_hparams = composer.optim.ConstantLRHparams()\n",
+"lr_scheduler = lr_scheduler_hparams.initialize_object() # This gets us the actual LR scheduler"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"And we can also get rid of progressive resizing (because we want to train on the full size images for this additional epoch), and the model already has blurpool enabled, so we don't need to pass that either:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"algorithms = [colout]"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
 "outputs": [],
 "source": [
-"optimizer = composer.optim.DecoupledSGDW(\n",
-" model.parameters(),\n",
-" lr=0.05,\n",
-" momentum=0.9,\n",
-" weight_decay=2.0e-3\n",
-" )\n",
-"\n",
-"# The Composer trainer defaults to cosine decay if no schedule is passed to it,\n",
-"# so let's creata constant LR schedule\n",
-"flat_schedule = composer.optim.ConstantLR(\n",
-" optimizer\n",
-")\n",
-"\n",
-"lr_schedule = [flat_schedule]\n",
-"\n",
 "trainer = composer.trainer.Trainer(\n",
 " model=model,\n",
 " train_dataloader=train_dataloader,\n",
 " eval_dataloader=test_dataloader,\n",
-" max_duration='5ba',\n",
+" max_duration=train_epochs,\n",
 " optimizers=optimizer,\n",
-" schedulers=lr_schedule,\n",
+" schedulers=lr_scheduler,\n",
 " device=device,\n",
-" algorithms=[colout, blurpool] # Adding algorithms this time\n",
+" algorithms=algorithms\n",
 ")\n",
 "\n",
 "start_time = time.perf_counter()\n",
 "trainer.fit()\n",
+"\n",
 "end_time = time.perf_counter()\n",
-"print(f\"It took {end_time - start_time:0.4f} seconds to train\")"
+"final_epoch_accelerated_time = end_time - start_time\n",
+"# Time for four epochs = time for three epochs + time for fourth epoch\n",
+"four_epochs_accelerated_time = three_epochs_accelerated_time + final_epoch_accelerated_time\n",
+"print(f\"It took {four_epochs_accelerated_time:0.4f} seconds to train\")"
 ]
 },
 {
@@ -508,7 +464,7 @@
 "accelerator": "GPU",
 "colab": {
 "collapsed_sections": [],
-"name": "Fast Training of ResNet56 on CIFAR10.ipynb",
+"name": "up_and_running_with_composer.ipynb",
 "provenance": [],
 "toc_visible": true
 },
