Rearranging documentation
definitelynotmcarilli committed Mar 7, 2019
1 parent 0c2a629 commit 4606df9
Showing 2 changed files with 37 additions and 35 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -12,7 +12,7 @@ users as quickly as possible.

## 1. Mixed Precision

### amp: Automatic Mixed Precision
### Amp: Automatic Mixed Precision

`apex.amp` is a tool to enable mixed precision training by changing only 3 lines of your script.
Users can easily experiment with different pure and mixed precision training modes by supplying
@@ -27,7 +27,7 @@ different flags to `amp.initialize`.

[DCGAN example coming soon...](https://github.com/NVIDIA/apex/tree/master/examples/dcgan)

[Moving to the new Amp API] (for users of the deprecated tools formerly called "Amp" and "FP16_Optimizer")
[Moving to the new Amp API](https://nvidia.github.io/apex/amp.html#transition-guide-for-old-api-users) (for users of the deprecated tools formerly called "Amp" and "FP16_Optimizer")
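
For orientation, here is a minimal sketch of the three-line change described above, assuming an existing FP32 training script in which `model`, `optimizer`, and `loss` are already defined (the chosen `opt_level` is just an example):

```python
from apex import amp

# Added line 1: wrap the existing FP32 model and optimizer.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# Added lines 2-3: replace loss.backward() with a scaled backward pass.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
```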

## 2. Distributed Training

68 changes: 35 additions & 33 deletions docs/source/amp.rst
@@ -7,11 +7,22 @@ apex.amp
This page documents the updated API for Amp (Automatic Mixed Precision),
a tool to enable Tensor Core-accelerated training in only 3 lines of Python.

Users **should not** manually cast their model or data to ``.half()``, regardless of what ``opt_level``
or properties are chosen. Amp intends that users start with an existing default (FP32) script,
add the three lines corresponding to the Amp API, and begin training with mixed precision.
Amp can also be disabled, in which case the original script will behave exactly as it used to.
In this way, there's no risk in adhering to the Amp API, and a lot of potential performance benefit.
A `runnable, comprehensive Imagenet example`_ demonstrating good practices can be found
on the Github page.

GANs are a tricky case that many people have requested. A `comprehensive DCGAN example`_
is under construction.

``opt_level``\ s and Properties
-------------------------------

Amp allows users to easily experiment with different pure and mixed precision modes.
Commonly-used default modes are chosen by
selecting an "optimization level" or ``opt_level``; each ``opt_level`` establishes a set of
properties that govern Amp's implementation of pure or mixed precision training.
Finer-grained control of how a given ``opt_level`` behaves can be achieved by passing values for
particular properties directly to ``amp.initialize``. These manually specified values
override the defaults established by the ``opt_level``.

Example::

@@ -27,39 +38,33 @@ Example::
scaled_loss.backward()
...

A `runnable, comprehensive Imagenet example`_ demonstrating good practices can be found
on the Github page.
Users **should not** manually cast their model or data to ``.half()``, regardless of what ``opt_level``
or properties are chosen. Amp intends that users start with an existing default (FP32) script,
add the three lines corresponding to the Amp API, and begin training with mixed precision.
Amp can also be disabled, in which case the original script will behave exactly as it used to.
In this way, there's no risk in adhering to the Amp API, and a lot of potential performance benefit.

GANs are a tricky case that many people have requested. A `comprehensive DCGAN example`_
is under construction.
.. note::
Because it's never necessary to manually cast your model (aside from the call to ``amp.initialize``)
or input data, a script that adheres to the new API
can switch between different ``opt_level``\ s without having to make any other changes.

.. _`runnable, comprehensive Imagenet example`:
https://github.com/NVIDIA/apex/tree/master/examples/imagenet

.. _`comprehensive DCGAN example`:
https://github.com/NVIDIA/apex/tree/master/examples/dcgan

``opt_level``\ s and Properties
-------------------------------

Amp allows users to easily experiment with different pure and mixed precision modes, including
pure FP16 training and pure FP32 training. Commonly-used default modes are chosen by
selecting an "optimization level" or ``opt_level``; each ``opt_level`` establishes a set of
properties that govern Amp's implementation of pure or mixed precision training.
Finer-grained control of how a given ``opt_level`` behaves can be achieved by passing values for
particular properties directly to ``amp.initialize``. These manually specified values will
override the defaults established by the ``opt_level``.

Properties
**********

Currently, the under-the-hood properties that govern pure or mixed precision training are the following:

- ``cast_model_type``: Casts your model's parameters and buffers to the desired type.
- ``patch_torch_functions``: Patch all Torch functions and Tensor methods to perform Tensor Core-friendly ops like GEMMs and convolutions in FP16, and any ops that benefit from FP32 precision in FP32.
- ``keep_batchnorm_fp32``: To enhance precision and enable cudnn batchnorm (which improves performance), it's often beneficial to keep batchnorms in particular in FP32 even if the rest of the model is FP16.
- ``keep_batchnorm_fp32``: To enhance precision and enable cudnn batchnorm (which improves performance), it's often beneficial to keep batchnorm weights in FP32 even if the rest of the model is FP16.
- ``master_weights``: Maintain FP32 master weights to accompany any FP16 model weights. FP32 master weights are stepped by the optimizer to enhance precision and capture small gradients.
- ``loss_scale``: If ``loss_scale`` is a float value, use this value as the static (fixed) loss scale. If ``loss_scale`` is the string ``"dynamic"``, adapatively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.
- ``loss_scale``: If ``loss_scale`` is a float value, use this value as the static (fixed) loss scale. If ``loss_scale`` is the string ``"dynamic"``, adaptively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.
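
For illustration, a hedged sketch of overriding individual properties on top of an ``opt_level`` (``model`` and ``optimizer`` are assumed to exist already, and the particular combination of overrides is arbitrary)::

    from apex import amp

    # Start from an opt_level, then override selected properties by keyword.
    # Manually specified values take precedence over the opt_level defaults.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2",
                                      keep_batchnorm_fp32=True,  # keep batchnorm weights in FP32
                                      loss_scale=128.0)          # static loss scale instead of "dynamic"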

Again, you often don't need to specify these properties by hand. Instead, select an ``opt_level``,
which will set them up for you. After selecting an ``opt_level``, you can optionally pass property
@@ -85,7 +90,7 @@ Your incoming model should be FP32 already, so this is likely a no-op.
| Default properties set by ``O0``:
| ``cast_model_type=torch.float32``
| ``patch_torch_functions=False``
| ``keep_batchnorm_fp32=None`` (effectively, "not applicable")
| ``keep_batchnorm_fp32=None`` (effectively, "not applicable," everything is FP32)
| ``master_weights=False``
| ``loss_scale=1.0``
|
@@ -116,21 +121,23 @@ what gives the best speedup and accuracy for your model.

Patch all Torch functions and Tensor methods to cast their inputs according to a whitelist-blacklist
model. Whitelist ops (for example, Tensor Core-friendly ops like GEMMs and convolutions) are performed
in FP16. Blacklist ops that benefit from FP32 precision (for example, batchnorm and softmax)
in FP16. Blacklist ops that benefit from FP32 precision (for example, softmax)
are performed in FP32. ``O1`` also uses dynamic loss scaling, unless overridden.
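
As a hedged sketch of this patching behavior (``model`` and ``optimizer`` are assumed to exist; the dtypes in the comments are the expected outcome, not a guarantee)::

    import torch
    from apex import amp

    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    a = torch.randn(64, 64, device="cuda")      # FP32 inputs
    b = torch.randn(64, 64, device="cuda")
    c = torch.mm(a, b)                          # whitelist op: expected to run (and return) in FP16
    p = torch.nn.functional.softmax(c, dim=1)   # blacklist op: expected to run in FP32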

| Default properties set by ``O1``:
| ``cast_model_type=None`` (not applicable)
| ``patch_torch_functions=True``
| ``keep_batchnorm_fp32=None`` (not necessary to specify True, batchnorm inputs are cast to FP32)
| ``keep_batchnorm_fp32=None`` (again, "not applicable," all model weights remain FP32)
| ``master_weights=None`` (not applicable, model weights remain FP32)
| ``loss_scale="dynamic"``
|
|
``O2``: Fast Mixed Precision
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``O2`` casts the model to FP16, keeps batchnorms in FP32, maintains master weights in FP32,
``O2`` casts the model weights to FP16,
patches the model's ``forward`` method to cast input
data to FP16, keeps batchnorms in FP32, maintains FP32 master weights,
and implements dynamic loss scaling (unless overridden).
Unlike ``O1``, ``O2`` does not patch Torch functions or Tensor methods.
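
A hedged sketch of what ``O2`` does to a toy model (the Linear/BatchNorm model below is purely illustrative)::

    import torch
    from apex import amp

    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024),
                                torch.nn.BatchNorm1d(1024)).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    print(model[0].weight.dtype)  # expected torch.float16: Linear weights were cast
    print(model[1].weight.dtype)  # expected torch.float32: batchnorm weights are kept in FP32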

@@ -221,14 +228,9 @@ One annoying aspect of FP16_Optimizer was that the user had to manually convert
(either by calling ``.half()`` on it, or using a function or module wrapper from
``apex.fp16_utils``), and also manually call ``.half()`` on input data. **Neither of these are
necessary in the new API. No matter what --opt-level
you choose, you can and should simply build your model in the default FP32 format.** The new Amp
API will perform the right conversions during
you choose, you can and should simply build your model and pass input data in the default FP32 format.**
The new Amp API will perform the right conversions during
``model, optimizer = amp.initialize(model, optimizer, opt_level=....)`` based on the ``--opt-level``
and any overridden flags. Floating point input data may be FP32 or FP16, but you may as well just
let it be FP16, because the ``model`` returned by ``amp.initialize`` will have its ``forward``
method patched to cast the input data appropriately.
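
A minimal end-to-end sketch of this point, with no ``.half()`` call anywhere (the toy model, data, and chosen ``opt_level`` are illustrative)::

    import torch
    from apex import amp

    # Build the model and optimizer in default FP32 -- no manual model.half().
    model = torch.nn.Linear(784, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    data = torch.randn(32, 784, device="cuda")            # FP32 input, no data.half()
    target = torch.randint(0, 10, (32,), device="cuda")

    loss = torch.nn.functional.cross_entropy(model(data), target)
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()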

.. note::
Aside from the call to ``amp.initialize`` itself, it's never necessary to manually cast
your model or data with the new API. Therefore, a script that adheres to the new API
can switch between different ``opt-level``\ s without having to make any other changes.
