YARN-7502. Nodemanager restart docs should describe nodemanager supervised property. Contributed by Suma Shivaprasad.

(cherry picked from commit e503345)
sunilgovind committed Oct 22, 2018
1 parent 622919d commit 4a481e1
Showing 1 changed file with 8 additions and 2 deletions.
@@ -77,13 +77,19 @@ Step 2. Configure a path to the local file-system directory where the NodeManag
|:---- |:---- |
| `yarn.nodemanager.recovery.dir` | The local filesystem directory in which the node manager will store state when recovery is enabled. The default value is set to `$hadoop.tmp.dir/yarn-nm-recovery`. |

-Step 3. Configure a valid RPC address for the NodeManager.
+Step 3: Enable NM supervision under recovery to prevent running containers from getting cleaned up when the NM exits.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.nodemanager.recovery.supervised` | If enabled, the running NodeManager will not try to clean up containers as it exits, on the assumption that it will be restarted immediately and will recover the containers. The default value is `false`. |
+
+Step 4. Configure a valid RPC address for the NodeManager.

| Property | Description |
|:---- |:---- |
| `yarn.nodemanager.address` | Ephemeral ports (port 0, which is the default) cannot be used for the NodeManager's RPC server specified via `yarn.nodemanager.address`, because the NM could then use different ports before and after a restart. This would break any previously running clients that were communicating with the NM before the restart. Explicitly setting `yarn.nodemanager.address` to an address with a specific port number (e.g., `0.0.0.0:45454`) is a precondition for enabling NM restart. |

-Step 4. Auxiliary services.
+Step 5. Auxiliary services.

* NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely functional NM restart, YARN relies on any auxiliary service configured to also support recovery. This usually includes (1) avoiding usage of ephemeral ports so that previously running clients (in this case, usually containers) are not disrupted after restart and (2) having the auxiliary service itself support recoverability by reloading any previous state when NodeManager restarts and reinitializes the auxiliary service.
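
For reference, the properties touched by this hunk might be combined in `yarn-site.xml` roughly as in the sketch below. This is an illustrative fragment, not part of the commit: the recovery directory path is a hypothetical value, `yarn.nodemanager.recovery.enabled` belongs to Step 1 of the document (outside this hunk) and is included on the assumption that recovery is being turned on, and the port is the example value from the description above.

```xml
<!-- yarn-site.xml: minimal sketch of an NM restart configuration. -->
<!-- All values are illustrative assumptions; adjust them for the actual cluster. -->
<configuration>
  <!-- Step 1 (not shown in this hunk): turn on NM recovery. -->
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <!-- Step 2: local directory for NM recovery state (hypothetical path). -->
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/lib/hadoop-yarn/nm-recovery</value>
  </property>

  <!-- Step 3: tell the NM it runs under supervision, so it leaves containers alone on exit. -->
  <property>
    <name>yarn.nodemanager.recovery.supervised</name>
    <value>true</value>
  </property>

  <!-- Step 4: fixed RPC port instead of the ephemeral default (port 0). -->
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>
```

Setting `yarn.nodemanager.recovery.supervised` to `true` only makes sense when something external (for example, a process supervisor) actually restarts the NodeManager promptly; otherwise containers may be left running with no NM to manage them.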
