Initial commit of distributed RL tutorial
mitchellspryn committed Feb 21, 2018
1 parent c7ef6d9 commit c62aeec
Showing 41 changed files with 4,734 additions and 0 deletions.
24 changes: 24 additions & 0 deletions AirSimDistributedRL/CreateImage.ps1
@@ -0,0 +1,24 @@
Param(
[Parameter(Mandatory=$true)]
[String] $subscriptionId,
[Parameter(Mandatory=$true)]
[String] $storageAccountName,
[Parameter(Mandatory=$true)]
[String] $storageAccountKey,
[Parameter(Mandatory=$true)]
[String] $resourceGroupName
)

# Log in to Azure and select the subscription that owns the storage account
Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionId $subscriptionId

# Copy the prebuilt AirSim VHD into the 'prereq' container of the target storage account
$cmd = 'azcopy /Source:https://airsimimage.blob.core.windows.net/airsimimage/AirsimImage.vhd /Dest:https://{0}.blob.core.windows.net/prereq/AirsimImage.vhd /destKey:{1}' -f $storageAccountName, $storageAccountKey

write-host $cmd
iex $cmd

$newBlobPath = 'https://{0}.blob.core.windows.net/prereq/AirsimImage.vhd' -f $storageAccountName

# Register the copied VHD as a generalized Windows managed image
$imageConfig = New-AzureRmImageConfig -Location 'EastUs'
$imageConfig = Set-AzureRmImageOsDisk -Image $imageConfig -OsType Windows -OsState Generalized -BlobUri $newBlobPath
$image = New-AzureRmImage -ImageName 'AirsimImage' -ResourceGroupName $resourceGroupName -Image $imageConfig
370 changes: 370 additions & 0 deletions AirSimDistributedRL/ExploreAlgorithm.ipynb

Large diffs are not rendered by default.
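Judging from the hyperparameters used in the training notebook below (epsilon-greedy annealing, a replay memory, and periodic weight copies to a target model), the algorithm explored in this notebook is a deep Q-learning variant. As a rough sketch under that assumption only — the function and its signature are hypothetical, not taken from the unrendered notebook:

```python
def compute_q_targets(model, target_model, minibatch, gamma=0.99):
    """Hypothetical deep Q-learning target computation, inferred from the
    hyperparameters elsewhere in this commit; not the notebook's actual code."""
    states, actions, rewards, next_states, terminals = minibatch
    targets = model.predict(states)                          # current Q(s, a) estimates
    next_q = target_model.predict(next_states).max(axis=1)   # max_a' Q_target(s', a')
    for i, a in enumerate(actions):
        # Terminal transitions contribute only the immediate reward
        targets[i, a] = rewards[i] + (0.0 if terminals[i] else gamma * next_q[i])
    return targets
```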

196 changes: 196 additions & 0 deletions AirSimDistributedRL/LaunchTrainingJob.ipynb
@@ -0,0 +1,196 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 2 - Launch the Training Job\n",
"\n",
"In this notebook, we will use the cluster created in **Step 0 - Set up the Cluster** to train the reinforcement learning model. First, start off by importing some libraries"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"import uuid\n",
"import json\n",
"\n",
"#Azure batch. To install, run 'pip install cryptography azure-batch azure-storage'\n",
"import azure.batch.batch_service_client as batch\n",
"import azure.batch.batch_auth as batchauth\n",
"import azure.batch.models as batchmodels\n",
"\n",
"with open('notebook_config.json', 'r') as f:\n",
" NOTEBOOK_CONFIG = json.loads(f.read()) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we will define some hyperparameters for the training job. The parameters are:\n",
"\n",
"* **batch_update_frequency**: This is how often the weights from the actor model get copied to the critic model. It is also how often the model gets saved to disk.\n",
"* **max_epoch_runtime_sec**: This is the maximum runtime for each epoch. If the car has not reached a terminal state after this many seconds, the epoch will be terminated and training will begin.\n",
"* **per_iter_epsilon_reduction**: The agent uses an epsilon greedy linear annealing strategy while training. This is the amount by which epsilon is reduced each iteration.\n",
"* **min_epsilon**: The minimum value for epsilon. Once reached, the epsilon value will not decrease any further.\n",
"* **batch_size**: The minibatch size to use for training.\n",
"* **replay_memory_size**: The number of examples to keep in the replay memory. The replay memory is a FIFO buffer used to reduce the effects of nearby states being correlated. Minibatches are generated from randomly selecting examples from the replay memory.\n",
"* **weights_path**: If we are using pretrained weights for the model, they will be loaded from this path.\n",
"* **train_conv_layers**: If we are using pretrained weights, we may prefer to freeze the convolutional layers to speed up training."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"batch_update_frequency = 300\n",
"max_epoch_runtime_sec = 30\n",
"per_iter_epsilon_reduction=0.003\n",
"min_epsilon = 0.1\n",
"batch_size = 32\n",
"replay_memory_size = 2000\n",
"weights_path = 'Z:\\\\data\\\\pretrain_model_weights.h5'\n",
"train_conv_layers = 'false'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Connect to the Azure Batch service and create a unique job name"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"batch_credentials = batchauth.SharedKeyCredentials(NOTEBOOK_CONFIG['batch_account_name'], NOTEBOOK_CONFIG['batch_account_key'])\n",
"batch_client = batch.BatchServiceClient(batch_credentials, base_url=NOTEBOOK_CONFIG['batch_account_url'])\n",
"\n",
"job_id = 'distributed_rl_{0}'.format(str(uuid.uuid4()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we create the job. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"job = batch.models.JobAddParameter(\n",
" job_id,\n",
" batch.models.PoolInformation(pool_id=NOTEBOOK_CONFIG['batch_pool_name']))\n",
"\n",
"batch_client.job.add(job)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although we've created the job, we haven't actually told the machines what to do. For that, we need to create tasks in the job. Each machine will pick up a different task. We create one task for the trainer node, and one task for each of the agent nodes."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"tasks = []\n",
"\n",
"# Trainer task\n",
"tasks.append(batchmodels.TaskAddParameter(\n",
" id='TrainerTask',\n",
" command_line=r'call C:\\\\prereq\\\\mount.bat && C:\\\\ProgramData\\\\Anaconda3\\\\Scripts\\\\activate.bat py36 && python -u Z:\\\\scripts_downpour\\\\manage.py runserver 0.0.0.0:80 data_dir=Z:\\\\\\\\ role=trainer experiment_name={0} batch_update_frequency={1} weights_path={2} train_conv_layers={3}'.format(job_id, batch_update_frequency, weights_path, train_conv_layers),\n",
" display_name='Trainer',\n",
" user_identity=batchmodels.UserIdentity(user_name=NOTEBOOK_CONFIG['batch_job_user_name']),\n",
" multi_instance_settings = batchmodels.MultiInstanceSettings(number_of_instances=1, coordination_command_line='cls')\n",
" ))\n",
"\n",
"# Agent tasks\n",
"agent_cmd_line = r'call C:\\\\prereq\\\\mount.bat && C:\\\\ProgramData\\\\Anaconda3\\\\Scripts\\\\activate.bat py36 && python -u Z:\\\\scripts_downpour\\\\app\\\\distributed_agent.py data_dir=Z: role=agent max_epoch_runtime_sec={0} per_iter_epsilon_reduction={1:f} min_epsilon={2:f} batch_size={3} replay_memory_size={4} experiment_name={5} weights_path={6} train_conv_layers={7}'.format(max_epoch_runtime_sec, per_iter_epsilon_reduction, min_epsilon, batch_size, replay_memory_size, job_id, weights_path, train_conv_layers) \n",
"for i in range(0, NOTEBOOK_CONFIG['batch_pool_size'] - 1, 1):\n",
" tasks.append(batchmodels.TaskAddParameter(\n",
" id='AgentTask_{0}'.format(i),\n",
" command_line = agent_cmd_line,\n",
" display_name='Agent_{0}'.format(i),\n",
" user_identity=batchmodels.UserIdentity(user_name=NOTEBOOK_CONFIG['batch_job_user_name']),\n",
" multi_instance_settings=batchmodels.MultiInstanceSettings(number_of_instances=1, coordination_command_line='cls')\n",
" ))\n",
" \n",
"batch_client.task.add_collection(job_id, tasks)\n",
"print('')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the job has been kicked off! Shortly, you should see two new directories created on the file share:\n",
"\n",
"* **logs**: This contains the stdout for the agent and the trainer nodes. These streams are very useful for debugging. To add additional debug information, just print() to either stdout or stderr in the training code. \n",
"* **checkpoint**: This contains the trained models. After the required number of minibatches have been trained (as determined by the batch_update_frequency parameter), the model's weights will be saved to this directory on disk. \n",
"\n",
"In each of these folders, a subdirectory will be created with your experiment id. \n",
"\n",
"If you use remote desktop to connect to the agent machines, you will be able to see the training code drive the vehicle around (be sure to give administrator permission to run any powershell scripts when prompted).\n",
"\n",
"Training will continue indefinitely. Be sure to let the model train for at least 300,000 iterations. Once the model has trained, download the weights and move on to **Step 3 - Run the Model**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
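For reference, the epsilon-greedy linear annealing and FIFO replay memory described in the hyperparameter cell above can be sketched in a few lines. This is illustrative only; the names and structure are assumptions, not the tutorial's actual agent code:

```python
import random
from collections import deque

per_iter_epsilon_reduction = 0.003
min_epsilon = 0.1
replay_memory_size = 2000
batch_size = 32

epsilon = 1.0
# A bounded deque behaves as a FIFO buffer: the oldest examples drop out first
replay_memory = deque(maxlen=replay_memory_size)

def anneal_epsilon(epsilon):
    # Reduce epsilon linearly on each iteration, but never below the floor
    return max(min_epsilon, epsilon - per_iter_epsilon_reduction)

def sample_minibatch():
    # Random sampling breaks the correlation between temporally adjacent states
    return random.sample(replay_memory, min(batch_size, len(replay_memory)))
```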
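Once the tasks have been submitted, their progress can be checked with the same SDK, reusing the batch_client and job_id defined in the notebook above. A minimal hedged sketch using standard azure-batch calls; this snippet is not part of the tutorial itself:

```python
# List every task in the job along with its current state
# (e.g. active, running, completed).
for task in batch_client.task.list(job_id):
    print(task.id, task.state)
```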
13 changes: 13 additions & 0 deletions AirSimDistributedRL/ProvisionCluster.ps1
@@ -0,0 +1,13 @@
Param(
[Parameter(Mandatory=$true)]
[String] $subscriptionId,
[Parameter(Mandatory=$true)]
[String] $resourceGroupName,
[Parameter(Mandatory=$true)]
[String] $batchAccountName
)

# Log in and point the Azure CLI at the target subscription and Batch account
az login
az account set --subscription $subscriptionId
az batch account set --resource-group $resourceGroupName --name $batchAccountName

# Create the pool of worker nodes from the definition in pool.json
az batch pool create --json-file pool.json
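The referenced pool.json is part of this commit but is not shown in this view. For orientation only, here is a hedged sketch of an equivalent pool definition using the azure-batch Python SDK that the training notebook imports; the image ID, VM size, and node count are illustrative assumptions, not values read from the actual pool.json:

```python
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels

# Hypothetical equivalent of `az batch pool create --json-file pool.json`.
credentials = batchauth.SharedKeyCredentials('<batch_account_name>', '<batch_account_key>')
batch_client = batch.BatchServiceClient(credentials, base_url='<batch_account_url>')

pool = batchmodels.PoolAddParameter(
    id='<batch_pool_name>',
    vm_size='STANDARD_NV6',  # GPU nodes, matching the hardware notes in the README
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            # Assumed to point at the managed image created by CreateImage.ps1
            virtual_machine_image_id='/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/images/AirsimImage'),
        node_agent_sku_id='batch.node.windows amd64'),
    target_dedicated_nodes=4)  # e.g. 1 trainer node + 3 agent nodes

batch_client.pool.add(pool)
```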
71 changes: 71 additions & 0 deletions AirSimDistributedRL/README.md
@@ -0,0 +1,71 @@
# Distributed Deep Reinforcement Learning for Autonomous Driving

### Authors:

**[Mitchell Spryn](https://www.linkedin.com/in/mitchell-spryn-57834545/)**, Software Engineer II, Microsoft

**[Aditya Sharma](https://www.linkedin.com/in/adityasharmacmu/)**, Program Manager, Microsoft

**[Dhawal Parkar](https://www.linkedin.com/in/dparkar/)**, Software Engineer II, Microsoft


## Overview

In this tutorial, you will learn how to train a distributed deep reinforcement learning model for autonomous driving (AD) using the power of cloud computing. This tutorial serves as an introduction to training deep learning AD models at scale. Over the course of the tutorial, you will set up a cluster of virtual machine nodes running the [AirSim simulation environment](https://github.com/Microsoft/AirSim) and then distribute a training job across the nodes to train a model to steer a car through the Neighborhood environment in AirSim using reinforcement learning.

The instructions provided here use virtual machines spun up on [Microsoft Azure](https://azure.microsoft.com/en-us/), with the [Azure Batch](https://azure.microsoft.com/en-us/services/batch/) service scheduling the distributed training job. The ideas presented, however, can easily be extended to the cloud platform and services of your choice. Please also note that you should be able to work through the tutorial without actually running the given code and training the model. **If you do wish to run the code, you will need an active [Azure subscription](https://azure.microsoft.com/en-us/free/), and kicking off the training job will [incur charges](https://azure.microsoft.com/en-us/pricing/).**

![car-driving](car_driving.gif)

#### Who is this tutorial for?

This tutorial was designed with autonomous driving practitioners in mind. Researchers as well as industry professionals working in the field will find it a good starting point for further work. The focus of the tutorial is on teaching how to create autonomous driving models at scale from simulation data. While we use deep reinforcement learning to demonstrate how to train such models, and the tutorial does discuss the model itself, it assumes that readers are familiar with how reinforcement learning works. Beginners in the field, especially those who are new to deep learning, might find certain aspects of this tutorial challenging. Please refer to the Prerequisites section below for more details.

## Prerequisites and setup

#### Background needed

This tutorial was designed with advanced users and practitioners in mind, so it assumes that the reader has, at the very least, a background in deep learning and is familiar with the basic concepts of reinforcement learning (reward functions, episodes, etc.). A helpful introduction to reinforcement learning can be found [here](https://medium.freecodecamp.org/deep-reinforcement-learning-where-to-start-291fb0058c01).

It is also highly recommended that the reader be familiar with the AirSim simulation platform. This tutorial builds upon certain concepts introduced in our [end-to-end deep learning for autonomous driving](../AirSimE2EDeepLearning/README.md) tutorial, so we recommend going through that tutorial first.

#### Environment Setup

1. [Download the latest version of our Simulation Package](https://aka.ms/ADCookbookAirSimPackage). Consider using [AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy) instead of direct HTTPS download, as the file size is large.
2. [Install Anaconda](https://conda.io/docs/user-guide/install/index.html) with Python 3.5 or higher.
3. [Install CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine) or [install TensorFlow](https://www.tensorflow.org/install/install_windows).
4. [Install h5py](http://docs.h5py.org/en/latest/build.html)
5. [Install Keras](https://keras.io/#installation)
6. [Configure Keras backend](https://keras.io/backend/) to work with TensorFlow (default) or CNTK.
7. [Install AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy). Be sure to add the location for the AzCopy executable to your system path.
8. [Install the latest version of Azure PowerShell](https://docs.microsoft.com/en-us/powershell/azure/install-azurerm-ps?view=azurermps-5.3.0).
9. [Install the latest version of the Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest).

#### Simulator Package

We have created a standalone build of the AirSim simulation environment for the tutorials in this cookbook. [You can download the build package from here](https://aka.ms/ADCookbookAirSimPackage). Consider using [AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy), as the file size is large. After downloading the package, unzip it and run the PowerShell command

`.\AD_Cookbook_Start_AirSim.ps1 neighborhood`

to start the simulator in the neighborhood environment.

#### Hardware

This tutorial has been designed to run on Azure Batch using NV6 machines. Training times and charges vary depending on the number of machines spun up. With a cluster size of 4 (i.e. 3 agent nodes and 1 trainer node), the model took 3 days to train from scratch; using pretrained weights, it trained in 6 hours. A larger cluster will decrease training time but will also incur additional charges.

To run the model, a machine with an NVIDIA GPU is required. This can be either an on-premises development box or an NV-series Azure Data Science VM.

## Structure of the tutorial

You will follow a series of [Jupyter notebooks](https://jupyter-notebook.readthedocs.io/en/stable/index.html) as you make your way through this tutorial. Please start with the [first notebook to set up your cluster](SetupCluster.ipynb) and proceed through the notebooks in the following order:

Step 0: [Set up the cluster](SetupCluster.ipynb)

Step 1: [Explore the algorithm](ExploreAlgorithm.ipynb)

Step 2: [Launch the training job](LaunchTrainingJob.ipynb)

Step 3: [Run the model](RunModel.ipynb)
