Commit

Fix typos in tutorial 2
mkofinas committed Oct 26, 2020
1 parent c61ea98 commit bf67598
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions docs/tutorial_notebooks/tutorial2/Introduction_to_PyTorch.ipynb
@@ -20,7 +20,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Welcome to our PyTorch tutorial for the Deep Learning course 2020 at the University of Amsterdam! The following notebook is meant to give a short introduction to PyTorch basics, and get you setup for writing your own neural networks. PyTorch is an open source machine learning framework that allows you to write your own neural networks and optimize them efficiently. However, PyTorch is not the only framework of its kind. Alternatives to PyTorch include [TensorFlow](https://www.tensorflow.org/), [JAX](https://github.com/google/jax#quickstart-colab-in-the-cloud) and [Caffe](http://caffe.berkeleyvision.org/). We choose to teach PyTorch at the University of Amsterdam because it is well established, has a huge developer community (oroginally developed by Facebook), is very flexible and especially used in research. Many current papers publish their code in PyTorch, and thus it is good to be familiar with PyTorch as well. \n",
"Welcome to our PyTorch tutorial for the Deep Learning course 2020 at the University of Amsterdam! The following notebook is meant to give a short introduction to PyTorch basics, and get you setup for writing your own neural networks. PyTorch is an open source machine learning framework that allows you to write your own neural networks and optimize them efficiently. However, PyTorch is not the only framework of its kind. Alternatives to PyTorch include [TensorFlow](https://www.tensorflow.org/), [JAX](https://github.com/google/jax#quickstart-colab-in-the-cloud) and [Caffe](http://caffe.berkeleyvision.org/). We choose to teach PyTorch at the University of Amsterdam because it is well established, has a huge developer community (originally developed by Facebook), is very flexible and especially used in research. Many current papers publish their code in PyTorch, and thus it is good to be familiar with PyTorch as well. \n",
"Meanwhile, TensorFlow (developed by Google) is usually known for being a production-grade deep learning library. Still, if you know one machine learning framework in depth, it is very easy to learn another one because many of them use the same concepts and ideas. For instance, TensorFlow's version 2 was heavily inspired by the most popular features of PyTorch, making the frameworks even more similar. \n",
"If you are already familiar with PyTorch and have created your own neural network projects, feel free to just skim this notebook.\n",
"\n",
@@ -87,7 +87,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"At the time of writing this tutorial (mid of October 2020), the current stable version is 1.6. You should therefore see the output `Using torch 1.6.0`. If you see a lower version number, make sure you have installed the correct the environment, or ask one of your TAs. In case PyTorch 1.7 or newer will be published during the time of the course, don't worry. The interface between PyTorch version doesn't change too much, and hence all code should also be runnable with newer versions.\n",
"At the time of writing this tutorial (mid of October 2020), the current stable version is 1.6. You should therefore see the output `Using torch 1.6.0`. If you see a lower version number, make sure you have installed the correct the environment, or ask one of your TAs. In case PyTorch 1.7 or newer will be published during the time of the course, don't worry. The interface between PyTorch versions doesn't change too much, and hence all code should also be runnable with newer versions.\n",
"\n",
"As in every machine learning framework, PyTorch provides functions that are stochastic like generating random numbers. However, a very good practice is to setup your code to be reproducible with the exact same random numbers. This is why we set a seed below. "
]
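For context, the version check and the seeding this hunk refers to boil down to a couple of lines; the following is a minimal sketch, where the seed value 42 is an arbitrary choice:

```python
import torch

print("Using torch", torch.__version__)  # should print: Using torch 1.6.0
torch.manual_seed(42)  # fix the seed so stochastic operations are reproducible
```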
@@ -392,7 +392,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Other commonly used operations include matrix multiplications, which is essential for neural networks. Quite often, we have an input vector $\\mathbf{x}$, which is transformed using a learned weight matrix $\\mathbf{W}$. "
"Other commonly used operations include matrix multiplications, which are essential for neural networks. Quite often, we have an input vector $\\mathbf{x}$, which is transformed using a learned weight matrix $\\mathbf{W}$. "
]
},
{
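The transformation described in this hunk is a plain matrix-vector product; a minimal sketch with made-up shapes:

```python
import torch

x = torch.rand(4)       # input vector x with 4 features
W = torch.rand(3, 4)    # weight matrix W mapping 4 features to 3 outputs
h = torch.matmul(W, x)  # matrix-vector product; h has shape [3]
```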
@@ -560,7 +560,7 @@
"\n",
"One of the main reasons for using PyTorch in Deep Learning projects is that we can automatically get **gradients/derivatives** of functions that we define. We will mainly use PyTorch for implementing neural networks, and they are just fancy functions. If we use weight matrices in our function that we want to learn, then those are called the **parameters** or simply the **weights**.\n",
"\n",
"If our neural ntwork would output a single scalar value, we would talk about taking the **derivative**, but you will see that quite often we will have **multiple** output variables (\"values\"); in that case we talk about **gradients**. It's a more general term.\n",
"If our neural network would output a single scalar value, we would talk about taking the **derivative**, but you will see that quite often we will have **multiple** output variables (\"values\"); in that case we talk about **gradients**. It's a more general term.\n",
"\n",
"Given an input $\\mathbf{x}$, we define our function by **manipulating** that input, usually by matrix-multiplications with weight matrices and additions with so-called bias vectors. As we manipulate our input, we are automatically creating a **computational graph**. This graph shows how to arrive at our output from our input. \n",
"PyTorch is a **define-by-run** framework; this means that we can just do our manipulations, and PyTorch will keep track of that graph for us. Thus, we create a dynamic computation graph along the way.\n",
@@ -2287,7 +2287,7 @@
"#### The data loader class\n",
"\n",
"The class `torch.utils.data.DataLoader` represents a Python iterable over a dataset with support for automatic batching, multi-process data loading and many more features. The data loader communicates with the dataset using the function `__getitem__`, and stacks its outputs as tensors over the first dimension to form a batch.\n",
"In contrast to the dataset class, we usually don't have to define our own data loader class, but can just create an object of it with the dataset as input. Additionally, we can configure our data loader with the following input argmuents (only a selection, see full list [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)):\n",
"In contrast to the dataset class, we usually don't have to define our own data loader class, but can just create an object of it with the dataset as input. Additionally, we can configure our data loader with the following input arguments (only a selection, see full list [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)):\n",
"\n",
"* `batch_size`: Number of samples to stack per batch\n",
"* `shuffle`: If True, the data is returned in a random order. This is important during training for introducing stochasticity. \n",
@@ -2387,7 +2387,7 @@
"source": [
"#### Stochastic Gradient Descent\n",
"\n",
"For updating the parameters, PyTorch provides the package `torch.optim` that has most popular optimizers implemented. We will discuss the specific optimizers and their difference later in the course, but will for now use the simplest of them: `torch.optim.SGD`. Stochastic Gradient Descent updates parameters by multiplying the gradients with a small constant, called learning rate, and subtracting those from the parameters (hence minimizing the loss). Therefore, we slowly move towards the direction of minimizing the loss. A good default value of the learning rate for a small network as ours is 0.1. "
"For updating the parameters, PyTorch provides the package `torch.optim` that has most popular optimizers implemented. We will discuss the specific optimizers and their differences later in the course, but will for now use the simplest of them: `torch.optim.SGD`. Stochastic Gradient Descent updates parameters by multiplying the gradients with a small constant, called learning rate, and subtracting those from the parameters (hence minimizing the loss). Therefore, we slowly move towards the direction of minimizing the loss. A good default value of the learning rate for a small network as ours is 0.1. "
]
},
{
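A minimal sketch of creating the optimizer; the `nn.Linear` model is a stand-in for the tutorial's network, and `lr=0.1` follows the suggestion above:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)  # stand-in for the tutorial's small network
optimizer = optim.SGD(model.parameters(), lr=0.1)  # lr is the learning rate
```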
@@ -2430,7 +2430,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can write a small training function. Remember our five steps: load a batch, obtain the predictions, calculate the loss, backpropagate, and update. Additionally, we have to push all data and model parameters to the device of our choice (GPU if available). For the tiny neural network we have, cummunicating the data to the GPU actually takes much more time than we could save from running the operation on GPU. For large networks, the communication time is significantly smaller than the actual runtime making a GPU crucial in these cases. Still, to practice, we will push the data to GPU here. \n",
"Now, we can write a small training function. Remember our five steps: load a batch, obtain the predictions, calculate the loss, backpropagate, and update. Additionally, we have to push all data and model parameters to the device of our choice (GPU if available). For the tiny neural network we have, communicating the data to the GPU actually takes much more time than we could save from running the operation on GPU. For large networks, the communication time is significantly smaller than the actual runtime making a GPU crucial in these cases. Still, to practice, we will push the data to GPU here. \n",
"\n",
"In addition, we set our model to training mode. This is done by calling `model.train()`. There exist certain modules that need to perform a different forward step during training than during testing (e.g. BatchNorm and Dropout), and we can switch between them using `model.train()` and `model.eval()`."
]
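A minimal sketch of the five-step loop described above, reusing the stand-in `model`, `data_loader`, and `optimizer` from the earlier sketches; `loss_module` is a hypothetical choice, not necessarily the tutorial's:

```python
import torch
import torch.nn as nn

loss_module = nn.BCEWithLogitsLoss()  # hypothetical loss for binary labels
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # push the model parameters to the chosen device
model.train()     # training mode; affects modules like BatchNorm and Dropout

for inputs, labels in data_loader:
    inputs, labels = inputs.to(device), labels.to(device)  # 1. load a batch
    preds = model(inputs).squeeze(dim=1)       # 2. obtain the predictions
    loss = loss_module(preds, labels.float())  # 3. calculate the loss
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()        # 4. backpropagate
    optimizer.step()       # 5. update the parameters
```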
@@ -2507,7 +2507,7 @@
"source": [
"#### Saving a model\n",
"\n",
"After finish training a model, we save the model to disk so that we can load the same weights at a later time. For this, we extract the so-called `state_dict` from the model which contains all learnable parameter. For our simple model, the state dict contains the following entries:"
"After finish training a model, we save the model to disk so that we can load the same weights at a later time. For this, we extract the so-called `state_dict` from the model which contains all learnable parameters. For our simple model, the state dict contains the following entries:"
]
},
{
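A minimal sketch of saving and restoring the `state_dict`; the filename is an arbitrary choice and `model` reuses the stand-in from the sketches above:

```python
import torch

state_dict = model.state_dict()          # all learnable parameters of the model
torch.save(state_dict, "our_model.tar")  # the filename is an arbitrary choice

# later: restore the weights into a model of identical architecture
model.load_state_dict(torch.load("our_model.tar"))
```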