
Commit

Tutorial 11 (JAX): Explaining abbrev LDJ
phlippe committed Apr 2, 2023
1 parent 5f7828e commit fa80c4d
Showing 1 changed file with 5 additions and 33 deletions.
38 changes: 5 additions & 33 deletions docs/tutorial_notebooks/JAX/tutorial11/NF_image_modeling.ipynb
@@ -357,7 +357,7 @@
"\n",
"<center width=\"100%\"><img src=\"../../tutorial11/uniform_flow.png\" width=\"300px\"></center>\n",
"\n",
"You can see that the height of $p(y)$ should be lower than $p(x)$ after scaling. This change in volume represents $\\left|\\frac{df(x)}{dx}\\right|$ in our equation above, and ensures that even after scaling, we still have a valid probability distribution. We can go on with making our function $f$ more complex. However, the more complex $f$ becomes, the harder it will be to find the inverse $f^{-1}$ of it, and to calculate the log-determinant of the Jacobian $\\log{} \\left|\\det \\frac{df(\\mathbf{x})}{d\\mathbf{x}}\\right|$. An easier trick to stack multiple invertible functions $f_{1,...,K}$ after each other, as all together, they still represent a single, invertible function. Using multiple, learnable invertible functions, a normalizing flow attempts to transform $p_z(z)$ slowly into a more complex distribution which should finally be $p_x(x)$. We visualize the idea below\n",
"You can see that the height of $p(y)$ should be lower than $p(x)$ after scaling. This change in volume represents $\\left|\\frac{df(x)}{dx}\\right|$ in our equation above, and ensures that even after scaling, we still have a valid probability distribution. We can go on with making our function $f$ more complex. However, the more complex $f$ becomes, the harder it will be to find the inverse $f^{-1}$ of it, and to calculate the log-determinant of the Jacobian $\\log{} \\left|\\det \\frac{df(\\mathbf{x})}{d\\mathbf{x}}\\right|$ (often abbreviated as *LDJ*). An easier trick to stack multiple invertible functions $f_{1,...,K}$ after each other, as all together, they still represent a single, invertible function. Using multiple, learnable invertible functions, a normalizing flow attempts to transform $p_z(z)$ slowly into a more complex distribution which should finally be $p_x(x)$. We visualize the idea below\n",
"(figure credit - [Lilian Weng](https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html)):\n",
"\n",
"<center width=\"100%\"><img src=\"../../tutorial11/normalizing_flow_layout.png\" width=\"700px\"></center>\n",
@@ -414,7 +414,8 @@
" return bpd, rng\n",
"\n",
" def encode(self, imgs, rng):\n",
" # Given a batch of images, return the latent representation z and ldj of the transformations\n",
" # Given a batch of images, return the latent representation z and \n",
" # log-determinant jacobian (ldj) of the transformations\n",
" z, ldj = imgs, jnp.zeros(imgs.shape[0])\n",
" for flow in self.flows:\n",
" z, ldj, rng = flow(z, ldj, rng, reverse=False)\n",
@@ -446,6 +447,7 @@
" z = z_init\n",
" \n",
" # Transform z to x by inverting the flows\n",
" # The log-determinant jacobian (ldj) is usually not of interest during sampling\n",
" ldj = jnp.zeros(img_shape[0])\n",
" for flow in reversed(self.flows):\n",
" z, ldj, rng = flow(z, ldj, rng, reverse=True)\n",
@@ -6712,7 +6714,7 @@
"\n",
"$$z'_{j+1:d} = \\mu_{\\theta}(z_{1:j}) + \\sigma_{\\theta}(z_{1:j}) \\odot z_{j+1:d}$$\n",
"\n",
"The functions $\\mu$ and $\\sigma$ are implemented as a shared neural network, and the sum and multiplication are performed element-wise. The LDJ is thereby the sum of the logs of the scaling factors: $\\sum_i \\left[\\log \\sigma_{\\theta}(z_{1:j})\\right]_i$. Inverting the layer can as simply be done as subtracting the bias and dividing by the scale: \n",
"The functions $\\mu$ and $\\sigma$ are implemented as a shared neural network, and the sum and multiplication are performed element-wise. The log-determinant Jacobian (LDJ) is thereby the sum of the logs of the scaling factors: $\\sum_i \\left[\\log \\sigma_{\\theta}(z_{1:j})\\right]_i$. Inverting the layer can as simply be done as subtracting the bias and dividing by the scale: \n",
"\n",
"$$z_{j+1:d} = \\left(z'_{j+1:d} - \\mu_{\\theta}(z_{1:j})\\right) / \\sigma_{\\theta}(z_{1:j})$$\n",
"\n",
@@ -8786,36 +8788,6 @@
" return self.nn(x)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Out (3, 32, 32, 18)\n"
]
}
],
"source": [
"## Test MultiheadAttention implementation\n",
"# Example features as input\n",
"main_rng, x_rng = random.split(main_rng)\n",
"x = random.normal(x_rng, (3, 32, 32, 16))\n",
"# Create attention\n",
"mh_attn = GatedConvNet(c_hidden=32, c_out=18, num_layers=3)\n",
"# Initialize parameters of attention with random key and inputs\n",
"main_rng, init_rng = random.split(main_rng)\n",
"params = mh_attn.init(init_rng, x)['params']\n",
"# Apply attention with parameters on the inputs\n",
"out = mh_attn.apply({'params': params}, x)\n",
"print('Out', out.shape)\n",
"\n",
"del mh_attn, params"
]
},
{
"cell_type": "markdown",
"metadata": {},
