Fixed some small typos (#233)
* Fix small typo

* Fix typo

* Fix more typos

* Fix typos in xai

* Fix typos in attention

* Run pre-commit
RaulPPelaez committed Jan 16, 2023
1 parent 1f95ef0 commit 21138a6
Showing 16 changed files with 32 additions and 27 deletions.
1 change: 1 addition & 0 deletions applied/QM9.ipynb
@@ -430,6 +430,7 @@
"node_feature_len = 16\n",
"msg_feature_len = 16\n",
"\n",
"\n",
"# make our weights\n",
"def init_weights(g, n, m):\n",
" we = np.random.normal(size=(n, m), scale=1e-1)\n",
1 change: 1 addition & 0 deletions dl/Equivariant.ipynb
@@ -971,6 +971,7 @@
"\n",
"def lift(f):\n",
" \"\"\"lift f into group\"\"\"\n",
"\n",
" # create new function from original\n",
" # that is f(gx_0)\n",
" @np_cache(maxsize=W**3)\n",
1 change: 0 additions & 1 deletion dl/Hyperparameter_tuning.ipynb
@@ -376,7 +376,6 @@
"def train_model(\n",
" model, lr=1e-3, Reduced_LR=False, Early_stop=False, batch_size=32, epochs=20\n",
"):\n",
"\n",
" tf.keras.backend.clear_session()\n",
" callbacks = []\n",
"\n",
1 change: 1 addition & 0 deletions dl/VAE.ipynb
@@ -997,6 +997,7 @@
"source": [
"import numpy as np\n",
"\n",
"\n",
"###---------Transformation Functions----###\n",
"def center_com(paths):\n",
" \"\"\"Align paths to COM at each frame\"\"\"\n",
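(Aside, not part of this diff: a minimal sketch of what a COM-alignment routine like the `center_com` above could look like, assuming equal masses and `paths` shaped `(frames, particles, 3)`; the notebook's actual implementation may differ.)

```python
import numpy as np


def center_com_sketch(paths):
    """Subtract each frame's center of mass (equal masses assumed)."""
    com = paths.mean(axis=1, keepdims=True)  # (frames, 1, 3)
    return paths - com
```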
4 changes: 2 additions & 2 deletions dl/attention.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Attention Layers\n",
"\n",
"Attention is a concept in machine learning and AI that goes back many years, especially in computer vision{cite}`BALUJA1997329`. Like the word \"neural network\", attention was inspired by the idea of attention in how human brains deal with the massive amount of visual and audio input{cite}`treisman1980feature`. **Attention layers** are deep learning layers that evoke the idea of attention. You can read more about attention in deep learning in Luong et al. {cite}`luong2015effective` and get a practical [overview here](http://d2l.ai/chapter_attention-mechanisms/index.html). Attention layers have been empirically shown to be so effective in modeling sequences, like language, that they have become indispensible{cite}`vaswani2017attention`. The most common place you'll see attention layers is in [**transformer**](http://d2l.ai/chapter_attention-mechanisms/transformer.html) neural networks that model sequences. We'll also sometimes see attention in graph neural networks.\n",
"Attention is a concept in machine learning and AI that goes back many years, especially in computer vision{cite}`BALUJA1997329`. Like the word \"neural network\", attention was inspired by the idea of attention in how human brains deal with the massive amount of visual and audio input{cite}`treisman1980feature`. **Attention layers** are deep learning layers that evoke the idea of attention. You can read more about attention in deep learning in Luong et al. {cite}`luong2015effective` and get a practical [overview here](http://d2l.ai/chapter_attention-mechanisms/index.html). Attention layers have been empirically shown to be so effective in modeling sequences, like language, that they have become indispensable{cite}`vaswani2017attention`. The most common place you'll see attention layers is in [**transformer**](http://d2l.ai/chapter_attention-mechanisms/transformer.html) neural networks that model sequences. We'll also sometimes see attention in graph neural networks.\n",
"\n",
"\n",
"```{margin}\n",
@@ -89,7 +89,7 @@
"source": [
"## Attention Mechanism Equation\n",
"\n",
"The attention mechanism equation uses query and keys arguments only. It outputs a tensor one rank less than the keys, giving a scalar for each key corresponding to the attention the query should have for the key. This attention vector should be normalized. The most common attention mechanism a dot product and softmax:\n",
"The attention mechanism equation uses query and keys arguments only. It outputs a tensor one rank less than the keys, giving a scalar for each key corresponding to the attention the query should have for the key. This attention vector should be normalized. The most common attention mechanism is a dot product and softmax:\n",
"\n",
"\\begin{equation}\n",
"\\vec{b} = \\mathrm{softmax}\\left(\\vec{q}\\cdot \\mathbf{K}\\right) = \\mathrm{softmax}\\left(\\sum_j q_j k_{ij}\\right)\n",
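(Aside, not part of this diff: a minimal NumPy sketch of the dot-product-and-softmax mechanism in the equation above; the toy shapes are assumptions for illustration.)

```python
import numpy as np


def softmax(x):
    # shift by the max for numerical stability
    e = np.exp(x - np.max(x))
    return e / np.sum(e)


def dot_attention(q, K):
    """Return one normalized attention weight per key: softmax(q . K)."""
    return softmax(K @ q)  # K is (n_keys, d), q is (d,)


q = np.array([1.0, 0.5])
K = np.random.normal(size=(4, 2))
print(dot_attention(q, K))  # four weights summing to 1
```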
2 changes: 1 addition & 1 deletion dl/data.ipynb
@@ -749,7 +749,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see how points far away on the chain from 0 have much more variance in the point 0 align, whereas the COM alignment looks better spread. Remember, to apply these methods you must do them to your both your training data and any prediction points. Thus, they should be viewed as part of your neural network. We can now check that rotating has no effect on these. The plots below have the trajectory rotated by 1 radian and you can see that both alignment methods have no change (the lines are overlapping)."
"You can see how points far away on the chain from 0 have much more variance in the point 0 align, whereas the COM alignment looks better spread. Remember, to apply these methods you must do them to both your training data and any prediction points. Thus, they should be viewed as part of your neural network. We can now check that rotating has no effect on these. The plots below have the trajectory rotated by 1 radian and you can see that both alignment methods have no change (the lines are overlapping)."
]
},
{
2 changes: 2 additions & 0 deletions dl/flows.ipynb
@@ -384,6 +384,8 @@
"# use input (feature) and output (log prob)\n",
"# to make model\n",
"model = tf.keras.Model(x, log_prob)\n",
"\n",
"\n",
"# define a loss\n",
"def neg_loglik(yhat, log_prob):\n",
" # losses always take in label, prediction\n",
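(Aside, not part of this diff: one way the `neg_loglik` loss started above could be written, assuming the second argument is already a log-probability; the notebook's own definition may differ.)

```python
import tensorflow as tf


def neg_loglik(yhat, log_prob):
    # labels are ignored; training just maximizes the model's log-probability
    return -tf.reduce_mean(log_prob)
```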
7 changes: 4 additions & 3 deletions dl/gnn.ipynb
@@ -1053,7 +1053,7 @@
"A common piece of wisdom is if you want to solve a real problem with deep learning, you should read the most recent popular paper in an area and use the baseline they compare against instead of their proposed model. The reason is that a baseline model usually must be easy, fast, and well-tested, which is generally more important than being the most accurate\n",
"```\n",
"\n",
"SchNet is for atoms represented as xyz coordinates (points) -- not as a molecular graph. All our previous examples used the underlying molecular graph as the input. In SchNet we will convert our xyz coodinates into a graph, so that we can apply a GNNN. SchNet was developed for predicting energies and forces from atom configurations without bond information. Thus, we need to first see how a set of atoms and their positions is converted into a graph. To get the nodes, we do a similar process as above and the atomic number is passed through an embedding layer, which is just means we assign a trainable vector to each atomic number (See {doc}`layers` for a review of embeddings). \n",
"SchNet is for atoms represented as xyz coordinates (points) -- not as a molecular graph. All our previous examples used the underlying molecular graph as the input. In SchNet we will convert our xyz coodinates into a graph, so that we can apply a GNN. SchNet was developed for predicting energies and forces from atom configurations without bond information. Thus, we need to first see how a set of atoms and their positions is converted into a graph. To get the nodes, we do a similar process as above and the atomic number is passed through an embedding layer, which just means we assign a trainable vector to each atomic number (See {doc}`layers` for a review of embeddings). \n",
"\n",
"Getting the adjacency matrix is simple too: we just make every atom be connected to every atom. It might seem confusing what the point of using a GNN is, if we're just connecting everything. *It is because GNNs are permutation equivariant.* If we tried to do learning on the atoms as xyz coordinates, we would have weights depending on the ordering of atoms and probably fail to handle different numbers of atoms.\n",
"\n",
@@ -1220,6 +1220,7 @@
"\n",
"label_str = list(set([k.split(\"-\")[0] for k in trajs]))\n",
"\n",
"\n",
"# now build dataset\n",
"def generator():\n",
" for k, v in trajs.items():\n",
@@ -1553,7 +1554,7 @@
"\n",
"---\n",
"\n",
"Let's give now use the model on some data."
"Let's now use the model on some data."
]
},
{
@@ -1680,7 +1681,7 @@
"\n",
"### Common Architecture Motifs and Comparisons\n",
"\n",
"We've now seen message passing layer GNNs, GCNs, GGNs, and the generalized Battaglia equations. You'll find common motifs in the architectures, like gating, {doc}`attention`, and pooling strategies. For example, Gated GNNS (GGNs) can be combined with attention pooling to create Gated Attention GNNs (GAANs){cite}`zhang2018gaan`. GraphSAGE is a similar to a GCN but it samples when pooling, making the neighbor-updates of fixed dimension{cite}`hamilton2017inductive`. So you'll see the suffix \"sage\" when you sample over neighbors while pooling. These can all be represented in the Battaglia equations, but you should be aware of these names. \n",
"We've now seen message passing layer GNNs, GCNs, GGNs, and the generalized Battaglia equations. You'll find common motifs in the architectures, like gating, {doc}`attention`, and pooling strategies. For example, Gated GNNS (GGNs) can be combined with attention pooling to create Gated Attention GNNs (GAANs){cite}`zhang2018gaan`. GraphSAGE is similar to a GCN but it samples when pooling, making the neighbor-updates of fixed dimension{cite}`hamilton2017inductive`. So you'll see the suffix \"sage\" when you sample over neighbors while pooling. These can all be represented in the Battaglia equations, but you should be aware of these names. \n",
"\n",
"The enormous variety of architectures has led to work on identifying the \"best\" or most general GNN architecture {cite}`dwivedi2020benchmarking,errica2019fair,shchur2018pitfalls`. Unfortunately, the question of which GNN architecture is best is as difficult as \"what benchmark problems are best?\" Thus there are no agreed-upon conclusions on the best architecture. However, those papers are great resources on training, hyperparameters, and reasonable starting guesses and I highly recommend reading them before designing your own GNN. There has been some theoretical work to show that simple architectures, like GCNs, cannot distinguish between certain simple graphs {cite}`xu2018powerful`. How much this practically matters depends on your data. Ultimately, there is so much variety in hyperparameters, data equivariances, and training decisions that you should think carefully about how much the GNN architecture matters before exploring it with too much depth. "
]
2 changes: 1 addition & 1 deletion dl/layers.ipynb
@@ -346,7 +346,7 @@
"\n",
"#### Layer Normalization\n",
"\n",
"Batch normalization depends on there being a constant batch size. Some kinds of data, like text or a graphs, have different sizes and so the batch mean/variance can change significantly. **Layer normalization** avoids this problem by normalizing across the *features* (the non-batch axis/channel axis) instead of the batch. This has a similar effect of making the layer output features behave well-centered at 0 but without having highly variable means/variances because of batch to batch variation. You'll see these in graph neural networks and recurrent neural networks, with both take variable sized inputs. \n",
"Batch normalization depends on there being a constant batch size. Some kinds of data, like text or graphs, have different sizes and so the batch mean/variance can change significantly. **Layer normalization** avoids this problem by normalizing across the *features* (the non-batch axis/channel axis) instead of the batch. This has a similar effect of making the layer output features behave well-centered at 0 but without having highly variable means/variances because of batch to batch variation. You'll see these in graph neural networks and recurrent neural networks, with both take variable sized inputs. \n",
"\n",
"### Dropout\n",
"\n",
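(Aside, not part of this diff: a minimal sketch of the layer-normalization idea in the hunk above -- normalize each sample over its feature axis instead of over the batch; shapes are assumptions.)

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Center and scale each sample across its last (feature) axis."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


x = np.random.normal(loc=2.0, scale=3.0, size=(3, 4))  # 3 samples, 4 features
print(layer_norm(x).mean(axis=-1))  # each sample now centered near 0
```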