import torch
from torch import nn
Attention Mechanism
def addition(a, b):
    "Adds two numbers together"
    return a + b
We take the sum of a and b, which is written in Python with the “+” symbol.
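For example, calling the function:

addition(2, 3)      # 5
addition(1.5, 2.5)  # 4.0, works for floats as well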
from fastcore.foundation import docs

@docs
class PrepareForMultiHeadAttention(nn.Module):
    def __init__(self, d_model, heads, d_k, bias):
        super().__init__()
        # One linear layer produces the vectors for all heads at once
        self.linear = nn.Linear(d_model, heads * d_k, bias=bias)
        self.heads = heads
        self.d_k = d_k

    def forward(self, x):
        head_shape = x.shape[:-1]                      # all dimensions except the feature one
        x = self.linear(x)                             # project to heads * d_k features
        x = x.view(*head_shape, self.heads, self.d_k)  # split the last dimension into (heads, d_k)
        return x

    _docs = dict(cls_doc="xxx",
                 forward="yyy")
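After the decorator runs, the strings in _docs become the docstrings of the class and of forward. A quick sketch to check this (the printed values are just the placeholder strings above):

print(PrepareForMultiHeadAttention.__doc__)          # xxx
print(PrepareForMultiHeadAttention.forward.__doc__)  # yyy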
Multi-head Attention
In practice, we don't compute each attention score one at a time. Instead, we stack all the keys into a single matrix, do the same for the values and queries, and then compute all the attention scores at once.
\[\operatorname{Attention}(Q, K, V) = \underset{\text{seq}}{\operatorname{softmax}}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V\]
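Here is a minimal sketch of that formula on random tensors, just to make the shapes concrete (seq_len and d_k below are arbitrary example values, not part of the model that follows):

import torch
import torch.nn.functional as F

seq_len, d_k = 4, 8
Q = torch.randn(seq_len, d_k)
K = torch.randn(seq_len, d_k)
V = torch.randn(seq_len, d_k)

scores = Q @ K.T / d_k ** 0.5     # (seq, seq) scaled dot products
attn = F.softmax(scores, dim=-1)  # softmax along the sequence of keys
out = attn @ V                    # (seq, d_k) weighted sum of the values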
d_model: the number of features in the query, key, and value vectors.
heads: the number of attention heads.
d_k: the number of dimensions of each vector in each head. For example, with d_model = 512 and heads = 8, each head works with vectors of d_k = 512 / 8 = 64 dimensions.
class MultiHeadAttention(nn.Module):
    def __init__(
        self,
        heads: int,
        d_model: int,
        dropout_prop: float = 0.1,
        bias: bool = True,
    ):
        super().__init__()
        self.d_k = d_model // heads  # number of features per head
        self.heads = heads
        # Projection that produces the query vectors for all heads
        self.query = PrepareForMultiHeadAttention(d_model, heads, self.d_k, bias)
MultiHeadAttention
MultiHeadAttention (heads:int, d_model:int, dropout_prop:float=0.1, bias:bool=True)
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

training (bool): whether this module is in training or evaluation mode.
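Since only the query projection is defined so far, here is a small sketch checking that it splits d_model into heads × d_k; the batch and sequence sizes below are arbitrary example values:

mha = MultiHeadAttention(heads=4, d_model=64)
x = torch.randn(2, 10, 64)  # (batch, seq, d_model)
q = mha.query(x)            # project x and split the features into heads
print(q.shape)              # torch.Size([2, 10, 4, 16]), i.e. d_k = 64 // 4 = 16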