Bases: MessagePassing
The graph attentional operator from the “Graph Attention Networks” paper.
\[\mathbf{x}^{\prime}_i = \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \alpha_{i,j}\mathbf{\Theta}_t\mathbf{x}_{j},\]
where the attention coefficients \(\alpha_{i,j}\) are computed as
\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left( \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i + \mathbf{a}^{\top}_{t} \mathbf{\Theta}_{t}\mathbf{x}_j \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left( \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i + \mathbf{a}^{\top}_{t}\mathbf{\Theta}_{t}\mathbf{x}_k \right)\right)}.\]
If the graph has multi-dimensional edge features \(\mathbf{e}_{i,j}\), the attention coefficients \(\alpha_{i,j}\) are computed as
\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left( \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i + \mathbf{a}^{\top}_{t} \mathbf{\Theta}_{t}\mathbf{x}_j + \mathbf{a}^{\top}_{e} \mathbf{\Theta}_{e} \mathbf{e}_{i,j} \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left( \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i + \mathbf{a}^{\top}_{t} \mathbf{\Theta}_{t}\mathbf{x}_k + \mathbf{a}^{\top}_{e} \mathbf{\Theta}_{e} \mathbf{e}_{i,k} \right)\right)}.\]
If the graph is not bipartite, \(\mathbf{\Theta}_{s} = \mathbf{\Theta}_{t}\).
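To make the attention computation concrete, here is a minimal single-head PyTorch sketch of the first two formulas above, for one node \(i\) and its neighborhood. This is an illustration with arbitrary shapes, not the library's implementation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

F_in, F_out = 8, 16
x = torch.randn(5, F_in)              # node features for 5 nodes
theta_s = torch.randn(F_out, F_in)    # Theta_s (source transform)
theta_t = torch.randn(F_out, F_in)    # Theta_t (target transform)
a_s = torch.randn(F_out)              # attention vector a_s
a_t = torch.randn(F_out)              # attention vector a_t

i = 0
neighborhood = torch.tensor([0, 1, 3])  # N(i) ∪ {i}

# Unnormalized score: LeakyReLU(a_s^T Theta_s x_i + a_t^T Theta_t x_j)
src_score = a_s @ (theta_s @ x[i])                # scalar, fixed for node i
tgt_scores = (x[neighborhood] @ theta_t.T) @ a_t  # one score per j
logits = F.leaky_relu(src_score + tgt_scores, negative_slope=0.2)

# Softmax over the neighborhood yields alpha_{i,j}
alpha = torch.softmax(logits, dim=0)

# x'_i = sum_j alpha_{i,j} * Theta_t x_j
x_i_new = alpha @ (x[neighborhood] @ theta_t.T)
```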
in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities in case of a bipartite graph.
out_channels (int) – Size of each output sample.
heads (int, optional) – Number of multi-head-attentions. (default: 1)
concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)
negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)
dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)
add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)
edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default: None)
fill_value (float or torch.Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_dim != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation ("add", "mean", "min", "max", "mul"). (default: "mean")
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
residual (bool, optional) – If set to True, the layer will add a learnable skip-connection. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
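A minimal usage sketch of the layer (assuming it is instantiated as torch_geometric.nn.GATConv; layer sizes and the example graph are arbitrary):

```python
import torch
from torch_geometric.nn import GATConv

conv = GATConv(in_channels=16, out_channels=32, heads=4, dropout=0.2)

x = torch.randn(10, 16)                   # 10 nodes with 16 features each
edge_index = torch.tensor([[0, 1, 2, 3],  # source node indices
                           [1, 0, 3, 2]]) # target node indices

out = conv(x, edge_index)                 # shape: [10, 4 * 32] since concat=True
```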
input: node features \((|\mathcal{V}|, F_{in})\) or \(((|\mathcal{V_s}|, F_{s}), (|\mathcal{V_t}|, F_{t}))\) if bipartite, edge indices \((2, |\mathcal{E}|)\), edge features \((|\mathcal{E}|, D)\) (optional)
output: node features \((|\mathcal{V}|, H * F_{out})\) or \((|\mathcal{V}_t|, H * F_{out})\) if bipartite. If return_attention_weights=True, then \(((|\mathcal{V}|, H * F_{out}), ((2, |\mathcal{E}|), (|\mathcal{E}|, H)))\) or \(((|\mathcal{V_t}|, H * F_{out}), ((2, |\mathcal{E}|), (|\mathcal{E}|, H)))\) if bipartite
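The bipartite shapes above correspond to passing a tuple of node feature matrices. A hedged sketch, again assuming torch_geometric.nn.GATConv; self-loops are disabled here since source and target node sets differ:

```python
import torch
from torch_geometric.nn import GATConv

# Bipartite message passing: source and target nodes have different
# feature sizes, passed as a tuple to in_channels.
conv = GATConv(in_channels=(8, 16), out_channels=32, heads=2,
               add_self_loops=False)

x_src = torch.randn(6, 8)    # |V_s| = 6 source nodes
x_tgt = torch.randn(4, 16)   # |V_t| = 4 target nodes
edge_index = torch.tensor([[0, 1, 2, 5],   # indices into x_src
                           [0, 1, 3, 2]])  # indices into x_tgt

out = conv((x_src, x_tgt), edge_index)  # shape: [4, 2 * 32]
```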
Runs the forward pass of the module.
x (torch.Tensor or (torch.Tensor, torch.Tensor)) – The input node features.
edge_index (torch.Tensor or SparseTensor) – The edge indices.
edge_attr (torch.Tensor, optional) – The edge features. (default: None)
size ((int, int), optional) – The shape of the adjacency matrix. (default: None)
return_attention_weights (bool, optional) – If set to True, will additionally return the tuple (edge_index, attention_weights), holding the computed attention weights for each edge. (default: None)
Return type: Union[Tensor, Tuple[Tensor, Tuple[Tensor, Tensor]], Tuple[Tensor, SparseTensor]]
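A short sketch of retrieving the attention weights (assuming torch_geometric.nn.GATConv; the printed shapes follow from the default add_self_loops=True):

```python
import torch
from torch_geometric.nn import GATConv

conv = GATConv(16, 32, heads=4)
x = torch.randn(10, 16)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 0, 3, 2]])

out, (ei, alpha) = conv(x, edge_index, return_attention_weights=True)
print(ei.shape)     # torch.Size([2, 14]): 4 edges + 10 added self-loops
print(alpha.shape)  # torch.Size([14, 4]): one coefficient per edge and head
print(out.shape)    # torch.Size([10, 128]): heads * out_channels with concat=True
```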
Resets all learnable parameters of the module.