All models have outputs that are instances of subclasses of ModelOutput. Those are data structures containing all the information returned by the model, but that can also be used as tuples or dictionaries.
Let’s see how this looks in an example:
from transformers import BertTokenizer, BertForSequenceClassification import torch tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased") model = BertForSequenceClassification.from_pretrained("google-bert/bert-base-uncased") inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") labels = torch.tensor([1]).unsqueeze(0) outputs = model(**inputs, labels=labels)
The outputs
object is a SequenceClassifierOutput, as we can see in the documentation of that class below, it means it has an optional loss
, a logits
, an optional hidden_states
and an optional attentions
attribute. Here we have the loss
since we passed along labels
, but we don’t have hidden_states
and attentions
because we didn’t pass output_hidden_states=True
or output_attentions=True
.
When passing output_hidden_states=True
you may expect the outputs.hidden_states[-1]
to match outputs.last_hidden_state
exactly. However, this is not always the case. Some models apply normalization or subsequent process to the last hidden state when it’s returned.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you will get None
. Here for instance outputs.loss
is the loss computed by the model, and outputs.attentions
is None
.
When considering our outputs
object as tuple, it only considers the attributes that don’t have None
values. Here for instance, it has two elements, loss
then logits
, so
will return the tuple (outputs.loss, outputs.logits)
for instance.
When considering our outputs
object as dictionary, it only considers the attributes that don’t have None
values. Here for instance, it has two keys that are loss
and logits
.
We document here the generic model outputs that are used by more than one model type. Specific output types are documented on their corresponding model page.
ModelOutput class transformers.utils.ModelOutput < source >( *args **kwargs )
Base class for all model outputs as dataclass. Has a __getitem__
that allows indexing by integer or slice (like a tuple) or strings (like a dictionary) that will ignore the None
attributes. Otherwise behaves like a regular python dictionary.
You can’t unpack a ModelOutput
directly. Use the to_tuple() method to convert it to a tuple before.
Convert self to a tuple containing all the attributes/keys that are not None
.
( last_hidden_state: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs, with potential hidden states and attentions.
BaseModelOutputWithPooling class transformers.modeling_outputs.BaseModelOutputWithPooling < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None pooler_output: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. torch.FloatTensor
of shape (batch_size, hidden_size)
) — Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs that also contains a pooling of the last hidden states.
BaseModelOutputWithCrossAttentions class transformers.modeling_outputs.BaseModelOutputWithCrossAttentions < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
and config.add_cross_attention=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model’s outputs, with potential hidden states and attentions.
BaseModelOutputWithPoolingAndCrossAttentions class transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None pooler_output: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. torch.FloatTensor
of shape (batch_size, hidden_size)
) — Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
and config.add_cross_attention=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and optionally if config.is_encoder_decoder=True
2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True
in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
Base class for model’s outputs that also contains a pooling of the last hidden states.
BaseModelOutputWithPast class transformers.modeling_outputs.BaseModelOutputWithPast < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and optionally if config.is_encoder_decoder=True
2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True
in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs that may also contain a past key/values (to speed up sequential decoding).
BaseModelOutputWithPastAndCrossAttentions class transformers.modeling_outputs.BaseModelOutputWithPastAndCrossAttentions < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and optionally if config.is_encoder_decoder=True
2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True
in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
and config.add_cross_attention=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model’s outputs that may also contain a past key/values (to speed up sequential decoding).
Seq2SeqModelOutput class transformers.modeling_outputs.Seq2SeqModelOutput < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the decoder of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model encoder’s outputs that also contains : pre-computed hidden states that can speed up sequential decoding.
CausalLMOutput class transformers.modeling_outputs.CausalLMOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Language modeling loss (for next-token prediction). torch.FloatTensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
CausalLMOutputWithCrossAttentions class transformers.modeling_outputs.CausalLMOutputWithCrossAttentions < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Language modeling loss (for next-token prediction). torch.FloatTensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Cross attentions weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of torch.FloatTensor
tuples of length config.n_layers
, with each tuple containing the cached key, value states of the self-attention and the cross-attention layers if model is used in encoder-decoder setting. Only relevant if config.is_decoder = True
.
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
Base class for causal language model (or autoregressive) outputs.
CausalLMOutputWithPast class transformers.modeling_outputs.CausalLMOutputWithPast < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Language modeling loss (for next-token prediction). torch.FloatTensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
)
Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
MaskedLMOutput class transformers.modeling_outputs.MaskedLMOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Masked language modeling (MLM) loss. torch.FloatTensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for masked language models outputs.
Seq2SeqLMOutput class transformers.modeling_outputs.Seq2SeqLMOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Language modeling loss. torch.FloatTensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence language models outputs.
NextSentencePredictorOutput class transformers.modeling_outputs.NextSentencePredictorOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when next_sentence_label
is provided) — Next sequence prediction (classification) loss. torch.FloatTensor
of shape (batch_size, 2)
) — Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of models predicting if two sentences are consecutive or not.
SequenceClassifierOutput class transformers.modeling_outputs.SequenceClassifierOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Classification (or regression if config.num_labels==1) loss. torch.FloatTensor
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sentence classification models.
Seq2SeqSequenceClassifierOutput class transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when label
is provided) — Classification (or regression if config.num_labels==1) loss. torch.FloatTensor
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence sentence classification models.
MultipleChoiceModelOutput class transformers.modeling_outputs.MultipleChoiceModelOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,), optional, returned when labels
is provided) — Classification loss. torch.FloatTensor
of shape (batch_size, num_choices)
) — num_choices is the second dimension of the input tensors. (see input_ids above).
Classification scores (before SoftMax).
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of multiple choice models.
TokenClassifierOutput class transformers.modeling_outputs.TokenClassifierOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Classification loss. torch.FloatTensor
of shape (batch_size, sequence_length, config.num_labels)
) — Classification scores (before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of token classification models.
QuestionAnsweringModelOutput class transformers.modeling_outputs.QuestionAnsweringModelOutput < source >( loss: typing.Optional[torch.FloatTensor] = None start_logits: typing.Optional[torch.FloatTensor] = None end_logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. torch.FloatTensor
of shape (batch_size, sequence_length)
) — Span-start scores (before SoftMax). torch.FloatTensor
of shape (batch_size, sequence_length)
) — Span-end scores (before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of question answering models.
Seq2SeqQuestionAnsweringModelOutput class transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput < source >( loss: typing.Optional[torch.FloatTensor] = None start_logits: typing.Optional[torch.FloatTensor] = None end_logits: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. torch.FloatTensor
of shape (batch_size, sequence_length)
) — Span-start scores (before SoftMax). torch.FloatTensor
of shape (batch_size, sequence_length)
) — Span-end scores (before SoftMax). tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence question answering models.
Seq2SeqSpectrogramOutput class transformers.modeling_outputs.Seq2SeqSpectrogramOutput < source >( loss: typing.Optional[torch.FloatTensor] = None spectrogram: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Spectrogram generation loss. torch.FloatTensor
of shape (batch_size, sequence_length, num_bins)
) — The predicted spectrogram. tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence spectrogram outputs.
SemanticSegmenterOutput class transformers.modeling_outputs.SemanticSegmenterOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Classification (or regression if config.num_labels==1) loss. torch.FloatTensor
of shape (batch_size, config.num_labels, logits_height, logits_width)
) — Classification scores for each pixel.
The logits returned do not necessarily have the same size as the pixel_values
passed as inputs. This is to avoid doing two interpolations and lose some quality when a user needs to resize the logits to the original image size as post-processing. You should always check your logits shape and resize as needed.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, patch_size, hidden_size)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of semantic segmentation models.
ImageClassifierOutput class transformers.modeling_outputs.ImageClassifierOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Classification (or regression if config.num_labels==1) loss. torch.FloatTensor
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, sequence_length, hidden_size)
. Hidden-states (also called feature maps) of the model at the output of each stage. tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of image classification models.
ImageClassifierOutputWithNoAttention class transformers.modeling_outputs.ImageClassifierOutputWithNoAttention < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Classification (or regression if config.num_labels==1) loss. torch.FloatTensor
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, num_channels, height, width)
. Hidden-states (also called feature maps) of the model at the output of each stage. Base class for outputs of image classification models.
DepthEstimatorOutput class transformers.modeling_outputs.DepthEstimatorOutput < source >( loss: typing.Optional[torch.FloatTensor] = None predicted_depth: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Classification (or regression if config.num_labels==1) loss. torch.FloatTensor
of shape (batch_size, height, width)
) — Predicted depth for each pixel. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width)
.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of depth estimation models.
Wav2Vec2BaseModelOutput class transformers.modeling_outputs.Wav2Vec2BaseModelOutput < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None extract_features: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. torch.FloatTensor
of shape (batch_size, sequence_length, conv_dim[-1])
) — Sequence of extracted feature vectors of the last convolutional layer of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for models that have been trained with the Wav2Vec2 loss objective.
XVectorOutput class transformers.modeling_outputs.XVectorOutput < source >( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None embeddings: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Classification loss. torch.FloatTensor
of shape (batch_size, config.xvector_output_dim)
) — Classification hidden states before AMSoftmax. torch.FloatTensor
of shape (batch_size, config.xvector_output_dim)
) — Utterance embeddings used for vector similarity-based retrieval. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Output type of Wav2Vec2ForXVector.
Seq2SeqTSModelOutput class transformers.modeling_outputs.Seq2SeqTSModelOutput < source >( last_hidden_state: typing.Optional[torch.FloatTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None loc: typing.Optional[torch.FloatTensor] = None scale: typing.Optional[torch.FloatTensor] = None static_features: typing.Optional[torch.FloatTensor] = None )
Parameters
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the decoder of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
torch.FloatTensor
of shape (batch_size,)
or (batch_size, input_size)
, optional) — Shift values of each time series’ context window which is used to give the model inputs of the same magnitude and then used to shift back to the original magnitude. torch.FloatTensor
of shape (batch_size,)
or (batch_size, input_size)
, optional) — Scaling values of each time series’ context window which is used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude. torch.FloatTensor
of shape (batch_size, feature size)
, optional) — Static features of each time series’ in a batch which are copied to the covariates at inference time. Base class for time series model’s encoder outputs that also contains pre-computed hidden states that can speed up sequential decoding.
Seq2SeqTSPredictionOutput class transformers.modeling_outputs.Seq2SeqTSPredictionOutput < source >( loss: typing.Optional[torch.FloatTensor] = None params: typing.Optional[typing.Tuple[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None loc: typing.Optional[torch.FloatTensor] = None scale: typing.Optional[torch.FloatTensor] = None static_features: typing.Optional[torch.FloatTensor] = None )
Parameters
torch.FloatTensor
of shape (1,)
, optional, returned when a future_values
is provided) — Distributional loss. torch.FloatTensor
of shape (batch_size, num_samples, num_params)
) — Parameters of the chosen distribution. tuple(tuple(torch.FloatTensor))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(torch.FloatTensor)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
torch.FloatTensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
torch.FloatTensor
of shape (batch_size,)
or (batch_size, input_size)
, optional) — Shift values of each time series’ context window which is used to give the model inputs of the same magnitude and then used to shift back to the original magnitude. torch.FloatTensor
of shape (batch_size,)
or (batch_size, input_size)
, optional) — Scaling values of each time series’ context window which is used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude. torch.FloatTensor
of shape (batch_size, feature size)
, optional) — Static features of each time series’ in a batch which are copied to the covariates at inference time. Base class for time series model’s decoder outputs that also contain the loss as well as the parameters of the chosen distribution.
SampleTSPredictionOutput class transformers.modeling_outputs.SampleTSPredictionOutput < source >( sequences: typing.Optional[torch.FloatTensor] = None )
Parameters
torch.FloatTensor
of shape (batch_size, num_samples, prediction_length)
or (batch_size, num_samples, prediction_length, input_size)
) — Sampled values from the chosen distribution. Base class for time series model’s predictions outputs that contains the sampled values from the chosen distribution.
TFBaseModelOutput class transformers.modeling_tf_outputs.TFBaseModelOutput < source >( last_hidden_state: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. tuple(tf.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs, with potential hidden states and attentions.
TFBaseModelOutputWithPooling class transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling < source >( last_hidden_state: Optional[tf.Tensor] = None pooler_output: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. tf.Tensor
of shape (batch_size, hidden_size)
) — Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
This output is usually not a good summary of the semantic content of the input, you’re often better with averaging or pooling the sequence of hidden-states for the whole input sequence.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs that also contains a pooling of the last hidden states.
TFBaseModelOutputWithPoolingAndCrossAttentions class transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions < source >( last_hidden_state: Optional[tf.Tensor] = None pooler_output: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. tf.Tensor
of shape (batch_size, hidden_size)
) — Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
This output is usually not a good summary of the semantic content of the input, you’re often better with averaging or pooling the sequence of hidden-states for the whole input sequence.
List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model’s outputs that also contains a pooling of the last hidden states.
TFBaseModelOutputWithPast class transformers.modeling_tf_outputs.TFBaseModelOutputWithPast < source >( last_hidden_state: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs that may also contain a past key/values (to speed up sequential decoding).
TFBaseModelOutputWithPastAndCrossAttentions class transformers.modeling_tf_outputs.TFBaseModelOutputWithPastAndCrossAttentions < source >( last_hidden_state: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model’s outputs that may also contain a past key/values (to speed up sequential decoding).
TFSeq2SeqModelOutput class transformers.modeling_tf_outputs.TFSeq2SeqModelOutput < source >( last_hidden_state: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the decoder of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model encoder’s outputs that also contains : pre-computed hidden states that can speed up sequential decoding.
TFCausalLMOutput class transformers.modeling_tf_outputs.TFCausalLMOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (n,)
, optional, where n is the number of non-masked labels, returned when labels
is provided) — Language modeling loss (for next-token prediction). tf.Tensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
TFCausalLMOutputWithCrossAttentions class transformers.modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (n,)
, optional, where n is the number of non-masked labels, returned when labels
is provided) — Language modeling loss (for next-token prediction). tf.Tensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
Base class for causal language model (or autoregressive) outputs.
TFCausalLMOutputWithPast class transformers.modeling_tf_outputs.TFCausalLMOutputWithPast < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (n,)
, optional, where n is the number of non-masked labels, returned when labels
is provided) — Language modeling loss (for next-token prediction). tf.Tensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
TFMaskedLMOutput class transformers.modeling_tf_outputs.TFMaskedLMOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (n,)
, optional, where n is the number of non-masked labels, returned when labels
is provided) — Masked language modeling (MLM) loss. tf.Tensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for masked language models outputs.
TFSeq2SeqLMOutput class transformers.modeling_tf_outputs.TFSeq2SeqLMOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (n,)
, optional, where n is the number of non-masked labels, returned when labels
is provided) — Language modeling loss. tf.Tensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence language models outputs.
TFNextSentencePredictorOutput class transformers.modeling_tf_outputs.TFNextSentencePredictorOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (n,)
, optional, where n is the number of non-masked labels, returned when next_sentence_label
is provided) — Next sentence prediction loss. tf.Tensor
of shape (batch_size, 2)
) — Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax). tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of models predicting if two sentences are consecutive or not.
TFSequenceClassifierOutput class transformers.modeling_tf_outputs.TFSequenceClassifierOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, )
, optional, returned when labels
is provided) — Classification (or regression if config.num_labels==1) loss. tf.Tensor
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sentence classification models.
TFSeq2SeqSequenceClassifierOutput class transformers.modeling_tf_outputs.TFSeq2SeqSequenceClassifierOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (1,)
, optional, returned when label
is provided) — Classification (or regression if config.num_labels==1) loss. tf.Tensor
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence sentence classification models.
TFMultipleChoiceModelOutput class transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, ), optional, returned when labels
is provided) — Classification loss. tf.Tensor
of shape (batch_size, num_choices)
) — num_choices is the second dimension of the input tensors. (see input_ids above).
Classification scores (before SoftMax).
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of multiple choice models.
TFTokenClassifierOutput class transformers.modeling_tf_outputs.TFTokenClassifierOutput < source >( loss: tf.Tensor | None = None logits: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (n,)
, optional, where n is the number of unmasked labels, returned when labels
is provided) — Classification loss. tf.Tensor
of shape (batch_size, sequence_length, config.num_labels)
) — Classification scores (before SoftMax). tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of token classification models.
TFQuestionAnsweringModelOutput class transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput < source >( loss: tf.Tensor | None = None start_logits: Optional[tf.Tensor] = None end_logits: Optional[tf.Tensor] = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (batch_size, )
, optional, returned when start_positions
and end_positions
are provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. tf.Tensor
of shape (batch_size, sequence_length)
) — Span-start scores (before SoftMax). tf.Tensor
of shape (batch_size, sequence_length)
) — Span-end scores (before SoftMax). tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of question answering models.
TFSeq2SeqQuestionAnsweringModelOutput class transformers.modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput < source >( loss: tf.Tensor | None = None start_logits: Optional[tf.Tensor] = None end_logits: Optional[tf.Tensor] = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
tf.Tensor
of shape (1,)
, optional, returned when labels
is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. tf.Tensor
of shape (batch_size, sequence_length)
) — Span-start scores (before SoftMax). tf.Tensor
of shape (batch_size, sequence_length)
) — Span-end scores (before SoftMax). List[tf.Tensor]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — List of tf.Tensor
of length config.n_layers
, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head)
).
Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tf.Tensor
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(tf.Tensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of tf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(tf.Tensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of tf.Tensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence question answering models.
FlaxBaseModelOutput class transformers.modeling_flax_outputs.FlaxBaseModelOutput < source >( last_hidden_state: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs, with potential hidden states and attentions.
“Returns a new object replacing the specified fields with new values.
FlaxBaseModelOutputWithPast class transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPast < source >( last_hidden_state: typing.Optional[jax.Array] = None past_key_values: typing.Optional[typing.Dict[str, jax.Array]] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. Dict[str, jnp.ndarray]
) — Dictionary of pre-computed hidden-states (key and values in the attention blocks) that can be used for fast auto-regressive decoding. Pre-computed key and value hidden-states are of shape [batch_size, max_length]. tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs, with potential hidden states and attentions.
“Returns a new object replacing the specified fields with new values.
FlaxBaseModelOutputWithPooling class transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling < source >( last_hidden_state: typing.Optional[jax.Array] = None pooler_output: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. jnp.ndarray
of shape (batch_size, hidden_size)
) — Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model’s outputs that also contains a pooling of the last hidden states.
“Returns a new object replacing the specified fields with new values.
FlaxBaseModelOutputWithPastAndCrossAttentions class transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions < source >( last_hidden_state: typing.Optional[jax.Array] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
tuple(tuple(jnp.ndarray))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(jnp.ndarray)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and optionally if config.is_encoder_decoder=True
2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True
in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
and config.add_cross_attention=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model’s outputs that may also contain a past key/values (to speed up sequential decoding).
“Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqModelOutput class transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput < source >( last_hidden_state: typing.Optional[jax.Array] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the decoder of the model.
If past_key_values
is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size)
is output.
tuple(tuple(jnp.ndarray))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(jnp.ndarray)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model encoder’s outputs that also contains : pre-computed hidden states that can speed up sequential decoding.
“Returns a new object replacing the specified fields with new values.
FlaxCausalLMOutputWithCrossAttentions class transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions < source >( logits: typing.Optional[jax.Array] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Cross attentions weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
tuple(tuple(jnp.ndarray))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of jnp.ndarray
tuples of length config.n_layers
, with each tuple containing the cached key, value states of the self-attention and the cross-attention layers if model is used in encoder-decoder setting. Only relevant if config.is_decoder = True
.
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
Base class for causal language model (or autoregressive) outputs.
“Returns a new object replacing the specified fields with new values.
FlaxMaskedLMOutput class transformers.modeling_flax_outputs.FlaxMaskedLMOutput < source >( logits: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for masked language models outputs.
“Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqLMOutput class transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput < source >( logits: typing.Optional[jax.Array] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). tuple(tuple(jnp.ndarray))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(jnp.ndarray)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence language models outputs.
“Returns a new object replacing the specified fields with new values.
FlaxNextSentencePredictorOutput class transformers.modeling_flax_outputs.FlaxNextSentencePredictorOutput < source >( logits: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, 2)
) — Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax). tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of models predicting if two sentences are consecutive or not.
“Returns a new object replacing the specified fields with new values.
FlaxSequenceClassifierOutput class transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput < source >( logits: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sentence classification models.
“Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqSequenceClassifierOutput class transformers.modeling_flax_outputs.FlaxSeq2SeqSequenceClassifierOutput < source >( logits: typing.Optional[jax.Array] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). tuple(tuple(jnp.ndarray))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(jnp.ndarray)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence sentence classification models.
“Returns a new object replacing the specified fields with new values.
FlaxMultipleChoiceModelOutput class transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput < source >( logits: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, num_choices)
) — num_choices is the second dimension of the input tensors. (see input_ids above).
Classification scores (before SoftMax).
tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of multiple choice models.
“Returns a new object replacing the specified fields with new values.
FlaxTokenClassifierOutput class transformers.modeling_flax_outputs.FlaxTokenClassifierOutput < source >( logits: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length, config.num_labels)
) — Classification scores (before SoftMax). tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of token classification models.
“Returns a new object replacing the specified fields with new values.
FlaxQuestionAnsweringModelOutput class transformers.modeling_flax_outputs.FlaxQuestionAnsweringModelOutput < source >( start_logits: typing.Optional[jax.Array] = None end_logits: typing.Optional[jax.Array] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length)
) — Span-start scores (before SoftMax). jnp.ndarray
of shape (batch_size, sequence_length)
) — Span-end scores (before SoftMax). tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of question answering models.
“Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqQuestionAnsweringModelOutput class transformers.modeling_flax_outputs.FlaxSeq2SeqQuestionAnsweringModelOutput < source >( start_logits: typing.Optional[jax.Array] = None end_logits: typing.Optional[jax.Array] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
jnp.ndarray
of shape (batch_size, sequence_length)
) — Span-start scores (before SoftMax). jnp.ndarray
of shape (batch_size, sequence_length)
) — Span-end scores (before SoftMax). tuple(tuple(jnp.ndarray))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of tuple(jnp.ndarray)
of length config.n_layers
, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head)
.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model. tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence question answering models.
“Returns a new object replacing the specified fields with new values.
< > Update on GitHubRetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.3