( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
The bare BigBird Model transformer outputting raw hidden-states without any specific head on top.
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: dict = None ) → transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. A transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
last_hidden_state (jnp.ndarray
of shape (batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model.
pooler_output (jnp.ndarray
of shape (batch_size, hidden_size)
) — Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The FlaxBigBirdPreTrainedModel
forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdModel >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdModel.from_pretrained("google/bigbird-roberta-base") >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="jax") >>> outputs = model(**inputs) >>> last_hidden_states = outputs.last_hidden_stateFlaxBigBirdForPreTraining class transformers.FlaxBigBirdForPreTraining < source >
( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
BigBird Model with two heads on top as done during the pretraining: a masked language modeling
head and a next sentence prediction (classification)
head.
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: dict = None ) → transformers.models.big_bird.modeling_flax_big_bird.FlaxBigBirdForPreTrainingOutput
or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. Returns
transformers.models.big_bird.modeling_flax_big_bird.FlaxBigBirdForPreTrainingOutput
or tuple(torch.FloatTensor)
A transformers.models.big_bird.modeling_flax_big_bird.FlaxBigBirdForPreTrainingOutput
or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
prediction_logits (jnp.ndarray
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
seq_relationship_logits (jnp.ndarray
of shape (batch_size, 2)
) — Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The FlaxBigBirdPreTrainedModel
forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdForPreTraining >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base") >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="np") >>> outputs = model(**inputs) >>> prediction_logits = outputs.prediction_logits >>> seq_relationship_logits = outputs.seq_relationship_logitsFlaxBigBirdForCausalLM class transformers.FlaxBigBirdForCausalLM < source >
( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
BigBird Model with a language modeling head on top (a linear layer on top of the hidden-states output) e.g for autoregressive tasks.
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: dict = None ) → transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. A transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
logits (jnp.ndarray
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Cross attentions weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
past_key_values (tuple(tuple(jnp.ndarray))
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of jnp.ndarray
tuples of length config.n_layers
, with each tuple containing the cached key, value states of the self-attention and the cross-attention layers if model is used in encoder-decoder setting. Only relevant if config.is_decoder = True
.
Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
The FlaxBigBirdPreTrainedModel
forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdForCausalLM >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdForCausalLM.from_pretrained("google/bigbird-roberta-base") >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="np") >>> outputs = model(**inputs) >>> >>> next_token_logits = outputs.logits[:, -1]FlaxBigBirdForMaskedLM class transformers.FlaxBigBirdForMaskedLM < source >
( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
BigBird Model with a language modeling
head on top.
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: dict = None ) → transformers.modeling_flax_outputs.FlaxMaskedLMOutput or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. A transformers.modeling_flax_outputs.FlaxMaskedLMOutput or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
logits (jnp.ndarray
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The FlaxBigBirdPreTrainedModel
forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdForMaskedLM >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base") >>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="jax") >>> outputs = model(**inputs) >>> logits = outputs.logitsFlaxBigBirdForSequenceClassification class transformers.FlaxBigBirdForSequenceClassification < source >
( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
BigBird Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: dict = None ) → transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. A transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
logits (jnp.ndarray
of shape (batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The FlaxBigBirdPreTrainedModel
forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdForSequenceClassification >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdForSequenceClassification.from_pretrained("google/bigbird-roberta-base") >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="jax") >>> outputs = model(**inputs) >>> logits = outputs.logitsFlaxBigBirdForMultipleChoice class transformers.FlaxBigBirdForMultipleChoice < source >
( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
BigBird Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: dict = None ) → transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, num_choices, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, num_choices, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, num_choices, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, num_choices, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, num_choices, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. A transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
logits (jnp.ndarray
of shape (batch_size, num_choices)
) — num_choices is the second dimension of the input tensors. (see input_ids above).
Classification scores (before SoftMax).
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The FlaxBigBirdPreTrainedModel
forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdForMultipleChoice >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdForMultipleChoice.from_pretrained("google/bigbird-roberta-base") >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." >>> choice0 = "It is eaten with a fork and a knife." >>> choice1 = "It is eaten while held in the hand." >>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="jax", padding=True) >>> outputs = model(**{k: v[None, :] for k, v in encoding.items()}) >>> logits = outputs.logitsFlaxBigBirdForTokenClassification class transformers.FlaxBigBirdForTokenClassification < source >
( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
BigBird Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: dict = None ) → transformers.modeling_flax_outputs.FlaxTokenClassifierOutput or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. A transformers.modeling_flax_outputs.FlaxTokenClassifierOutput or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
logits (jnp.ndarray
of shape (batch_size, sequence_length, config.num_labels)
) — Classification scores (before SoftMax).
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The FlaxBigBirdPreTrainedModel
forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdForTokenClassification >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdForTokenClassification.from_pretrained("google/bigbird-roberta-base") >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="jax") >>> outputs = model(**inputs) >>> logits = outputs.logitsFlaxBigBirdForQuestionAnswering class transformers.FlaxBigBirdForQuestionAnswering < source >
( config: BigBirdConfig input_shape: typing.Optional[tuple] = None seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )
Parameters
jax.numpy.dtype
, optional, defaults to jax.numpy.float32
) — The data type of the computation. Can be one of jax.numpy.float32
, jax.numpy.float16
(on GPUs) and jax.numpy.bfloat16
(on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified all the computation will be performed with the given dtype
.
Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
BigBird Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of the hidden-states output to compute span start logits
and span end logits
).
This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)
This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.
Finally, this model supports inherent JAX features such as:
__call__ < source >( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None question_lengths = None params: dict = None dropout_rng: typing.Optional[PRNGKey] = None indices_rng: typing.Optional[PRNGKey] = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.big_bird.modeling_flax_big_bird.FlaxBigBirdForQuestionAnsweringModelOutput
or tuple(torch.FloatTensor)
Parameters
numpy.ndarray
of shape (batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
:
numpy.ndarray
of shape (batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1]
. numpy.ndarray
of shape (batch_size, sequence_length)
, optional) -- Mask to nullify selected heads of the attention modules. Mask values selected in
[0, 1]`:
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. Returns
transformers.models.big_bird.modeling_flax_big_bird.FlaxBigBirdForQuestionAnsweringModelOutput
or tuple(torch.FloatTensor)
A transformers.models.big_bird.modeling_flax_big_bird.FlaxBigBirdForQuestionAnsweringModelOutput
or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration (BigBirdConfig) and inputs.
start_logits (jnp.ndarray
of shape (batch_size, sequence_length)
) — Span-start scores (before SoftMax).
end_logits (jnp.ndarray
of shape (batch_size, sequence_length)
) — Span-end scores (before SoftMax).
pooled_output (jnp.ndarray
of shape (batch_size, hidden_size)
) — pooled_output returned by FlaxBigBirdModel.
hidden_states (tuple(jnp.ndarray)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of jnp.ndarray
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of jnp.ndarray
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The FlaxBigBirdForQuestionAnswering forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, FlaxBigBirdForQuestionAnswering >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base") >>> model = FlaxBigBirdForQuestionAnswering.from_pretrained("google/bigbird-roberta-base") >>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet" >>> inputs = tokenizer(question, text, return_tensors="jax") >>> outputs = model(**inputs) >>> start_scores = outputs.start_logits >>> end_scores = outputs.end_logits
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.3