(config: MobileViTConfig, expand_output: bool = True, *inputs, **kwargs)
Parameters
The bare MobileViT model outputting raw hidden-states without any specific head on top. This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:
- a single Tensor with pixel_values only and nothing else: model(pixel_values)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([pixel_values, attention_mask]) or model([pixel_values, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"pixel_values": pixel_values, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing, you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
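The three ways of packing inputs into the first positional argument can be illustrated with a small, dependency-free sketch. The helper `normalize_inputs` and the `PARAMETER_ORDER` tuple are hypothetical names introduced only for this illustration; they are not the library's implementation, and plain strings stand in for tensors.

```python
from typing import Any, Dict, Sequence

# Hypothetical parameter order, mirroring the names used in the text above.
PARAMETER_ORDER = ("pixel_values", "attention_mask", "token_type_ids")


def normalize_inputs(
    first_arg: Any, parameter_order: Sequence[str] = PARAMETER_ORDER
) -> Dict[str, Any]:
    """Map the first positional argument onto named inputs (illustrative only)."""
    if isinstance(first_arg, dict):
        # Dictionary form: keys must be known input names.
        unknown = set(first_arg) - set(parameter_order)
        if unknown:
            raise ValueError(f"Unexpected input names: {sorted(unknown)}")
        return dict(first_arg)
    if isinstance(first_arg, (list, tuple)):
        # List form: values are matched to names by position, in docstring order.
        if len(first_arg) > len(parameter_order):
            raise ValueError("More inputs than named parameters")
        return dict(zip(parameter_order, first_arg))
    # Single-tensor form: the value is taken to be the first named input.
    return {parameter_order[0]: first_arg}


single = normalize_inputs("px")
as_list = normalize_inputs(["px", "mask"])
as_dict = normalize_inputs({"pixel_values": "px", "token_type_ids": "tt"})
```

The list form depends on position, which is why the order given in the docstring matters; the dictionary form lets intermediate inputs be skipped.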
call(pixel_values: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False) → transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling or tuple(tf.Tensor)
Parameters
pixel_values (np.ndarray, tf.Tensor, List[tf.Tensor], Dict[str, tf.Tensor] or Dict[str, np.ndarray], where each example must have the shape (batch_size, num_channels, height, width)): Pixel values. Pixel values can be obtained using AutoImageProcessor. See MobileViTImageProcessor.__call__() for details.
output_hidden_states (bool, optional): Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional): Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
Returns
transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MobileViTConfig) and inputs.
last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.
pooler_output (tf.Tensor of shape (batch_size, hidden_size)): Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. This output is usually not a good summary of the semantic content of the input; you are often better off averaging or pooling the sequence of hidden-states for the whole input sequence.
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True): Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True): Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TFMobileViTModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
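The pooler_output entry above suggests averaging or pooling the hidden states rather than relying on the pooled output. A minimal NumPy sketch of global average pooling follows. It assumes a channels-first feature map like the `[1, 640, 8, 8]` shape shown in this model's usage example; the random array is a stand-in for real model output, and `global_average_pool` is a hypothetical helper name.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Average a (batch, channels, height, width) map over its spatial axes."""
    if feature_map.ndim != 4:
        raise ValueError("expected a 4-D (batch, channels, height, width) array")
    # Collapse height and width, leaving one value per channel.
    return feature_map.mean(axis=(2, 3))


# Stand-in for outputs.last_hidden_state converted to NumPy (assumed layout).
rng = np.random.default_rng(0)
last_hidden_state = rng.standard_normal((1, 640, 8, 8)).astype(np.float32)

pooled = global_average_pool(last_hidden_state)
print(pooled.shape)  # (1, 640)
```

The result is one vector per image, which can serve as a compact feature for downstream tasks.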
Example:
>>> from transformers import AutoImageProcessor, TFMobileViTModel
>>> from datasets import load_dataset
>>> dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
>>> image = dataset["test"]["image"][0]
>>> image_processor = AutoImageProcessor.from_pretrained("apple/mobilevit-small")
>>> model = TFMobileViTModel.from_pretrained("apple/mobilevit-small")
>>> inputs = image_processor(image, return_tensors="tf")
>>> outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 640, 8, 8]

TFMobileViTForImageClassification
class transformers.TFMobileViTForImageClassification
(config: MobileViTConfig, *inputs, **kwargs)
Parameters
MobileViT model with an image classification head on top (a linear layer on top of the pooled features), e.g. for ImageNet.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:
- a single Tensor with pixel_values only and nothing else: model(pixel_values)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([pixel_values, attention_mask]) or model([pixel_values, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"pixel_values": pixel_values, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing, you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
call(pixel_values: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, labels: tf.Tensor | None = None, return_dict: Optional[bool] = None, training: Optional[bool] = False) → transformers.modeling_tf_outputs.TFImageClassifierOutputWithNoAttention or tuple(tf.Tensor)
Parameters
pixel_values (np.ndarray, tf.Tensor, List[tf.Tensor], Dict[str, tf.Tensor] or Dict[str, np.ndarray], where each example must have the shape (batch_size, num_channels, height, width)): Pixel values. Pixel values can be obtained using AutoImageProcessor. See MobileViTImageProcessor.__call__() for details.
output_hidden_states (bool, optional): Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional): Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
labels (tf.Tensor of shape (batch_size,), optional): Labels for computing the image classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (Mean-Square loss). If config.num_labels > 1, a classification loss is computed (Cross-Entropy).
Returns
transformers.modeling_tf_outputs.TFImageClassifierOutputWithNoAttention or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFImageClassifierOutputWithNoAttention or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MobileViTConfig) and inputs.
loss (tf.Tensor of shape (1,), optional, returned when labels is provided): Classification (or regression if config.num_labels==1) loss.
logits (tf.Tensor of shape (batch_size, config.num_labels)): Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True): Tuple of tf.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, num_channels, height, width). Hidden-states (also called feature maps) of the model at the output of each stage.
The TFMobileViTForImageClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
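The loss selection described for labels above (mean-square loss when config.num_labels == 1, cross-entropy otherwise) can be sketched in NumPy. This is an illustration of the rule, not the library's loss code: the function name `classification_or_regression_loss` is hypothetical, and averaging over the batch is an assumption.

```python
import numpy as np


def classification_or_regression_loss(logits, labels, num_labels: int) -> float:
    """Pick the loss from num_labels, as described above (illustrative sketch)."""
    logits = np.asarray(logits, dtype=np.float64)
    if num_labels == 1:
        # Regression: mean-square error between the single score and the target.
        targets = np.asarray(labels, dtype=np.float64)
        return float(np.mean((logits.reshape(-1) - targets.reshape(-1)) ** 2))
    # Classification: cross-entropy over logits of shape (batch_size, num_labels).
    labels = np.asarray(labels, dtype=np.int64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = log_probs[np.arange(labels.shape[0]), labels]
    return float(-picked.mean())  # batch mean is an assumption of this sketch


# Two equally likely classes give a loss of ln(2).
ce_loss = classification_or_regression_loss([[0.0, 0.0]], [0], num_labels=2)
# Squared errors of 1 and 4 average to 2.5.
mse_loss = classification_or_regression_loss([[1.0], [3.0]], [0.0, 1.0], num_labels=1)
```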
Example:
>>> from transformers import AutoImageProcessor, TFMobileViTForImageClassification
>>> import tensorflow as tf
>>> from datasets import load_dataset
>>> dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
>>> image = dataset["test"]["image"][0]
>>> image_processor = AutoImageProcessor.from_pretrained("apple/mobilevit-small")
>>> model = TFMobileViTForImageClassification.from_pretrained("apple/mobilevit-small")
>>> inputs = image_processor(image, return_tensors="tf")
>>> logits = model(**inputs).logits
>>> # model predicts one of the 1000 ImageNet classes
>>> predicted_label = int(tf.math.argmax(logits, axis=-1))
>>> print(model.config.id2label[predicted_label])
tabby, tabby cat

TFMobileViTForSemanticSegmentation
class transformers.TFMobileViTForSemanticSegmentation
(config: MobileViTConfig, **kwargs)
Parameters
MobileViT model with a semantic segmentation head on top, e.g. for Pascal VOC.
This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:
- a single Tensor with pixel_values only and nothing else: model(pixel_values)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([pixel_values, attention_mask]) or model([pixel_values, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"pixel_values": pixel_values, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing, you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
call(pixel_values: tf.Tensor | None = None, labels: tf.Tensor | None = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False) → transformers.modeling_tf_outputs.TFSemanticSegmenterOutputWithNoAttention or tuple(tf.Tensor)
Parameters
pixel_values (np.ndarray, tf.Tensor, List[tf.Tensor], Dict[str, tf.Tensor] or Dict[str, np.ndarray], where each example must have the shape (batch_size, num_channels, height, width)): Pixel values. Pixel values can be obtained using AutoImageProcessor. See MobileViTImageProcessor.__call__() for details.
output_hidden_states (bool, optional): Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional): Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
labels (tf.Tensor of shape (batch_size, height, width), optional): Ground truth semantic segmentation maps for computing the loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1, a classification loss is computed (Cross-Entropy).
Returns
transformers.modeling_tf_outputs.TFSemanticSegmenterOutputWithNoAttention or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFSemanticSegmenterOutputWithNoAttention or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MobileViTConfig) and inputs.
loss (tf.Tensor of shape (1,), optional, returned when labels is provided): Classification (or regression if config.num_labels==1) loss.
logits (tf.Tensor of shape (batch_size, config.num_labels, logits_height, logits_width)): Classification scores for each pixel. The logits returned do not necessarily have the same size as the pixel_values passed as inputs. This is to avoid doing two interpolations and losing some quality when a user needs to resize the logits to the original image size as post-processing. You should always check your logits shape and resize as needed.
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True): Tuple of tf.Tensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, patch_size, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
The TFMobileViTForSemanticSegmentation forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
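The per-pixel cross-entropy mentioned for labels above can be sketched in NumPy. This sketch assumes the logits have already been brought to the same height and width as the label map (the note on logits size above indicates that a resize may be needed first) and averages over all pixels. The helper name `pixelwise_cross_entropy` is hypothetical and this is not the library's loss code.

```python
import numpy as np


def pixelwise_cross_entropy(logits, labels) -> float:
    """Mean cross-entropy over every pixel (illustrative sketch).

    logits: (batch_size, num_labels, height, width)
    labels: (batch_size, height, width), integer class indices
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    if logits.shape[0] != labels.shape[0] or logits.shape[2:] != labels.shape[1:]:
        raise ValueError("logits and labels must share batch and spatial sizes")
    # Log-softmax over the class axis, shifted for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Select the log-probability of the true class at each pixel.
    picked = np.take_along_axis(log_probs, labels[:, None, :, :], axis=1)
    return float(-picked.mean())


# Three equally likely classes at every pixel give a loss of ln(3).
uniform_loss = pixelwise_cross_entropy(
    np.zeros((1, 3, 2, 2)), np.zeros((1, 2, 2), dtype=np.int64)
)
```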
Examples:
>>> from transformers import AutoImageProcessor, TFMobileViTForSemanticSegmentation
>>> from PIL import Image
>>> import requests
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image_processor = AutoImageProcessor.from_pretrained("apple/deeplabv3-mobilevit-small")
>>> model = TFMobileViTForSemanticSegmentation.from_pretrained("apple/deeplabv3-mobilevit-small")
>>> inputs = image_processor(images=image, return_tensors="tf")
>>> outputs = model(**inputs)
>>> # logits are of shape (batch_size, num_labels, logits_height, logits_width)
>>> logits = outputs.logits
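Because the logits may be smaller than the input image, a typical post-processing step resizes them and takes the per-pixel argmax. The NumPy sketch below uses nearest-neighbour upsampling only to stay dependency-free; a bilinear resize is a common alternative. The helper names and the tiny array are illustrative assumptions, not real model output.

```python
import numpy as np


def upsample_nearest(logits: np.ndarray, target_height: int, target_width: int) -> np.ndarray:
    """Nearest-neighbour resize of (batch, num_labels, h, w) logits."""
    _, _, height, width = logits.shape
    # For each target row/column, pick the source index it falls into.
    rows = np.arange(target_height) * height // target_height
    cols = np.arange(target_width) * width // target_width
    return logits[:, :, rows[:, None], cols[None, :]]


def segmentation_map(logits: np.ndarray, target_height: int, target_width: int) -> np.ndarray:
    """Resize the logits, then choose the highest-scoring class per pixel."""
    resized = upsample_nearest(logits, target_height, target_width)
    return resized.argmax(axis=1)  # shape: (batch, target_height, target_width)


# Toy logits: 2 classes on a 2x2 grid, arranged as a checkerboard.
toy_logits = np.array([[[[1.0, 0.0], [0.0, 1.0]],
                        [[0.0, 1.0], [1.0, 0.0]]]])
toy_map = segmentation_map(toy_logits, 4, 4)
print(toy_map.shape)  # (1, 4, 4)
```

With real outputs, the logits tensor would first be converted to NumPy (for example with `logits.numpy()`), and the target size would be the original image height and width.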