Showing content from https://huggingface.co/docs/transformers/v4.51.3/en/model_doc/flava below:

FLAVA

FLAVA Overview

The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela, and was accepted at CVPR 2022.

The paper aims to create a single unified foundation model that works across vision, language, and vision-and-language multimodal tasks.

The abstract from the paper is the following:

State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks. Generally, such models are often either cross-modal (contrastive) or multi-modal (with earlier fusion) but not both; and they often only target specific modalities or tasks. A promising direction would be to use a single holistic universal model, as a “foundation”, that targets all modalities at once — a true vision and language foundation model should be good at vision tasks, language tasks, and cross- and multi-modal vision and language tasks. We introduce FLAVA as such a model and demonstrate impressive performance on a wide range of 35 tasks spanning these target modalities.

This model was contributed by aps. The original code can be found here.

FlavaConfig class transformers.FlavaConfig < source >

( image_config: typing.Dict[str, typing.Any] = None text_config: typing.Dict[str, typing.Any] = None multimodal_config: typing.Dict[str, typing.Any] = None image_codebook_config: typing.Dict[str, typing.Any] = None hidden_size: int = 768 layer_norm_eps: float = 1e-12 projection_dim: int = 768 init_codebook: bool = True logit_scale_init_value: float = 2.6592 initializer_range: float = 0.02 ce_ignore_index: int = -100 mim_weight: float = 1.0 mlm_weight: float = 1.0 global_contrastive_weight: float = 1.0 itm_weight: float = 1.0 mmm_image_weight: float = 1.0 mmm_text_weight: float = 1.0 global_backprop_contrastive: bool = True skip_unmasked_multimodal_encoder: bool = True return_loss: bool = True **kwargs )

Parameters

FlavaConfig is the configuration class to store the configuration of a FlavaModel. It is used to instantiate a FLAVA model according to the specified arguments, defining the text model, image model, image codebook and multimodal model configs. Instantiating a configuration with the defaults will yield a configuration similar to that of the FLAVA facebook/flava-full architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

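>>> from transformers import FlavaConfig, FlavaModel, FlavaForPreTraining

>>> # Initialize a FlavaConfig with the default settings
>>> configuration = FlavaConfig()

>>> # Initialize a FlavaModel and a FlavaForPreTraining model (with random weights) from the configuration
>>> model = FlavaModel(configuration)
>>> model_pre = FlavaForPreTraining(configuration)

>>> # Access the model configurations
>>> configuration = model.config
>>> configuration_pre = model_pre.config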
from_configs < source >

( image_config: FlavaImageConfig text_config: FlavaTextConfig multimodal_config: FlavaMultimodalConfig image_codebook_config: FlavaImageCodebookConfig **kwargs ) FlavaConfig

Instantiate a FlavaConfig (or a derived class) from a FLAVA text model configuration, image model configuration, multimodal model configuration and image codebook configuration.

Returns a FlavaConfig: an instance of a configuration object.
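As a rough sketch of how from_configs can be used, the snippet below builds the four sub-configurations separately and then combines them. The reduced layer counts and the mim_weight override are illustrative values chosen for this example, not recommended settings; the override assumes that extra keyword arguments are forwarded to the FlavaConfig constructor.

```python
from transformers import (
    FlavaConfig,
    FlavaImageCodebookConfig,
    FlavaImageConfig,
    FlavaMultimodalConfig,
    FlavaTextConfig,
)

# Illustrative (assumed) sizes: fewer layers than the defaults.
text_config = FlavaTextConfig(num_hidden_layers=6)
image_config = FlavaImageConfig(num_hidden_layers=6)
multimodal_config = FlavaMultimodalConfig(num_hidden_layers=3)
image_codebook_config = FlavaImageCodebookConfig()

# Combine the sub-configurations; mim_weight is passed through as an extra keyword argument.
config = FlavaConfig.from_configs(
    image_config=image_config,
    text_config=text_config,
    multimodal_config=multimodal_config,
    image_codebook_config=image_codebook_config,
    mim_weight=0.5,
)

print(config.text_config.num_hidden_layers)
print(config.mim_weight)
```

The resulting object exposes each sub-configuration as an attribute (for example config.text_config), so the individual settings stay inspectable after they are combined.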

FlavaTextConfig class transformers.FlavaTextConfig < source >

( vocab_size: int = 30522 type_vocab_size: int = 2 max_position_embeddings: int = 512 position_embedding_type: str = 'absolute' hidden_size: int = 768 num_hidden_layers: int = 12 num_attention_heads: int = 12 intermediate_size: int = 3072 hidden_act: str = 'gelu' hidden_dropout_prob: float = 0.0 attention_probs_dropout_prob: float = 0.0 initializer_range: float = 0.02 layer_norm_eps: float = 1e-12 pad_token_id: int = 0 qkv_bias: bool = True **kwargs )

Parameters

This is the configuration class to store the configuration of a FlavaTextModel. It is used to instantiate a FLAVA model according to the specified arguments, defining the model architecture.

Instantiating a configuration with the defaults will yield a similar configuration to that of the FLAVA facebook/flava-full architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

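>>> from transformers import FlavaTextConfig, FlavaTextModel

>>> # Initialize a FlavaTextConfig with the default settings
>>> configuration = FlavaTextConfig()

>>> # Initialize a FlavaTextModel (with random weights) from the configuration
>>> model = FlavaTextModel(configuration)

>>> # Access the model configuration
>>> configuration = model.config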
FlavaImageConfig class transformers.FlavaImageConfig < source >

( hidden_size: int = 768 num_hidden_layers: int = 12 num_attention_heads: int = 12 intermediate_size: int = 3072 hidden_act: int = 'gelu' hidden_dropout_prob: float = 0.0 attention_probs_dropout_prob: float = 0.0 initializer_range: float = 0.02 layer_norm_eps: float = 1e-12 image_size: int = 224 patch_size: int = 16 num_channels: int = 3 qkv_bias: bool = True mask_token: bool = True vocab_size: int = 8192 **kwargs )

Parameters

This is the configuration class to store the configuration of a FlavaImageModel. It is used to instantiate a FLAVA model according to the specified arguments, defining the model architecture.

Instantiating a configuration with the defaults will yield a similar configuration to that of the FLAVA facebook/flava-full architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

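>>> from transformers import FlavaImageConfig, FlavaImageModel

>>> # Initialize a FlavaImageConfig with the default settings
>>> configuration = FlavaImageConfig()

>>> # Initialize a FlavaImageModel (with random weights) from the configuration
>>> model = FlavaImageModel(configuration)

>>> # Access the model configuration
>>> configuration = model.config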
FlavaMultimodalConfig class transformers.FlavaMultimodalConfig < source >

( hidden_size: int = 768 num_hidden_layers: int = 6 num_attention_heads: int = 12 intermediate_size: int = 3072 hidden_act: int = 'gelu' hidden_dropout_prob: int = 0.0 attention_probs_dropout_prob: int = 0.0 initializer_range: float = 0.02 layer_norm_eps: float = 1e-12 qkv_bias: bool = True use_cls_token: bool = True **kwargs )

Parameters

This is the configuration class to store the configuration of a FlavaMultimodalModel. It is used to instantiate a FLAVA model according to the specified arguments, defining the model architecture.

Instantiating a configuration with the defaults will yield a similar configuration to that of the FLAVA facebook/flava-full architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

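>>> from transformers import FlavaMultimodalConfig, FlavaMultimodalModel

>>> # Initialize a FlavaMultimodalConfig with the default settings
>>> configuration = FlavaMultimodalConfig()

>>> # Initialize a FlavaMultimodalModel (with random weights) from the configuration
>>> model = FlavaMultimodalModel(configuration)

>>> # Access the model configuration
>>> configuration = model.config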
FlavaImageCodebookConfig class transformers.FlavaImageCodebookConfig < source >

( num_groups: int = 4 input_channels: int = 3 num_blocks_per_group: int = 2 hidden_size: int = 256 vocab_size: int = 8192 freeze: int = True initializer_range: float = 0.02 **kwargs )
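By analogy with the other configuration classes, a minimal sketch of instantiating the codebook configuration and a randomly initialized FlavaImageCodebook from it might look like the following. It only relies on the default values listed in the signature above.

```python
from transformers import FlavaImageCodebook, FlavaImageCodebookConfig

# Initialize a FlavaImageCodebookConfig with the default settings
configuration = FlavaImageCodebookConfig()

# Initialize a FlavaImageCodebook (with random weights) from the configuration
model = FlavaImageCodebook(configuration)

# Access the model configuration
configuration = model.config
print(configuration.vocab_size, configuration.hidden_size)
```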

FlavaProcessor class transformers.FlavaProcessor < source >

( image_processor = None tokenizer = None **kwargs )

Constructs a FLAVA processor which wraps a FLAVA image processor and a FLAVA tokenizer into a single processor.

FlavaProcessor offers all the functionality of FlavaImageProcessor and BertTokenizerFast. See __call__() and decode() for more information.

batch_decode() forwards all its arguments to BertTokenizerFast’s batch_decode(). Please refer to the docstring of that method for more information.

decode() forwards all its arguments to BertTokenizerFast’s decode(). Please refer to the docstring of that method for more information.

FlavaFeatureExtractor

FlavaImageProcessor class transformers.FlavaImageProcessor < source >

( do_resize: bool = True size: typing.Dict[str, int] = None resample: Resampling = <Resampling.BICUBIC: 3> do_center_crop: bool = True crop_size: typing.Dict[str, int] = None do_rescale: bool = True rescale_factor: typing.Union[int, float] = 0.00392156862745098 do_normalize: bool = True image_mean: typing.Union[float, typing.Iterable[float], NoneType] = None image_std: typing.Union[float, typing.Iterable[float], NoneType] = None return_image_mask: bool = False input_size_patches: int = 14 total_mask_patches: int = 75 mask_group_min_patches: int = 16 mask_group_max_patches: typing.Optional[int] = None mask_group_min_aspect_ratio: float = 0.3 mask_group_max_aspect_ratio: typing.Optional[float] = None return_codebook_pixels: bool = False codebook_do_resize: bool = True codebook_size: typing.Optional[bool] = None codebook_resample: int = <Resampling.LANCZOS: 1> codebook_do_center_crop: bool = True codebook_crop_size: typing.Optional[int] = None codebook_do_rescale: bool = True codebook_rescale_factor: typing.Union[int, float] = 0.00392156862745098 codebook_do_map_pixels: bool = True codebook_do_normalize: bool = True codebook_image_mean: typing.Union[float, typing.Iterable[float], NoneType] = None codebook_image_std: typing.Union[float, typing.Iterable[float], NoneType] = None **kwargs )

Parameters

Constructs a Flava image processor.

preprocess < source >

( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] do_resize: typing.Optional[bool] = None size: typing.Dict[str, int] = None resample: Resampling = None do_center_crop: typing.Optional[bool] = None crop_size: typing.Optional[typing.Dict[str, int]] = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_normalize: typing.Optional[bool] = None image_mean: typing.Union[float, typing.List[float], NoneType] = None image_std: typing.Union[float, typing.List[float], NoneType] = None return_image_mask: typing.Optional[bool] = None input_size_patches: typing.Optional[int] = None total_mask_patches: typing.Optional[int] = None mask_group_min_patches: typing.Optional[int] = None mask_group_max_patches: typing.Optional[int] = None mask_group_min_aspect_ratio: typing.Optional[float] = None mask_group_max_aspect_ratio: typing.Optional[float] = None return_codebook_pixels: typing.Optional[bool] = None codebook_do_resize: typing.Optional[bool] = None codebook_size: typing.Optional[typing.Dict[str, int]] = None codebook_resample: typing.Optional[int] = None codebook_do_center_crop: typing.Optional[bool] = None codebook_crop_size: typing.Optional[typing.Dict[str, int]] = None codebook_do_rescale: typing.Optional[bool] = None codebook_rescale_factor: typing.Optional[float] = None codebook_do_map_pixels: typing.Optional[bool] = None codebook_do_normalize: typing.Optional[bool] = None codebook_image_mean: typing.Optional[typing.Iterable[float]] = None codebook_image_std: typing.Optional[typing.Iterable[float]] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None )

Parameters

Preprocess an image or batch of images.
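The following is a minimal sketch of calling preprocess with the masking and codebook options enabled. It assumes Pillow and NumPy are installed and uses a random array as a stand-in for a real image; the array size is arbitrary. The processor is built from its defaults, so no checkpoint download is involved.

```python
import numpy as np
from transformers import FlavaImageProcessor

# Default settings; constructing the processor needs no pretrained checkpoint.
image_processor = FlavaImageProcessor()

# Random stand-in for a real RGB image: (height, width, channels), values in 0..255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 320, 3), dtype=np.uint8)

inputs = image_processor.preprocess(
    image,
    return_image_mask=True,       # adds "bool_masked_pos"
    return_codebook_pixels=True,  # adds "codebook_pixel_values"
    return_tensors="np",
)

# List the returned arrays and their shapes.
for name, value in inputs.items():
    print(name, value.shape)
```

The default rescale_factor of 0.00392156862745098 is 1/255, so 8-bit pixel values are mapped to the range 0 to 1 before normalization.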

FlavaForPreTraining class transformers.FlavaForPreTraining < source >

( config: FlavaConfig image_codebook: typing.Optional[torch.nn.modules.module.Module] = None )

Parameters

The FLAVA model for pretraining which outputs losses, embeddings, logits and transformer outputs.

This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward < source >

( input_ids: typing.Optional[torch.LongTensor] = None input_ids_masked: typing.Optional[torch.LongTensor] = None pixel_values: typing.Optional[torch.FloatTensor] = None codebook_pixel_values: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None bool_masked_pos: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.LongTensor] = None image_attention_mask: typing.Optional[torch.Tensor] = None skip_unmasked_multimodal_encoder: typing.Optional[bool] = None mlm_labels: typing.Optional[torch.Tensor] = None mim_labels: typing.Optional[torch.Tensor] = None itm_labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: bool = True return_dict: typing.Optional[bool] = None return_loss: typing.Optional[bool] = None ) transformers.models.flava.modeling_flava.FlavaForPreTrainingOutput or tuple(torch.FloatTensor)

Parameters

Returns

transformers.models.flava.modeling_flava.FlavaForPreTrainingOutput or tuple(torch.FloatTensor)

A transformers.models.flava.modeling_flava.FlavaForPreTrainingOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (<class 'transformers.models.flava.configuration_flava.FlavaConfig'>) and inputs.

The FlavaForPreTraining forward method overrides the __call__ special method.

Although the forward pass is defined within this function, call the Module instance rather than forward() directly: calling the instance runs the pre- and post-processing steps, while calling forward() silently skips them.
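The difference between calling a Module instance and calling forward() directly can be illustrated with plain PyTorch. This is a generic sketch, not FLAVA-specific: Doubler is a hypothetical toy module, and the forward hook stands in for the pre- and post-processing steps mentioned here.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Toy module (hypothetical) used only to demonstrate hook behavior."""

    def forward(self, x):
        return 2 * x


calls = []
module = Doubler()
# The hook records each time it runs; it returns None, so the output is unchanged.
module.register_forward_hook(lambda mod, inputs, output: calls.append("hook"))

x = torch.ones(2)

module(x)  # __call__ runs forward() and then the registered hook
hooks_after_call = len(calls)

module.forward(x)  # bypasses __call__, so the hook is skipped
hooks_after_forward = len(calls)

print(hooks_after_call, hooks_after_forward)  # 1 1
```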

FlavaModel class transformers.FlavaModel < source >

( config: FlavaConfig )

Parameters

The bare FLAVA Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward < source >

( input_ids: typing.Optional[torch.LongTensor] = None pixel_values: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None bool_masked_pos: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.LongTensor] = None image_attention_mask: typing.Optional[torch.Tensor] = None skip_multimodal_encoder: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: bool = True return_dict: typing.Optional[bool] = None ) transformers.models.flava.modeling_flava.FlavaModelOutput or tuple(torch.FloatTensor)

Parameters

Returns

transformers.models.flava.modeling_flava.FlavaModelOutput or tuple(torch.FloatTensor)

A transformers.models.flava.modeling_flava.FlavaModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (<class 'transformers.models.flava.configuration_flava.FlavaConfig'>) and inputs.

The FlavaModel forward method overrides the __call__ special method.

Although the forward pass is defined within this function, call the Module instance rather than forward() directly: calling the instance runs the pre- and post-processing steps, while calling forward() silently skips them.

Examples:

>>> from PIL import Image
>>> import requests
>>> from transformers import AutoProcessor, FlavaModel

>>> model = FlavaModel.from_pretrained("facebook/flava-full")
>>> processor = AutoProcessor.from_pretrained("facebook/flava-full")

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)

>>> outputs = model(**inputs)

>>> image_embeddings = outputs.image_embeddings
>>> text_embeddings = outputs.text_embeddings
>>> multimodal_embeddings = outputs.multimodal_embeddings

>>> outputs.image_embeddings.shape
torch.Size([1, 197, 768])

>>> text_embeddings.shape
torch.Size([1, 7, 768])

>>> multimodal_embeddings.shape
torch.Size([1, 205, 768])
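The sequence lengths in the shapes above can be reproduced from the default configuration values. The sketch below is only arithmetic over the documented defaults; the text length of 7 is read off the printed text_embeddings shape rather than computed.

```python
from transformers import FlavaImageConfig, FlavaMultimodalConfig

image_config = FlavaImageConfig()
multimodal_config = FlavaMultimodalConfig()

# One token per image patch, plus one classification token from the image encoder.
patches_per_side = image_config.image_size // image_config.patch_size
image_tokens = patches_per_side**2 + 1

# Taken from the text_embeddings shape printed above for "a photo of a cat".
text_tokens = 7

# The multimodal encoder prepends its own CLS token when use_cls_token is True.
multimodal_tokens = image_tokens + text_tokens + int(multimodal_config.use_cls_token)

print(image_tokens, text_tokens, multimodal_tokens)  # 197 7 205
```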
get_text_features < source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None )

Parameters

The FlavaModel forward method overrides the __call__ special method.

Although the forward pass is defined within this function, call the Module instance rather than forward() directly: calling the instance runs the pre- and post-processing steps, while calling forward() silently skips them.

get_image_features < source >

( pixel_values: typing.Optional[torch.Tensor] = None bool_masked_pos: typing.Optional[torch.BoolTensor] = None interpolate_pos_encoding: typing.Optional[bool] = None attention_mask: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None )

Parameters

The FlavaModel forward method overrides the __call__ special method.

Although the forward pass is defined within this function, call the Module instance rather than forward() directly: calling the instance runs the pre- and post-processing steps, while calling forward() silently skips them.
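A minimal sketch of calling get_text_features and get_image_features is shown below. To keep it runnable without a download, it uses a tiny, randomly initialized FlavaModel and random inputs; every size in it is an assumption made for illustration, not a value from facebook/flava-full. In practice the inputs would come from the processor.

```python
import torch
from transformers import FlavaConfig, FlavaModel

# Tiny, randomly initialized model so the sketch runs offline.
# All sizes here are illustrative assumptions, not the facebook/flava-full values.
small = dict(hidden_size=32, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64)
config = FlavaConfig(
    text_config=dict(vocab_size=100, **small),
    image_config=dict(image_size=32, patch_size=16, **small),
    multimodal_config=dict(**small),
    hidden_size=32,
    projection_dim=16,
)
model = FlavaModel(config).eval()

input_ids = torch.randint(0, 100, (1, 6))  # stand-in for tokenizer output
pixel_values = torch.randn(1, 3, 32, 32)   # stand-in for image processor output

with torch.no_grad():
    text_features = model.get_text_features(input_ids=input_ids)
    image_features = model.get_image_features(pixel_values=pixel_values)

# Both outputs end in the shared projection dimension.
print(text_features.shape[-1], image_features.shape[-1])
```

Because both feature tensors end in projection_dim, they live in the same space and can be compared, for example with a cosine similarity.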

FlavaImageCodebook class transformers.FlavaImageCodebook < source >

( config: FlavaImageCodebookConfig **kwargs: typing.Any )

Parameters

FLAVA’s image codebook model, inspired by DALL-E’s original encoder. It outputs raw hidden states and can be used to generate image tokens for an image based on DALL-E’s vocabulary. It is used to generate labels for masked image modeling (MIM). Use get_codebook_indices to get image tokens for an image.

This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

get_codebook_indices < source >

( pixel_values: Tensor )

get_codebook_probs < source >

( pixel_values: Tensor )
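As a hedged illustration of these two methods, the sketch below runs a small, randomly initialized codebook on random pixel values. The reduced hidden_size and vocab_size are assumptions chosen to keep the example light; with a pretrained codebook the input would be the codebook_pixel_values produced by the image processor.

```python
import torch
from transformers import FlavaImageCodebook, FlavaImageCodebookConfig

# Small, randomly initialized codebook (assumed sizes) so the sketch runs offline.
config = FlavaImageCodebookConfig(hidden_size=32, vocab_size=64)
codebook = FlavaImageCodebook(config).eval()

# Random stand-in for codebook_pixel_values: (batch, channels, height, width).
pixel_values = torch.rand(1, 3, 112, 112)

with torch.no_grad():
    indices = codebook.get_codebook_indices(pixel_values)  # one token id per spatial cell
    probs = codebook.get_codebook_probs(pixel_values)      # distribution over the vocabulary

print(indices.shape, probs.shape)
```

The indices are the argmax over the vocabulary axis of the same logits that get_codebook_probs normalizes, so each spatial cell of probs sums to 1.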

FlavaTextModel class transformers.FlavaTextModel < source >

( config: FlavaTextConfig add_pooling_layer: bool = True )

Parameters

The bare FLAVA Text Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward < source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) transformers.modeling_outputs.BaseModelOutputWithPooling or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.BaseModelOutputWithPooling or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (FlavaTextConfig) and inputs.

The FlavaTextModel forward method overrides the __call__ special method.

Although the forward pass is defined within this function, call the Module instance rather than forward() directly: calling the instance runs the pre- and post-processing steps, while calling forward() silently skips them.

Example:

>>> from transformers import AutoTokenizer, FlavaTextModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/flava-full")
>>> model = FlavaTextModel.from_pretrained("facebook/flava-full")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state
FlavaImageModel class transformers.FlavaImageModel < source >

( config: FlavaImageConfig add_pooling_layer: bool = True )

Parameters

The bare FLAVA Image Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward < source >

( pixel_values: typing.Optional[torch.Tensor] = None bool_masked_pos: typing.Optional[torch.BoolTensor] = None interpolate_pos_encoding: typing.Optional[bool] = None attention_mask: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) transformers.modeling_outputs.BaseModelOutputWithPooling or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.BaseModelOutputWithPooling or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (FlavaImageConfig) and inputs.

The FlavaImageModel forward method overrides the __call__ special method.

Although the forward pass is defined within this function, call the Module instance rather than forward() directly: calling the instance runs the pre- and post-processing steps, while calling forward() silently skips them.

Example:

>>> from transformers import AutoImageProcessor, FlavaImageModel
>>> import torch
>>> from datasets import load_dataset

>>> dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
>>> image = dataset["test"]["image"][0]

>>> image_processor = AutoImageProcessor.from_pretrained("facebook/flava-full")
>>> model = FlavaImageModel.from_pretrained("facebook/flava-full")

>>> inputs = image_processor(image, return_tensors="pt")

>>> with torch.no_grad():
...     outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 197, 768]
FlavaMultimodalModel class transformers.FlavaMultimodalModel < source >

( config: FlavaMultimodalConfig add_pooling_layer = True )

Parameters

The bare FLAVA Multimodal Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward < source >

( hidden_states: Tensor attention_mask: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) transformers.modeling_outputs.BaseModelOutputWithPooling or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.BaseModelOutputWithPooling or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (FlavaMultimodalConfig) and inputs.

The FlavaMultimodalModel forward method overrides the __call__ special method.

Although the forward pass is defined within this function, call the Module instance rather than forward() directly: calling the instance runs the pre- and post-processing steps, while calling forward() silently skips them.

Example:

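>>> import torch
>>> from transformers import FlavaMultimodalConfig, FlavaMultimodalModel

>>> # forward() expects already-embedded hidden states, not tokenizer output,
>>> # so this sketch uses a randomly initialized model and random inputs
>>> configuration = FlavaMultimodalConfig()
>>> model = FlavaMultimodalModel(configuration)

>>> # (batch_size, sequence_length, hidden_size); the sequence length of 8 is arbitrary
>>> hidden_states = torch.randn(1, 8, configuration.hidden_size)
>>> with torch.no_grad():
...     outputs = model(hidden_states)

>>> last_hidden_states = outputs.last_hidden_state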