( config, add_pooling_layer = True, use_mask_token = False )
Parameters
config (SwinConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
add_pooling_layer (bool, optional, defaults to True) — Whether or not to apply a pooling layer.
use_mask_token (bool, optional, defaults to False) — Whether or not to create and apply mask tokens in the embedding layer.
The bare Swin Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
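For illustration, a minimal sketch of constructing the model from a configuration with these two options (randomly initialized weights; the flag values here are assumptions for the example, not recommendations):
>>> from transformers import SwinConfig, SwinModel

>>> config = SwinConfig()
>>> # add_pooling_layer=True attaches the average pooler that produces pooler_output;
>>> # use_mask_token=True creates the learnable mask token consumed via bool_masked_pos
>>> model = SwinModel(config, add_pooling_layer=True, use_mask_token=True)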
forward < source >( pixel_values: typing.Optional[torch.FloatTensor] = None, bool_masked_pos: typing.Optional[torch.BoolTensor] = None, head_mask: typing.Optional[torch.FloatTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, interpolate_pos_encoding: bool = False, return_dict: typing.Optional[bool] = None ) → transformers.models.swin.modeling_swin.SwinModelOutput or tuple(torch.FloatTensor)
Parameters
pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Pixel values. Pixel values can be obtained using AutoImageProcessor. See ViTImageProcessor.__call__() for details.
bool_masked_pos (torch.BoolTensor of shape (batch_size, num_patches), optional) — Boolean masked positions. Indicates which patches are masked (1) and which aren’t (0).
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
interpolate_pos_encoding (bool, optional, defaults to False) — Whether to interpolate the pre-trained position encodings.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
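As a hedged sketch of how bool_masked_pos interacts with the use_mask_token constructor flag (randomly initialized model and a random mask, both assumptions for illustration only):
>>> import torch
>>> from transformers import SwinConfig, SwinModel

>>> config = SwinConfig()
>>> model = SwinModel(config, use_mask_token=True)  # the mask token must exist for bool_masked_pos to apply
>>> num_patches = (config.image_size // config.patch_size) ** 2
>>> pixel_values = torch.randn(1, config.num_channels, config.image_size, config.image_size)
>>> bool_masked_pos = torch.randint(0, 2, (1, num_patches)).bool()  # 1 = replace patch embedding with the mask token
>>> with torch.no_grad():
...     outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)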
Returns
transformers.models.swin.modeling_swin.SwinModelOutput or tuple(torch.FloatTensor)
A transformers.models.swin.modeling_swin.SwinModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SwinConfig) and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size), optional, returned when add_pooling_layer=True is passed) — Average pooling of the last layer hidden-state.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each stage) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each stage) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
reshaped_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each stage) of shape (batch_size, hidden_size, height, width).
Hidden-states of the model at the output of each layer plus the initial embedding outputs, reshaped to include the spatial dimensions.
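A hedged sketch of reading these spatially reshaped feature maps (the checkpoint is borrowed from the example below, and a random tensor stands in for preprocessed pixel values):
>>> import torch
>>> from transformers import SwinModel

>>> model = SwinModel.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
>>> pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for image_processor output
>>> with torch.no_grad():
...     outputs = model(pixel_values, output_hidden_states=True)
>>> # each entry has shape (batch_size, hidden_size, height, width) and can be used like a CNN feature map
>>> feature_maps = outputs.reshaped_hidden_states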
The SwinModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoImageProcessor, SwinModel
>>> import torch
>>> from datasets import load_dataset

>>> dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
>>> image = dataset["test"]["image"][0]

>>> image_processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
>>> model = SwinModel.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

>>> inputs = image_processor(image, return_tensors="pt")

>>> with torch.no_grad():
...     outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 49, 768]
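Because add_pooling_layer defaults to True, the same outputs object also exposes the pooled representation; a brief continuation of the example above (the expected shape follows from the pooler_output description under Returns):
>>> pooled_output = outputs.pooler_output  # average pooling of the last hidden state
>>> list(pooled_output.shape)
[1, 768]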
SwinForMaskedImageModeling
class transformers.SwinForMaskedImageModeling < source >( config )
Parameters
config (SwinConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
Swin Model with a decoder on top for masked image modeling, as proposed in SimMIM.
Note that we provide a script to pre-train this model on custom data in our examples directory.
This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward < source >( pixel_values: typing.Optional[torch.FloatTensor] = None, bool_masked_pos: typing.Optional[torch.BoolTensor] = None, head_mask: typing.Optional[torch.FloatTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, interpolate_pos_encoding: bool = False, return_dict: typing.Optional[bool] = None ) → transformers.models.swin.modeling_swin.SwinMaskedImageModelingOutput or tuple(torch.FloatTensor)
Parameters
pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Pixel values. Pixel values can be obtained using AutoImageProcessor. See ViTImageProcessor.__call__() for details.
bool_masked_pos (torch.BoolTensor of shape (batch_size, num_patches)) — Boolean masked positions. Indicates which patches are masked (1) and which aren’t (0).
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
interpolate_pos_encoding (bool, optional, defaults to False) — Whether to interpolate the pre-trained position encodings.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.models.swin.modeling_swin.SwinMaskedImageModelingOutput or tuple(torch.FloatTensor)
A transformers.models.swin.modeling_swin.SwinMaskedImageModelingOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SwinConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when bool_masked_pos is provided) — Masked image modeling (MIM) loss.
reconstruction (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Reconstructed pixel values.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each stage) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each stage) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
reshaped_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each stage) of shape (batch_size, hidden_size, height, width).
Hidden-states of the model at the output of each layer plus the initial embedding outputs, reshaped to include the spatial dimensions.
The SwinForMaskedImageModeling forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Examples:
>>> from transformers import AutoImageProcessor, SwinForMaskedImageModeling
>>> import torch
>>> from PIL import Image
>>> import requests

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> image_processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-simmim-window6-192")
>>> model = SwinForMaskedImageModeling.from_pretrained("microsoft/swin-base-simmim-window6-192")

>>> num_patches = (model.config.image_size // model.config.patch_size) ** 2
>>> pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
>>> # create a random boolean mask of shape (batch_size, num_patches)
>>> bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()

>>> outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
>>> loss, reconstructed_pixel_values = outputs.loss, outputs.reconstruction
>>> list(reconstructed_pixel_values.shape)
[1, 3, 192, 192]
SwinForImageClassification
class transformers.SwinForImageClassification < source >( config )
Parameters
config (SwinConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
Swin Model transformer with an image classification head on top (a linear layer on top of the pooled final hidden state), e.g. for ImageNet.
Note that it’s possible to fine-tune Swin on higher resolution images than the ones it has been trained on by setting interpolate_pos_encoding to True in the forward of the model. This will interpolate the pre-trained position embeddings to the higher resolution.
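For illustration, a minimal, hedged sketch of this higher-resolution usage (the 384x384 size is an arbitrary assumption, a random tensor stands in for preprocessed pixel values, and the model's internal window padding is assumed to handle the new grid size):
>>> import torch
>>> from transformers import SwinForImageClassification

>>> model = SwinForImageClassification.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
>>> pixel_values = torch.randn(1, 3, 384, 384)  # larger than the 224x224 pre-training resolution
>>> with torch.no_grad():
...     logits = model(pixel_values, interpolate_pos_encoding=True).logits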
This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward < source >( pixel_values: typing.Optional[torch.FloatTensor] = None, head_mask: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.LongTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, interpolate_pos_encoding: bool = False, return_dict: typing.Optional[bool] = None ) → transformers.models.swin.modeling_swin.SwinImageClassifierOutput or tuple(torch.FloatTensor)
Parameters
pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Pixel values. Pixel values can be obtained using AutoImageProcessor. See ViTImageProcessor.__call__() for details.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
labels (torch.LongTensor of shape (batch_size,), optional) — Labels for computing the image classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss), if config.num_labels > 1 a classification loss is computed (Cross-Entropy); see the sketch after this parameter list.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
interpolate_pos_encoding (bool, optional, defaults to False) — Whether to interpolate the pre-trained position encodings.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
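A hedged sketch of the labels mechanics described above (randomly initialized model, made-up num_labels and label, all assumptions for illustration only):
>>> import torch
>>> from transformers import SwinConfig, SwinForImageClassification

>>> config = SwinConfig(num_labels=3)
>>> model = SwinForImageClassification(config)  # random weights, illustration only
>>> pixel_values = torch.randn(1, 3, 224, 224)
>>> labels = torch.tensor([1])  # class index in [0, config.num_labels - 1]
>>> outputs = model(pixel_values, labels=labels)
>>> loss = outputs.loss  # cross-entropy, since config.num_labels > 1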
Returns
transformers.models.swin.modeling_swin.SwinImageClassifierOutput or tuple(torch.FloatTensor)
A transformers.models.swin.modeling_swin.SwinImageClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SwinConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each stage) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each stage) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
reshaped_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each stage) of shape (batch_size, hidden_size, height, width).
Hidden-states of the model at the output of each layer plus the initial embedding outputs, reshaped to include the spatial dimensions.
The SwinForImageClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoImageProcessor, SwinForImageClassification
>>> import torch
>>> from datasets import load_dataset

>>> dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
>>> image = dataset["test"]["image"][0]

>>> image_processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
>>> model = SwinForImageClassification.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

>>> inputs = image_processor(image, return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> # model predicts one of the 1000 ImageNet classes
>>> predicted_label = logits.argmax(-1).item()
>>> print(model.config.id2label[predicted_label])
tabby, tabby cat