

FalconMamba

FalconMamba Overview

The FalconMamba model was proposed by TII UAE (Technology Innovation Institute) in their release.

The abstract from the paper is the following:

We present FalconMamba, a new base large language model based on the novel Mamba architecture. FalconMamba is trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure Mamba-based model, FalconMamba surpasses leading open-weight models based on Transformers, such as Mistral 7B, Llama3 8B, and Falcon2 11B. It is on par with Gemma 7B and outperforms models with different architecture designs, such as RecurrentGemma 9B. Currently, FalconMamba is the best-performing Mamba model in the literature at this scale, surpassing both existing Mamba and hybrid Mamba-Transformer models. Due to its architecture, FalconMamba is significantly faster at inference and requires substantially less memory for long sequence generation. Despite recent studies suggesting that hybrid Mamba-Transformer models outperform pure architecture designs, we argue and demonstrate that the pure Mamba design can achieve similar, even superior results compared to the hybrid design. We make the weights of our implementation of FalconMamba publicly available under a permissive license.

Tips:

The model has been trained on approximately 6T tokens consisting of a mixture of many data sources, such as RefinedWeb, Cosmopedia and Math data.

For more details about the training procedure and the architecture, have a look at the FalconMamba technical paper (coming soon).

Usage

Below we demonstrate how to use the model:

from transformers import FalconMambaForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
model = FalconMambaForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")

input_ids = tokenizer("Hey how are you doing?", return_tensors= "pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
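
If you only want quick text generation without managing the tokenizer and model objects yourself, the same checkpoint can also be driven through the high-level pipeline API. The snippet below is a minimal sketch of that approach, not part of the official example set:

from transformers import pipeline

# Minimal sketch: load the same checkpoint through the text-generation pipeline
pipe = pipeline("text-generation", model="tiiuae/falcon-mamba-7b")
print(pipe("Hey how are you doing?", max_new_tokens=10))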

The architecture is also compatible with torch.compile for faster generation:

from transformers import FalconMambaForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
model = FalconMambaForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", torch_dtype=torch.bfloat16).to(0)
model = torch.compile(model)

input_ids = tokenizer("Hey how are you doing?", return_tensors= "pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
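
Note that the first generate call after torch.compile is typically slower, because compilation is triggered lazily on the first forward pass; subsequent calls reuse the compiled graph.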

If you have access to a GPU that is compatible with bitsandbytes, you can also quantize the model in 4-bit precision:

from transformers import FalconMambaForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = FalconMambaForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", quantization_config=quantization_config)

input_ids = tokenizer("Hey how are you doing?", return_tensors= "pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
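
The 4-bit settings can also be tuned through BitsAndBytesConfig. As a sketch (the values below are illustrative, not an official recommendation), you could combine NF4 quantization with bfloat16 compute:

from transformers import FalconMambaForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
# Illustrative 4-bit setup: NF4 quantization with bfloat16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = FalconMambaForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", quantization_config=quantization_config)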

You can also play with the instruction fine-tuned model:

from transformers import FalconMambaForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b-instruct")
model = FalconMambaForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b-instruct")


messages = [
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
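
For interactive use, generation can also be streamed token by token. The snippet below is a minimal sketch using TextStreamer, assuming the tokenizer, model and input_ids from the example above:

from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated (the prompt is skipped)
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(input_ids, max_new_tokens=128, streamer=streamer)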
FalconMambaConfig

class transformers.FalconMambaConfig

( vocab_size = 50280 hidden_size = 768 state_size = 16 num_hidden_layers = 32 layer_norm_epsilon = 1e-05 pad_token_id = 0 bos_token_id = 0 eos_token_id = 0 expand = 2 conv_kernel = 4 use_bias = False use_conv_bias = True hidden_act = 'silu' initializer_range = 0.1 residual_in_fp32 = True time_step_rank = 'auto' time_step_scale = 1.0 time_step_min = 0.001 time_step_max = 0.1 time_step_init_scheme = 'random' time_step_floor = 0.0001 rescale_prenorm_residual = False use_cache = True use_mambapy = False mixer_rms_eps = 1e-06 **kwargs )


This is the configuration class to store the configuration of a FalconMambaModel. It is used to instantiate a FALCON_MAMBA model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the FALCON_MAMBA tiiuae/falcon-mamba-7b architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

>>> from transformers import FalconMambaConfig, FalconMambaModel

>>> 
>>> configuration = FalconMambaConfig()

>>> 
>>> model = FalconMambaModel(configuration)

>>> 
>>> configuration = model.config
FalconMambaModel

class transformers.FalconMambaModel

( config )


The bare FALCONMAMBA Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward < source >

( input_ids: typing.Optional[torch.LongTensor] = None inputs_embeds: typing.Optional[torch.LongTensor] = None cache_params: typing.Optional[transformers.cache_utils.MambaCache] = None use_cache: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None cache_position: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.LongTensor] = None ) transformers.models.falcon_mamba.modeling_falcon_mamba.FalconMambaOutput or tuple(torch.FloatTensor)


Returns

transformers.models.falcon_mamba.modeling_falcon_mamba.FalconMambaOutput or tuple(torch.FloatTensor)

A transformers.models.falcon_mamba.modeling_falcon_mamba.FalconMambaOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (FalconMambaConfig) and inputs.

The FalconMambaModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this one, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

>>> from transformers import AutoTokenizer, FalconMambaModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
>>> model = FalconMambaModel.from_pretrained("tiiuae/falcon-mamba-7b")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state
FalconMambaForCausalLM

class transformers.FalconMambaForCausalLM

( config )


The FALCONMAMBA Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward < source >

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.LongTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None cache_params: typing.Optional[transformers.cache_utils.MambaCache] = None labels: typing.Optional[torch.LongTensor] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None use_cache: typing.Optional[bool] = None cache_position: typing.Optional[torch.Tensor] = None **kwargs ) transformers.models.falcon_mamba.modeling_falcon_mamba.FalconMambaCausalLMOutput or tuple(torch.FloatTensor)


Returns

transformers.models.falcon_mamba.modeling_falcon_mamba.FalconMambaCausalLMOutput or tuple(torch.FloatTensor)

A transformers.models.falcon_mamba.modeling_falcon_mamba.FalconMambaCausalLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (FalconMambaConfig) and inputs.

The FalconMambaForCausalLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this one, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

>>> import torch
>>> from transformers import AutoTokenizer, FalconMambaForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
>>> model = FalconMambaForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs, labels=inputs["input_ids"])
>>> loss = outputs.loss
>>> logits = outputs.logits
< > Update on GitHub

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.3