Moonshine

Moonshine Overview

The Moonshine model was proposed in Moonshine: Speech Recognition for Live Transcription and Voice Commands by Nat Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, Pete Warden.

The abstract from the paper is the following:

This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing. Moonshine is based on an encoder-decoder transformer architecture and employs Rotary Position Embedding (RoPE) instead of traditional absolute position embeddings. The model is trained on speech segments of various lengths, but without using zero-padding, leading to greater efficiency for the encoder during inference time. When benchmarked against OpenAI’s Whisper tiny-en, Moonshine Tiny demonstrates a 5x reduction in compute requirements for transcribing a 10-second speech segment while incurring no increase in word error rates across standard evaluation datasets. These results highlight Moonshine’s potential for real-time and resource-constrained applications.
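To make the contrast with absolute position embeddings concrete, below is a minimal sketch of rotary position embedding applied to a query or key tensor. The function name and shapes are illustrative assumptions rather than Moonshine's actual implementation; note that per the configuration below, Moonshine rotates only a fraction of each head's channels (partial_rotary_factor = 0.9).

import torch

def apply_rotary(x, base=10000.0):
    # x: (batch, seq_len, num_heads, head_dim) query or key tensor.
    # Rather than adding an absolute position vector, rotate pairs of
    # channels by an angle that grows linearly with the token position,
    # so attention scores depend only on relative offsets.
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

out = apply_rotary(torch.randn(1, 16, 8, 64))  # shape preserved: (1, 16, 8, 64)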

Tips:

This model was contributed by Eustache Le Bihan (eustlb). The original code can be found in the usefulsensors/moonshine repository on GitHub.

MoonshineConfig

class transformers.MoonshineConfig

( vocab_size = 32768, hidden_size = 288, intermediate_size = 1152, encoder_num_hidden_layers = 6, decoder_num_hidden_layers = 6, encoder_num_attention_heads = 8, decoder_num_attention_heads = 8, encoder_num_key_value_heads = None, decoder_num_key_value_heads = None, pad_head_dim_to_multiple_of = None, encoder_hidden_act = 'gelu', decoder_hidden_act = 'silu', max_position_embeddings = 512, initializer_range = 0.02, decoder_start_token_id = 1, use_cache = True, rope_theta = 10000.0, rope_scaling = None, partial_rotary_factor = 0.9, is_encoder_decoder = True, attention_bias = False, attention_dropout = 0.0, bos_token_id = 1, eos_token_id = 2, **kwargs )


This is the configuration class to store the configuration of a MoonshineModel. It is used to instantiate a Moonshine model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the Moonshine UsefulSensors/moonshine-tiny.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

>>> from transformers import MoonshineModel, MoonshineConfig

>>> # Initializing a Moonshine (tiny) style configuration
>>> configuration = MoonshineConfig.from_pretrained("UsefulSensors/moonshine-tiny")

>>> # Initializing a model from the configuration
>>> model = MoonshineModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
MoonshineModel

class transformers.MoonshineModel

( config: MoonshineConfig )


The bare Moonshine Model outputting raw hidden-states without any specific head on top. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
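As a brief sketch of those inherited conveniences in use (the local directory path below is illustrative):

>>> from transformers import MoonshineModel

>>> model = MoonshineModel.from_pretrained("UsefulSensors/moonshine-tiny")

>>> # Save the weights and config locally, then reload with the same generic API.
>>> model.save_pretrained("./moonshine-tiny-local")
>>> model = MoonshineModel.from_pretrained("./moonshine-tiny-local")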

forward

( input_values: typing.Optional[torch.FloatTensor] = None, attention_mask: typing.Optional[torch.LongTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, decoder_attention_mask: typing.Optional[torch.LongTensor] = None, encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None, past_key_values: typing.Union[transformers.cache_utils.EncoderDecoderCache, typing.Tuple[torch.FloatTensor], NoneType] = None, decoder_inputs_embeds: typing.Optional[typing.Tuple[torch.FloatTensor]] = None, decoder_position_ids: typing.Optional[typing.Tuple[torch.LongTensor]] = None, use_cache: typing.Optional[bool] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, cache_position: typing.Optional[torch.LongTensor] = None ) → transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)

Returns

A transformers.modeling_outputs.Seq2SeqModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MoonshineConfig) and inputs.

The MoonshineModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

>>> import torch
>>> from transformers import AutoFeatureExtractor, MoonshineModel
>>> from datasets import load_dataset

>>> model = MoonshineModel.from_pretrained("UsefulSensors/moonshine-tiny")
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("UsefulSensors/moonshine-tiny")
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> inputs = feature_extractor(ds[0]["audio"]["array"], return_tensors="pt")
>>> input_values = inputs.input_values
>>> decoder_input_ids = torch.tensor([[1, 1]]) * model.config.decoder_start_token_id
>>> last_hidden_state = model(input_values, decoder_input_ids=decoder_input_ids).last_hidden_state
>>> list(last_hidden_state.shape)
[1, 2, 288]
_mask_input_features

( input_features: FloatTensor attention_mask: typing.Optional[torch.LongTensor] = None )

Masks the extracted features along the time axis and/or the feature axis, following SpecAugment.
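As a rough, self-contained sketch of SpecAugment-style time masking (the helper name, span length, and mask count are made-up illustrative values, not this method's internals):

import torch

def time_mask(features, mask_span=10, num_masks=2):
    # features: (batch, seq_len, hidden). Zero out `num_masks` random
    # contiguous spans of length `mask_span` along the time axis.
    batch, seq_len, _ = features.shape
    for b in range(batch):
        for _ in range(num_masks):
            start = int(torch.randint(0, max(1, seq_len - mask_span), (1,)))
            features[b, start : start + mask_span] = 0.0
    return features

masked = time_mask(torch.randn(2, 100, 288))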

MoonshineForConditionalGeneration

class transformers.MoonshineForConditionalGeneration

( config: MoonshineConfig )


The Moonshine Model with a language modeling head. Can be used for automatic speech recognition. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( input_values: typing.Optional[torch.FloatTensor] = None, attention_mask: typing.Optional[torch.LongTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, decoder_attention_mask: typing.Optional[torch.LongTensor] = None, encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None, past_key_values: typing.Union[transformers.cache_utils.EncoderDecoderCache, typing.Tuple[torch.FloatTensor], NoneType] = None, decoder_inputs_embeds: typing.Optional[typing.Tuple[torch.FloatTensor]] = None, decoder_position_ids: typing.Optional[typing.Tuple[torch.LongTensor]] = None, use_cache: typing.Optional[bool] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, cache_position: typing.Optional[torch.LongTensor] = None, labels: typing.Optional[torch.LongTensor] = None ) → transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)

Returns

A transformers.modeling_outputs.Seq2SeqLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MoonshineConfig) and inputs.

The MoonshineForConditionalGeneration forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

>>> import torch
>>> from transformers import AutoProcessor, MoonshineForConditionalGeneration
>>> from datasets import load_dataset

>>> processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-tiny")
>>> model = MoonshineForConditionalGeneration.from_pretrained("UsefulSensors/moonshine-tiny")

>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

>>> inputs = processor(ds[0]["audio"]["array"], return_tensors="pt")
>>> input_values = inputs.input_values

>>> generated_ids = model.generate(input_values, max_new_tokens=100)

>>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
>>> transcription
'Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.'
generate

( inputs: typing.Optional[torch.Tensor] = None, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, logits_processor: typing.Optional[transformers.generation.logits_process.LogitsProcessorList] = None, stopping_criteria: typing.Optional[transformers.generation.stopping_criteria.StoppingCriteriaList] = None, prefix_allowed_tokens_fn: typing.Optional[typing.Callable[[int, torch.Tensor], typing.List[int]]] = None, synced_gpus: typing.Optional[bool] = None, assistant_model: typing.Optional[ForwardRef('PreTrainedModel')] = None, streamer: typing.Optional[ForwardRef('BaseStreamer')] = None, negative_prompt_ids: typing.Optional[torch.Tensor] = None, negative_prompt_attention_mask: typing.Optional[torch.Tensor] = None, use_model_defaults: typing.Optional[bool] = None, **kwargs ) → ModelOutput or torch.LongTensor


Returns

ModelOutput or torch.LongTensor

A ModelOutput (if return_dict_in_generate=True or when config.return_dict_in_generate=True) or a torch.LongTensor.

If the model is not an encoder-decoder model (model.config.is_encoder_decoder=False), the possible ModelOutput types are GenerateDecoderOnlyOutput and GenerateBeamDecoderOnlyOutput.

If the model is an encoder-decoder model (model.config.is_encoder_decoder=True), the possible ModelOutput types are GenerateEncoderDecoderOutput and GenerateBeamEncoderDecoderOutput.

Generates sequences of token ids for models with a language modeling head.

Most generation-controlling parameters are set in generation_config which, if not passed, will be set to the model’s default generation configuration. You can override any generation_config by passing the corresponding parameters to generate(), e.g. .generate(inputs, num_beams=4, do_sample=True).

For an overview of generation strategies and code examples, check out the generation strategies guide.
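For instance, a self-contained sketch of overriding the defaults for Moonshine (the beam count and token budget are arbitrary illustrative choices, not recommended settings):

>>> from transformers import AutoProcessor, MoonshineForConditionalGeneration
>>> from datasets import load_dataset

>>> processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-tiny")
>>> model = MoonshineForConditionalGeneration.from_pretrained("UsefulSensors/moonshine-tiny")
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> inputs = processor(ds[0]["audio"]["array"], return_tensors="pt")

>>> # Beam search instead of the default greedy decoding, capped at 100 new tokens.
>>> generated_ids = model.generate(inputs.input_values, num_beams=4, max_new_tokens=100)
>>> transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]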
