A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://audioldm.github.io/audiosr/ below:

Versatile Audio Super-resolution at Scale

Haohe Liu1, Ke Chen2, Qiao Tian3, Wenwu Wang1, Mark D. Plumbley1

1CVSSP, University of Surrey

2University of California San Diego

3Speech, Audio & Music Intelligence (SAMI), Bytedance


Abstract

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, speech) and specific bandwidth settings they can handle (e.g., 4 kHz to 8 kHz). We introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal within the bandwidth range of 2 kHz to 16 kHz to a high-resolution audio signal at 24 kHz bandwidth with a sampling rate of 48 kHz. Extensive objective evaluation on various audio super-resolution benchmarks demonstrates the strong result achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can acts as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen.

Figure 1: The AudioSR architecture. The replacement-based post-processing aims to preserve the original lower-frequency information in the model output.

Table 1: Subjective evaluation shows that applyin AudioSR for audio super-resolution on the output of audio generation models can significantly enhance the perceptual quality.

Audio Super-Resolution across Different Models

Speech:

8 kHz Sample NVSR-DNN NVSR-ResUNet AudioSR Ground-Truth

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Sound Effect:

8 kHz Sample NVSR-DNN NVSR-ResUNet AudioSR Ground-Truth

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Music:

8 kHz Sample AudioSR Ground-Truth

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

AudioSR on Improving Generative Models

FastSpeech 2 (Text-to-Speech, 22.05 kHz to 48 kHz):

The "Generation" column is the model output (e.g., from MusicGen) after the pre-processing step introduced in the paper.

Generation AudioSR Improvement

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

AudioLDM (Text-to-Audio, 16 kHz to 48 kHz):

Generation AudioSR Improvement

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

MusicGen (Text-to-Music, 32 kHz to 48 kHz):

Generation AudioSR Improvement

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

Your browser does not support the audio element.

More demos are on their way. Stay tuned for more updates.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4