Open-Source Qwen2-Audio: Smoother Voice Chat!
Discover Qwen2-Audio: an open-source model for smoother voice chat and multimodal AI integration.
A general-purpose AI system needs a core model that understands information from different modalities. Large language models can already comprehend and reason over text, and they are steadily expanding to other modalities such as vision and audio.
The Qwen team has previously released several Qwen language model series along with multimodal models such as Qwen-VL and Qwen-Audio.
Today, the Qwen team officially announces Qwen2-Audio.
Qwen2-Audio is the next generation of Qwen-Audio: it accepts audio and text inputs and generates text outputs. It offers the following features:
Voice chat: Users can give commands to the audio language model using voice, without the need for an Automatic Speech Recognition (ASR) module.
Audio analysis: The model can analyze audio information based on text instructions, including speech, sounds, and music.
Multilingual support: The model supports more than eight languages and dialects, including Mandarin, English, Cantonese, French, Italian, Spanish, German, and Japanese.
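To make the audio-analysis feature concrete, here is a minimal sketch of querying Qwen2-Audio through Hugging Face Transformers. It assumes a Transformers version with Qwen2-Audio support, the `Qwen/Qwen2-Audio-7B-Instruct` checkpoint, and a local audio file named `sample.wav` (the file name and the question are placeholders).

```python
# Sketch: ask Qwen2-Audio a text question about an audio clip.
# Assumes: transformers with Qwen2-Audio support, librosa, and a
# local file "sample.wav" (placeholder). Downloads a large checkpoint.
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id)

# One user turn that mixes an audio clip with a text instruction.
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "sample.wav"},
        {"type": "text", "text": "What is happening in this audio?"},
    ]},
]
prompt = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)

# Load the waveform at the sampling rate the feature extractor expects.
audio, _ = librosa.load(
    "sample.wav", sr=processor.feature_extractor.sampling_rate
)
inputs = processor(text=prompt, audios=[audio], return_tensors="pt", padding=True)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=256)
new_tokens = output_ids[:, inputs.input_ids.size(1):]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

For voice chat, the same conversation format is used with a spoken clip alone; because the model consumes raw audio directly, no separate ASR step is needed before the instruction reaches the model.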