Today's Open Source (2024-09-19): Alibaba Cloud Launches Qwen2.5 with Multilingual and Long-Text Support
Discover Alibaba Cloud's Qwen2.5, a multilingual large language model series trained on 18T tokens, and explore Moshi, a low-latency speech-text foundation model and full-duplex dialogue framework.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: Alibaba Cloud Qwen2.5
Qwen2.5 is a large language model series developed by Alibaba Cloud's Qwen team.
The training data has grown from the 7T tokens used for Qwen2 to 18T tokens for Qwen2.5. The series supports 29 languages and ships in sizes from 0.5B to 72B parameters. It handles contexts of up to 128K tokens, with lengths beyond 32K reached through YaRN extrapolation, and generates up to 8K tokens of output.
Qwen2.5 shows significant improvements in instruction following, structured output generation, and multilingual support. The release also includes specialized open-source models such as Qwen2.5-Coder and Qwen2.5-Math.
https://github.com/QwenLM/Qwen2.5
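As a quick illustration of how the instruct checkpoints are typically used, here is a minimal generation sketch with Hugging Face Transformers; it assumes the Qwen/Qwen2.5-7B-Instruct checkpoint and a recent transformers release, and is not taken from the Qwen repository itself:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hugging Face checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key features of Qwen2.5 in three bullet points."},
]
# Build the chat prompt, generate, and decode only the newly produced tokens.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```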
Project: Moshi
Moshi is a speech-text foundation model and a full-duplex spoken dialogue framework.
It uses the advanced Mimi streaming neural audio codec, processing 24 kHz audio at 1.1 kbps with just 80ms latency.
Moshi models two parallel audio streams, one for its own speech and one for the user's. It also predicts the text tokens corresponding to its own speech, which improves generation quality.
The project achieves real-world latency as low as 200ms on L4 GPUs.
https://github.com/kyutai-labs/moshi
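A quick back-of-the-envelope check shows how the 80ms latency and 1.1 kbps figures fit together; the codec parameters below (8 codebooks of 2048 entries at a 12.5 Hz frame rate) are an assumption taken from the Moshi paper rather than from this post:

```python
import math

# Assumed Mimi parameters (from the Moshi paper, not from this post):
# 8 codebooks of 2048 entries each, emitted at a 12.5 Hz frame rate over 24 kHz audio.
sample_rate_hz = 24_000
frame_rate_hz = 12.5
num_codebooks = 8
codebook_size = 2048

frame_duration_ms = 1000 / frame_rate_hz                    # 80 ms per frame, the codec latency
samples_per_frame = sample_rate_hz / frame_rate_hz          # 1920 audio samples per frame
bits_per_frame = num_codebooks * math.log2(codebook_size)   # 8 * 11 = 88 bits
bitrate_kbps = bits_per_frame * frame_rate_hz / 1000        # 88 * 12.5 / 1000 = 1.1 kbps

print(f"{frame_duration_ms:.0f} ms/frame, {samples_per_frame:.0f} samples/frame, {bitrate_kbps:.1f} kbps")
```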
Project: Windows Agent Arena
Windows Agent Arena (WAA) is a scalable platform for testing and benchmarking multimodal AI agents on the Windows OS.
It provides researchers and developers with a reproducible and realistic environment to test agent workflows across various tasks.
WAA supports large-scale agent deployment on Azure ML infrastructure, allowing parallel runs of multiple agents and fast benchmarking of hundreds of tasks within minutes.
https://github.com/microsoft/windowsagentarena
Project: ReflectionAnyLLM
ReflectionAnyLLM is a lightweight proof-of-concept project. It demonstrates basic chain-of-thought reasoning with any large language model (LLM) that supports OpenAI-compatible APIs.
It works with both local and remote LLMs, allowing users to switch between different providers with minimal setup.
https://github.com/antibitcoin/ReflectionAnyLLM
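The provider-switching trick it relies on is the OpenAI-compatible API: pointing the standard client at a different base_url is enough to swap between a local server and a hosted provider. The sketch below illustrates that idea only; the endpoint, key, and model name are placeholders, and this is not ReflectionAnyLLM's own code:

```python
from openai import OpenAI

# Placeholder endpoint and model: any OpenAI-compatible server works here,
# e.g. a local Ollama or vLLM instance, or a hosted provider's URL and key.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="llama3.1",  # placeholder model name
    messages=[
        {"role": "system", "content": "Reason step by step before giving a final answer."},
        {"role": "user", "content": "A train covers 120 km in 1.5 hours. What is its average speed?"},
    ],
)
print(response.choices[0].message.content)
```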
Project: LLM-Engines
LLM-Engines is a unified inference engine for large language models (LLMs). It supports open-source backends such as vLLM and SGLang, hosted APIs like Together, and commercial providers including OpenAI, Mistral, and Claude.
The project validates inference correctness by comparing outputs from different engines under the same parameters. Users can easily call different models via a simple API.
https://github.com/jdf-prog/LLM-Engines
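The cross-engine check can be pictured as running the same prompt with identical sampling parameters against two OpenAI-compatible endpoints and comparing the results. The sketch below is hypothetical, it does not use LLM-Engines' actual API, and the endpoints and model name are placeholders:

```python
from openai import OpenAI

PROMPT = "List three prime numbers greater than 100."
PARAMS = dict(temperature=0.0, max_tokens=64, seed=42)  # identical sampling settings for both engines
MODEL = "meta-llama/Llama-3.1-8B-Instruct"              # placeholder model name

def generate(base_url: str) -> str:
    """Query one OpenAI-compatible endpoint (e.g. a vLLM or SGLang server) with the shared parameters."""
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": PROMPT}], **PARAMS
    )
    return resp.choices[0].message.content.strip()

out_a = generate("http://localhost:8000/v1")   # placeholder: engine A
out_b = generate("http://localhost:30000/v1")  # placeholder: engine B
print("outputs match:", out_a == out_b)
```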
Project: Subspace-Tuning
Subspace-Tuning is a general framework designed to enable parameter-efficient fine-tuning using subspace methods.
Its goal is to adapt large pre-trained models for specific tasks by making minimal changes to the original parameters.
It does this by seeking an optimal projection of the weights within a low-dimensional subspace. The project offers resources to help researchers and practitioners integrate these methods into their own work.
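One common subspace method is a LoRA-style low-rank reparameterization, where a frozen weight matrix is augmented with a trainable update B·A confined to a rank-r subspace. The sketch below illustrates that general idea in PyTorch and is not Subspace-Tuning's own code:

```python
import torch
import torch.nn as nn

class LowRankAdapterLinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable rank-r update (a LoRA-style subspace method).

    Only the low-rank factors A and B are trained; the original weight stays untouched,
    so the adapted parameters live in an r-dimensional subspace of the full weight space.
    Illustration only, not Subspace-Tuning's own code.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: training starts from the unchanged layer
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LowRankAdapterLinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in layer.base.parameters())
print(out.shape, f"{trainable} trainable parameters vs {frozen} frozen")
```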