Today's Open Source (2024-10-16): FunASR Speech Recognition Toolkit
Discover top open-source AI projects like FunASR for speech recognition, LoLCATs for LLMs, VideoGen-Eval for video generation, and more innovative frameworks.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: FunASR
FunASR is a foundational speech recognition toolkit offering multiple features, including Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), punctuation restoration, language modeling, speaker verification, speaker separation, and multi-speaker ASR.
By supporting industrial-grade speech recognition model training and fine-tuning, FunASR helps researchers and developers more easily conduct research and production on speech recognition models, fostering the development of the speech recognition ecosystem.
https://github.com/modelscope/FunASR/
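To make one of the listed tasks concrete, here is a minimal energy-based voice activity detection (VAD) sketch. This is not FunASR's implementation or API, just a toy illustration of what the VAD component does: marking which audio frames contain speech.

```python
# Toy energy-based VAD -- NOT FunASR's implementation, just an
# illustration of the task: flag which frames are speech-like.

def energy_vad(samples, frame_size=160, threshold=0.01):
    """Return one boolean per frame: True = speech-like frame."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size  # mean power
        flags.append(energy > threshold)
    return flags

# Toy signal: silence, then a loud segment, then silence again.
signal = [0.0] * 160 + [0.5, -0.5] * 80 + [0.0] * 160
print(energy_vad(signal))  # -> [False, True, False]
```

Real VAD models (including FunASR's) use learned acoustic features rather than a fixed energy threshold, but the input/output contract is the same.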
Project: LoLCATs
LoLCATs is a new approach for converting existing Transformer models (such as Llama and Mistral) into high-performing linearized, low-rank large language models (LLMs).
The method achieves efficiency through attention transfer and low-rank linearization: softmax attention is replaced with linear attention, and the resulting approximation error is corrected via low-rank adaptation, improving training efficiency while preserving quality.
https://github.com/hazyresearch/lolcats
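The swap described above can be sketched in a toy single-head attention example. The code below contrasts standard softmax attention with a linear-attention form; the ReLU feature map is a common illustrative choice, not necessarily the one LoLCATs trains.

```python
import math

# Toy single-head attention over lists of d-dim vectors, contrasting
# softmax attention with the linear-attention form that LoLCATs-style
# methods train to approximate it. Illustrative only.

def softmax_attn(q, k, v):
    n, d = len(k), len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][t] for j in range(n)) / z
                    for t in range(len(v[0]))])
    return out

def phi(x):  # ReLU feature map keeps attention weights non-negative
    return [max(t, 0.0) for t in x]

def linear_attn(q, k, v):
    # Accumulate S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j) once,
    # then each query is answered in O(d) -- no n x n score matrix.
    d, dv = len(k[0]), len(v[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for kj, vj in zip(k, v):
        fk = phi(kj)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * vj[b]
    out = []
    for qi in q:
        fq = phi(qi)
        denom = sum(fq[a] * z[a] for a in range(d)) or 1.0
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out
```

This version is non-causal for simplicity; a causal decoder updates `S` and `z` incrementally per token, which is where the efficiency gain over quadratic softmax attention comes from.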
Project: VideoGen-Eval
VideoGen-Eval aims to observe and compare the quality of the latest video generation models, particularly Sora-class models.
The project studies high-quality video generation techniques from Text-to-Video (T2V), Image-to-Video (I2V), and Video-to-Video (V2V) generation models.
These models have made significant progress in producing high-resolution videos, natural movements, strong visual-language alignment, and enhanced controllability.
By showcasing and comparing more than 8,000 videos generated by ten closed-source and several open-source models, the project provides an in-depth analysis of the latest advances in video generation.
https://github.com/AILab-CVC/VideoGen-Eval
Project: Agent-as-a-Judge
The Agent-as-a-Judge project aims to address the shortcomings of traditional evaluation methods for advanced agent systems by introducing an automated evaluation framework.
The project offers a way to conduct evaluations during or after task execution, significantly saving time and costs while providing continuous feedback signals to facilitate further training and improvement of the agent systems.
https://github.com/metauto-ai/agent-as-a-judge
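The evaluate-during-execution idea can be sketched as a judge that scores each intermediate artifact against a task requirement and emits stepwise feedback. The names and the keyword-matching judge below are illustrative stand-ins, not the project's actual API (which uses an agentic judge rather than string matching).

```python
# Toy sketch of the "agent as a judge" idea: check an agent's
# intermediate outputs against task requirements and return stepwise
# feedback. Structure is illustrative, not the project's API.

def judge_step(requirement, artifact):
    """Return (passed, feedback) for one requirement; a simple
    keyword check stands in for an LLM/agent-based judge."""
    ok = requirement.lower() in artifact.lower()
    feedback = "satisfied" if ok else f"missing: {requirement}"
    return ok, feedback

def evaluate_trajectory(requirements, artifacts):
    """Judge each artifact produced during execution, not just the
    final result, yielding continuous feedback signals."""
    report = []
    for req, art in zip(requirements, artifacts):
        ok, fb = judge_step(req, art)
        report.append({"requirement": req, "passed": ok, "feedback": fb})
    score = sum(r["passed"] for r in report) / len(report)
    return score, report

reqs = ["load dataset", "train model", "save metrics"]
arts = ["load dataset from csv: done",
        "train model for 3 epochs: done",
        "run finished, nothing written"]
score, report = evaluate_trajectory(reqs, arts)
print(score)  # fraction of requirements satisfied
```

The per-step `feedback` entries are what make this useful as a training signal: a failed run still tells the agent which requirement broke.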
Project: AgentStack
AgentStack is a tool designed for the quick development of robust AI agents. It supports macOS, Windows, and Linux systems, streamlining the process of building agent projects from scratch.
AgentStack provides a simple project template, sparing developers the setup of complex tools and frameworks so they can focus on writing code.
https://github.com/AgentOps-AI/AgentStack
Project: CleanS2S
CleanS2S is a high-quality, streaming Speech-to-Speech (S2S) interactive agent implemented in a single file.
The project aims to provide a GPT-4o-style Chinese interactive prototype, letting users experience the power of a language user interface first-hand and allowing researchers to quickly explore and validate the potential of S2S pipelines.
https://github.com/opendilab/CleanS2S
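A typical S2S pipeline of this kind chains speech recognition, a text model, and speech synthesis per audio chunk. The sketch below shows that structure with stub stages; it is a conceptual outline under assumed stage names, not CleanS2S's actual code.

```python
# Conceptual sketch of a streaming speech-to-speech (S2S) pipeline:
# VAD/ASR -> text model -> TTS, applied chunk by chunk. The stage
# functions are stubs standing in for real models.

def asr(audio_chunk):
    """Speech -> text (stub: reads a precomputed transcript)."""
    return audio_chunk.get("transcript", "")

def chat(text):
    """Text -> reply text (stub for the language model)."""
    return f"echo: {text}"

def tts(text):
    """Reply text -> synthesized audio (stub)."""
    return {"waveform": f"<audio for '{text}'>"}

def s2s_pipeline(audio_stream):
    """Process incoming chunks one at a time so replies can stream
    back before the whole utterance is finished."""
    for chunk in audio_stream:
        text = asr(chunk)
        if not text:  # skip silent chunks (VAD-like gate)
            continue
        yield tts(chat(text))

stream = [{"transcript": "hello"}, {}, {"transcript": "how are you"}]
replies = list(s2s_pipeline(stream))  # two audio replies, silence skipped
```

Because `s2s_pipeline` is a generator, each reply is emitted as soon as its chunk is processed, which is the property a streaming interactive agent needs for low perceived latency.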
Project: MMGen
MG2 is a melody-guided music generation model that performs remarkably well despite its simple, resource-efficient design.
Users can generate personalized background music for short videos on platforms like TikTok, YouTube Shorts, and Meta Reels using this model.
Moreover, users can fine-tune the model with their private music datasets at a low cost.