Today's Open Source (2024-10-16): FunASR Speech Recognition Toolkit
Discover top open-source AI projects like FunASR for speech recognition, LoLCATs for LLMs, VideoGen-Eval for video generation, and more innovative frameworks.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: FunASR
FunASR is a foundational speech recognition toolkit offering multiple features, including Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), punctuation restoration, language modeling, speaker verification, speaker separation, and multi-speaker ASR.
By supporting industrial-grade speech recognition model training and fine-tuning, FunASR helps researchers and developers more easily conduct research and production on speech recognition models, fostering the development of the speech recognition ecosystem.
https://github.com/modelscope/FunASR/
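To make one of the listed tasks concrete, here is a minimal energy-based voice activity detection (VAD) sketch. This is not FunASR's implementation or API, just a toy illustration of what the VAD component does: marking which audio frames contain speech.

```python
# Toy energy-based VAD -- NOT FunASR's implementation, just an
# illustration of the task: flag which frames are speech-like.

def energy_vad(samples, frame_size=160, threshold=0.01):
    """Return one boolean per frame: True = speech-like frame."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size  # mean power
        flags.append(energy > threshold)
    return flags

# Toy signal: silence, then a loud segment, then silence again.
signal = [0.0] * 160 + [0.5, -0.5] * 80 + [0.0] * 160
print(energy_vad(signal))  # -> [False, True, False]
```

Real VAD models (including FunASR's) use learned acoustic features rather than a fixed energy threshold, but the input/output contract is the same.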
Project: LoLCATs
LoLCATs is a new approach for converting existing Transformer models (such as Llama and Mistral) into high-performing linearized, low-rank large language models (LLMs).
The method achieves efficiency through attention transfer and low-rank linearization: softmax attention is replaced with linear attention, and the resulting approximation error is corrected via low-rank adaptation, improving training efficiency while preserving quality.
https://github.com/hazyresearch/lolcats
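The swap described above can be sketched in a toy single-head attention example. The code below contrasts standard softmax attention with a linear-attention form; the ReLU feature map is a common illustrative choice, not necessarily the one LoLCATs trains.

```python
import math

# Toy single-head attention over lists of d-dim vectors, contrasting
# softmax attention with the linear-attention form that LoLCATs-style
# methods train to approximate it. Illustrative only.

def softmax_attn(q, k, v):
    n, d = len(k), len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][t] for j in range(n)) / z
                    for t in range(len(v[0]))])
    return out

def phi(x):  # ReLU feature map keeps attention weights non-negative
    return [max(t, 0.0) for t in x]

def linear_attn(q, k, v):
    # Accumulate S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j) once,
    # then each query is answered in O(d) -- no n x n score matrix.
    d, dv = len(k[0]), len(v[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for kj, vj in zip(k, v):
        fk = phi(kj)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * vj[b]
    out = []
    for qi in q:
        fq = phi(qi)
        denom = sum(fq[a] * z[a] for a in range(d)) or 1.0
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out
```

This version is non-causal for simplicity; a causal decoder updates `S` and `z` incrementally per token, which is where the efficiency gain over quadratic softmax attention comes from.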
Project: VideoGen-Eval
VideoGen-Eval aims to observe and compare the quality of the latest video generation models, particularly Sora-class models.
The project studies high-quality video generation techniques from Text-to-Video (T2V), Image-to-Video (I2V), and Video-to-Video (V2V) generation models.
These models have made significant progress in producing high-resolution videos, natural movements, strong visual-language alignment, and enhanced controllability.
By showcasing and comparing more than 8,000 videos generated by ten closed-source and several open-source models, the project provides an in-depth analysis of the latest advances in video generation.
https://github.com/AILab-CVC/VideoGen-Eval
Project: Agent-as-a-Judge
The Agent-as-a-Judge project aims to address the shortcomings of traditional evaluation methods for advanced agent systems by introducing an automated evaluation framework.
The project offers a way to conduct evaluations during or after task execution, significantly saving time and costs while providing continuous feedback signals to facilitate further training and improvement of the agent systems.
https://github.com/metauto-ai/agent-as-a-judge
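The evaluate-during-execution idea can be sketched as a judge that scores each intermediate artifact against a task requirement and emits stepwise feedback. The names and the keyword-matching judge below are illustrative stand-ins, not the project's actual API (which uses an agentic judge rather than string matching).

```python
# Toy sketch of the "agent as a judge" idea: check an agent's
# intermediate outputs against task requirements and return stepwise
# feedback. Structure is illustrative, not the project's API.

def judge_step(requirement, artifact):
    """Return (passed, feedback) for one requirement; a simple
    keyword check stands in for an LLM/agent-based judge."""
    ok = requirement.lower() in artifact.lower()
    feedback = "satisfied" if ok else f"missing: {requirement}"
    return ok, feedback

def evaluate_trajectory(requirements, artifacts):
    """Judge each artifact produced during execution, not just the
    final result, yielding continuous feedback signals."""
    report = []
    for req, art in zip(requirements, artifacts):
        ok, fb = judge_step(req, art)
        report.append({"requirement": req, "passed": ok, "feedback": fb})
    score = sum(r["passed"] for r in report) / len(report)
    return score, report

reqs = ["load dataset", "train model", "save metrics"]
arts = ["load dataset from csv: done",
        "train model for 3 epochs: done",
        "run finished, nothing written"]
score, report = evaluate_trajectory(reqs, arts)
print(score)  # fraction of requirements satisfied
```

The per-step `feedback` entries are what make this useful as a training signal: a failed run still tells the agent which requirement broke.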
Project: AgentStack
AgentStack is a tool designed for the quick development of robust AI agents. It supports macOS, Windows, and Linux systems, streamlining the process of building agent projects from scratch.
AgentStack provides a simple project template, sparing developers the setup of complex tools and frameworks so they can focus on writing code.
https://github.com/AgentOps-AI/AgentStack
Project: CleanS2S
CleanS2S is a high-quality, streaming Speech-to-Speech (S2S) interactive agent implemented in a single file.
The project aims to provide a GPT-4o-style Chinese interactive prototype, letting users experience the power of a language user interface first-hand and allowing researchers to quickly explore and validate the potential of S2S pipelines.
https://github.com/opendilab/CleanS2S
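A typical S2S pipeline of this kind chains speech recognition, a text model, and speech synthesis per audio chunk. The sketch below shows that structure with stub stages; it is a conceptual outline under assumed stage names, not CleanS2S's actual code.

```python
# Conceptual sketch of a streaming speech-to-speech (S2S) pipeline:
# VAD/ASR -> text model -> TTS, applied chunk by chunk. The stage
# functions are stubs standing in for real models.

def asr(audio_chunk):
    """Speech -> text (stub: reads a precomputed transcript)."""
    return audio_chunk.get("transcript", "")

def chat(text):
    """Text -> reply text (stub for the language model)."""
    return f"echo: {text}"

def tts(text):
    """Reply text -> synthesized audio (stub)."""
    return {"waveform": f"<audio for '{text}'>"}

def s2s_pipeline(audio_stream):
    """Process incoming chunks one at a time so replies can stream
    back before the whole utterance is finished."""
    for chunk in audio_stream:
        text = asr(chunk)
        if not text:  # skip silent chunks (VAD-like gate)
            continue
        yield tts(chat(text))

stream = [{"transcript": "hello"}, {}, {"transcript": "how are you"}]
replies = list(s2s_pipeline(stream))  # two audio replies, silence skipped
```

Because `s2s_pipeline` is a generator, each reply is emitted as soon as its chunk is processed, which is the property a streaming interactive agent needs for low perceived latency.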
Project: MMGen
MG2 is a melody-guided music generation model that performs remarkably well despite its simple, resource-efficient design.
Users can generate personalized background music for short videos on platforms like TikTok, YouTube Shorts, and Meta Reels using this model.
Moreover, users can fine-tune the model with their private music datasets at a low cost.