Open Source Today (2024-08-12): Qwen2-Audio, 7B Parameters, Voice Chat & Audio Analysis

Explore top AI open-source projects like Qwen2-Audio, ml_mdm, and more, featuring audio analysis, text-to-image models, and intelligent agents. Discover the latest today!

Meng Li

Aug 12, 2024

Here are some interesting AI open-source models and frameworks I wanted to share today:

Project: Qwen2-Audio

AI Disruption

Open-Source Qwen2-Audio: Smoother VoiceChat!

In a universal AI system, the core model should understand information from different modalities…

4 months ago · 1 like · Meng Li

Qwen2-Audio, developed by Alibaba Cloud, is a large-scale audio language model. It can process various audio inputs and perform audio analysis or generate text responses based on voice commands.

This project offers two modes: Voice Chat and Audio Analysis. The Qwen2-Audio series includes two models: Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct.

https://huggingface.co/Qwen/Qwen2-Audio-7B

https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct

https://github.com/QwenLM/Qwen2-Audio

https://arxiv.org/abs/2407.10759

Project: ml_mdm

ml_mdm is an open-source Python package by Apple for efficiently training high-quality text-to-image diffusion models. It uses Matryoshka Diffusion Models to generate high-resolution images and videos.

This framework can train a single pixel-space model with resolutions up to 1024x1024 pixels. It demonstrates strong zero-shot generalization on the CC12M dataset.

https://github.com/apple/ml-mdm

https://arxiv.org/abs/2310.15111

Project: fmeval

fmeval is a library for evaluating Large Language Models (LLMs), helping developers choose the best LLM for their use cases. It can assess LLMs on tasks like open-ended generation, text summarization, Q&A, and classification.

fmeval offers various algorithms to evaluate model accuracy, toxicity, semantic robustness, and prompt stereotypes. It supports Amazon SageMaker Endpoints and JumpStart models.

https://github.com/aws/fmeval

Project: Agent Service Toolkit

The AI Agent Service Toolkit is a comprehensive toolkit for building Agent services, based on the LangGraph framework.

It includes a LangGraph agent, a FastAPI service for the agent, a client to interact with the service, and a Streamlit app to provide a chat interface via the client.

This project offers a template for easily creating and running your own agent, showcasing the full setup from agent definition to user interface.

https://github.com/JoshuaC215/agent-service-toolkit

Project: AgentK

AgentK is an agent system with a self-evolution module.

It consists of multiple agents that work together and create new agents as needed to complete user-specified tasks.

At its core, AgentK has a minimal agent and a toolset capable of self-guidance and gradually building its intelligent system.

The project is based on LangGraph and LangChain frameworks and runs in Docker containers for easy isolation and management.

https://github.com/mikekelly/AgentK

Project: Omi

Omi is an open-source AI wearable device project aimed at transforming how users capture and manage conversations.

The device seamlessly connects to mobile devices, providing automatic, high-quality transcriptions for meetings, chats, and voice memos.

https://github.com/BasedHardware/Omi

Open Source Today (2024-08-09): Tongyi Qianwen Releases Qwen2-Math for Advanced Math Reasoning