Open Source Today (2024-08-12): Qwen2-Audio, 7B Parameters, Voice Chat & Audio Analysis
Explore top AI open-source projects like Qwen2-Audio, ml_mdm, and more, featuring audio analysis, text-to-image models, and intelligent agents. Discover the latest today!
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: Qwen2-Audio
Qwen2-Audio, developed by Alibaba Cloud, is a large-scale audio language model. It can process various audio inputs and perform audio analysis or generate text responses based on voice commands.
This project offers two modes: Voice Chat and Audio Analysis. The Qwen2-Audio series includes two models: Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct.
https://huggingface.co/Qwen/Qwen2-Audio-7B
https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct
https://github.com/QwenLM/Qwen2-Audio
https://arxiv.org/abs/2407.10759
Project: ml_mdm
ml_mdm is an open-source Python package by Apple for efficiently training high-quality text-to-image diffusion models. It uses Matryoshka Diffusion Models to generate high-resolution images and videos.
This framework can train a single pixel-space model with resolutions up to 1024x1024 pixels. It demonstrates strong zero-shot generalization on the CC12M dataset.
https://github.com/apple/ml-mdm
https://arxiv.org/abs/2310.15111
Project: fmeval
fmeval is a library for evaluating Large Language Models (LLMs), helping developers choose the best LLM for their use cases. It can assess LLMs on tasks like open-ended generation, text summarization, Q&A, and classification.
fmeval offers various algorithms to evaluate model accuracy, toxicity, semantic robustness, and prompt stereotypes. It supports Amazon SageMaker Endpoints and JumpStart models.
Project: Agent Service Toolkit
The AI Agent Service Toolkit is a comprehensive toolkit for building Agent services, based on the LangGraph framework.
It includes a LangGraph agent, a FastAPI service for the agent, a client to interact with the service, and a Streamlit app to provide a chat interface via the client.
This project offers a template for easily creating and running your own agent, showcasing the full setup from agent definition to user interface.
https://github.com/JoshuaC215/agent-service-toolkit
Project: AgentK
AgentK is an agent system with a self-evolution module.
It consists of multiple agents that work together and create new agents as needed to complete user-specified tasks.
At its core, AgentK has a minimal agent and a toolset capable of self-guidance and gradually building its intelligent system.
The project is based on LangGraph and LangChain frameworks and runs in Docker containers for easy isolation and management.
https://github.com/mikekelly/AgentK
Project: Omi
Omi is an open-source AI wearable device project aimed at transforming how users capture and manage conversations.
The device seamlessly connects to mobile devices, providing automatic, high-quality transcriptions for meetings, chats, and voice memos.