Today's Open Source (2024-08-16): Nous Research Launches New Hermes 3 Model, Fully Fine-Tuned on Llama 3.1
Explore cutting-edge open-source AI projects such as Hermes 3, LLaVA-NeXT, Easy-RAG, Meta Expert, MixTeX, and Speech to Speech in this latest roundup.
Here are some interesting open-source AI models and frameworks I wanted to share today:
Project: Hermes 3
Hermes 3 is the latest flagship model in the Hermes series from Nous Research. It is a full fine-tune of Llama 3.1, described as the first since that release, and comes in 405B, 70B, and 8B sizes.
Hermes 3 is a general-purpose language model with many improvements over Hermes 2, including stronger agentic capabilities, role-playing, reasoning, multi-turn conversation, and long-context coherence.
https://huggingface.co/NousResearch
Project: LLaVA-NeXT
LLaVA-NeXT is an open-source large multimodal model from teams including ByteDance and Nanyang Technological University. It’s designed for single-image, multi-image, and video tasks.
The project reports strong results across a range of benchmarks and releases its training code and datasets.
The LLaVA-NeXT family includes variants such as LLaVA-OneVision and LLaVA-NeXT-Video, which target image and video understanding.
https://github.com/LLaVA-VL/LLaVA-NeXT
Project: Easy-RAG
Easy-RAG is a retrieval-augmented generation (RAG) system that's easy to learn, use, and extend. It supports creating and updating knowledge bases from files in various formats, transcribes speech from audio and video into text, and offers multi-turn dialogue and knowledge-base Q&A.
The project also uses reranking to improve the quality of retrieved results, supports several vector databases, and plans further extensions.
https://github.com/yuntianhe2014/Easy-RAG
Project: Meta Expert
Meta Expert is a multifunctional AI agent project for long-term, research-intensive tasks.
It includes two agents: the basic Meta Agent and the more complex Jar3d. Meta Agent demonstrates meta-prompting, while Jar3d combines retrieval-augmented generation (RAG) with chain-of-thought techniques to handle complex research tasks.
https://github.com/brainqub3/meta_expert
Project: MixTeX
MixTeX is an innovative multimodal LaTeX recognition tool.
It runs efficiently on a local CPU, with no GPU needed, and targets Windows computers. MixTeX recognizes LaTeX formulas, tables, and mixed text in both Chinese and English.
https://github.com/RQLuo/MixTeX-Latex-OCR
Project: Speech To Speech
Speech To Speech is an open-source, modular speech-to-speech project. It is built as a cascaded pipeline of four stages: voice activity detection (VAD), speech-to-text (STT), a language model (LM), and text-to-speech (TTS).
The project uses models from the Hugging Face Hub and aims to provide a fully open, modular alternative to GPT-4o.
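The cascaded design described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the `Pipeline` class and its fields are hypothetical, and each stage is a plain callable so that any VAD, STT, LM, or TTS model could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Pipeline:
    """Hypothetical four-stage cascade: VAD -> STT -> LM -> TTS."""

    vad: Callable[[bytes], bool]  # True if the audio chunk contains speech
    stt: Callable[[bytes], str]   # audio -> transcript
    lm: Callable[[str], str]      # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio

    def _respond(self, audio: bytes) -> bytes:
        # Each stage consumes the previous stage's output.
        return self.tts(self.lm(self.stt(audio)))

    def run(self, chunks: Iterable[bytes]) -> Iterator[bytes]:
        """Buffer speech chunks; answer when silence ends an utterance."""
        buffer = b""
        for chunk in chunks:
            if self.vad(chunk):
                buffer += chunk
            elif buffer:
                yield self._respond(buffer)
                buffer = b""
        if buffer:  # flush a trailing utterance with no closing silence
            yield self._respond(buffer)


# Toy stand-ins (assumptions): "audio" is UTF-8 text, silence is an empty chunk.
demo = Pipeline(
    vad=lambda chunk: len(chunk) > 0,
    stt=lambda audio: audio.decode("utf-8"),
    lm=lambda text: "echo " + text,
    tts=lambda text: text.encode("utf-8"),
)
```

Because the stages only meet at simple types (bytes and strings), replacing one model does not affect the others, which is the main appeal of a modular cascade over a single end-to-end model.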