Today's Open Source (2024-08-16): Nous Research Launches New Hermes 3 Model, Fully Fine-Tuned on Llama 3.1

Explore cutting-edge AI open-source models like Hermes 3, LLaVA-NeXT, Easy-RAG, Meta Expert, MixTeX, and Speech to Speech in this latest roundup.

Aug 16, 2024

Here are some interesting AI open-source models and frameworks I wanted to share today:

Project: Hermes 3

Hermes 3 is the latest flagship model in the Hermes series from Nous Research. It is billed as the first full fine-tune released on top of Llama 3.1, and it comes in 405B, 70B, and 8B sizes.

Hermes 3 is a general-purpose language model with many improvements over Hermes 2, including stronger agent capabilities, role-playing, reasoning, multi-turn conversation, and long-context consistency.

https://huggingface.co/NousResearch

Project: LLaVA-NeXT

LLaVA-NeXT is an open-source large multimodal model from teams including ByteDance and Nanyang Technological University. It’s designed for single-image, multi-image, and video tasks.

The project shows strong performance across various benchmarks and provides training code and datasets.

LLaVA-NeXT comes in several variants, including LLaVA-OneVision and LLaVA-NeXT-Video, which target image and video understanding.

https://github.com/LLaVA-VL/LLaVA-NeXT

Project: Easy-RAG

Easy-RAG is a RAG system that's easy to learn, use, and extend. It supports creating and updating knowledge bases from files in various formats, transcribing audio and video to text, multi-turn dialogue, and knowledge base Q&A.

The project also improves retrieval results through reranking, supports various vector databases, and has further extensions planned.
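To make the reranking idea concrete, here is a minimal retrieve-then-rerank sketch in plain Python. It is not Easy-RAG's code: the function names are hypothetical, the first stage uses simple token overlap in place of a vector database, and the second stage uses Jaccard similarity as a stand-in for a real reranker model.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer (illustrative only)."""
    return set(text.lower().split())


def retrieve(query: str, docs: List[str], k: int = 3) -> List[str]:
    """First stage: cheap recall, ranking docs by number of shared tokens.

    A real system would query a vector database here (assumption).
    """
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: List[str], top_n: int = 1) -> List[str]:
    """Second stage: rescore the small candidate set more carefully.

    Jaccard similarity stands in for a reranker model (assumption).
    """
    q = tokenize(query)

    def score(doc: str) -> float:
        t = tokenize(doc)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_n]


def search(query: str, docs: List[str], k: int = 3, top_n: int = 1) -> List[str]:
    """Retrieve a broad candidate set, then keep only the best reranked hits."""
    return rerank(query, retrieve(query, docs, k=k), top_n=top_n)
```

The design point is that the first stage only needs to be fast and broad, while the more careful scoring runs on a handful of candidates.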

https://github.com/yuntianhe2014/Easy-RAG

Project: Meta Expert

Meta Expert is a multifunctional AI agent project for long-term, research-intensive tasks.

It has two agents: the basic Meta Agent and the more complex Jar3d. Meta Agent demonstrates meta-prompting, while Jar3d combines retrieval-augmented generation (RAG) with chain-of-thought techniques to handle complex research tasks.

https://github.com/brainqub3/meta_expert

Project: MixTeX

MixTeX is an innovative multimodal LaTeX recognition tool.

It runs efficiently on a local CPU, with no GPU needed, and works on any Windows computer. MixTeX recognizes LaTeX formulas, tables, and mixed text in both Chinese and English.

https://github.com/RQLuo/MixTeX-Latex-OCR

Project: Speech To Speech

Speech To Speech is an open-source modular speech-to-speech conversion project. It uses a cascading pipeline for voice activity detection (VAD), speech-to-text (STT), language modeling (LM), and text-to-speech (TTS).

The project uses models from the Hugging Face Hub and aims to provide a fully open, modular alternative to GPT-4o-style voice interaction.
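The cascade described above can be sketched in a few lines of Python. This is only an illustration of the VAD, STT, LM, TTS ordering, not the project's implementation: every function here is a hypothetical placeholder, and a real pipeline would wrap actual models at each stage.

```python
from typing import List, Optional

Audio = List[float]  # toy representation: a list of samples


def vad(chunk: Audio, threshold: float = 0.1) -> Optional[Audio]:
    """Toy voice activity detection: keep the chunk only if its
    mean energy exceeds the threshold (assumed heuristic)."""
    if not chunk:
        return None
    energy = sum(s * s for s in chunk) / len(chunk)
    return chunk if energy > threshold else None


def stt(chunk: Audio) -> str:
    """Placeholder speech-to-text stage."""
    return f"<transcript of {len(chunk)} samples>"


def lm(text: str) -> str:
    """Placeholder language-model stage."""
    return f"reply to: {text}"


def tts(text: str) -> Audio:
    """Placeholder text-to-speech stage: one silent sample per character."""
    return [0.0] * len(text)


def speech_to_speech(chunk: Audio) -> Optional[Audio]:
    """Run the cascade; return None when no speech is detected."""
    speech = vad(chunk)
    if speech is None:
        return None
    return tts(lm(stt(speech)))
```

Because each stage only depends on the previous stage's output type, any one of them can be swapped for a different model without touching the rest, which is the appeal of a modular cascade.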

https://github.com/eustlb/speech-to-speech
