Today's Open Source (2024-10-14): OpenR Full-Chain Framework Boosts Complex Inference and Intelligent Decision-Making!

Explore cutting-edge AI open-source projects like OpenR for advanced reasoning, LightRAG for text generation, and localGPT-Vision for vision-based RAG systems.

Oct 14, 2024

Here are some interesting AI open-source models and frameworks I wanted to share today:

Project: OpenR

OpenR is an open-source framework designed to leverage large language models for advanced reasoning tasks.

The project offers a range of training and inference strategies, including training of generative and discriminative process reward models (PRMs), online policy training, and multiple search strategies.

OpenR also supports test-time computation and scaling, which makes it suitable for tasks that require complex, multi-step reasoning.
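To make the idea of spending more compute at inference time concrete, here is a minimal best-of-N sketch, one common search strategy of this kind. It is not OpenR's code: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, and the "policy" and "reward model" are toy stand-ins that guess and grade a simple sum.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    question: str,
    generate: Callable[[str, random.Random], str],
    score: Callable[[str, str], float],
    n: int = 8,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidate answers and return the highest-scoring one."""
    rng = random.Random(seed)
    candidates: List[str] = [generate(question, rng) for _ in range(n)]
    scored = [(score(question, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-in for a policy model: guesses "a+b" with some noise.
def toy_generate(question: str, rng: random.Random) -> str:
    a, b = (int(x) for x in question.split("+"))
    return str(a + b + rng.randint(-3, 3))


# Hypothetical stand-in for a reward model: prefers guesses near the true sum.
def toy_score(question: str, answer: str) -> float:
    a, b = (int(x) for x in question.split("+"))
    return -float(abs((a + b) - int(answer)))


if __name__ == "__main__":
    for n in (1, 4, 16):
        answer, reward = best_of_n("17+25", toy_generate, toy_score, n=n)
        print(f"n={n:2d} answer={answer} reward={reward}")
```

With a fixed seed, raising `n` can only keep or improve the best reward, because the larger sample contains the smaller one. That is the basic trade of extra inference compute for answer quality.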

https://github.com/openreasoner/openr

Project: LightRAG

LightRAG is a simple and fast retrieval-augmented generation system aimed at improving text generation quality and efficiency by combining retrieval and generation techniques.

This project provides a lightweight solution for natural language processing tasks requiring efficient information retrieval and generation. Users can implement multiple retrieval modes, including local, global, and hybrid retrieval, through simple API calls.
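As a rough illustration of how the three retrieval modes differ, the toy sketch below treats local retrieval as entity-focused, global retrieval as relationship-focused, and hybrid as the combination of both. It does not use LightRAG's API. The `retrieve` function, the tiny in-memory "graph", and the keyword matching are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: entity descriptions and relation descriptions.
ENTITIES: Dict[str, str] = {
    "LightRAG": "a retrieval-augmented generation system",
    "OpenR": "a reasoning framework for large language models",
}
RELATIONS: List[Tuple[str, str, str]] = [
    ("LightRAG", "OpenR",
     "both are open-source projects built around large language models"),
]


def retrieve(query: str, mode: str = "hybrid") -> List[str]:
    """Return context snippets for a query under the chosen retrieval mode."""
    words = set(query.lower().split())
    # Local: match the query against specific entity names.
    local = [f"{name}: {desc}" for name, desc in ENTITIES.items()
             if name.lower() in words]
    # Global: match the query against broader relationship descriptions.
    global_ = [f"{a} <-> {b}: {desc}" for a, b, desc in RELATIONS
               if words & set(desc.lower().split())]
    if mode == "local":
        return local
    if mode == "global":
        return global_
    if mode == "hybrid":
        return local + global_
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("local", "global", "hybrid"):
        print(mode, retrieve("lightrag and other open-source projects", mode))
```

In the real system the matching is done with an LLM-built graph and embeddings rather than keyword overlap. The sketch only shows how one query can be routed to different levels of the index.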

https://github.com/HKUDS/LightRAG

Project: localGPT-Vision

localGPT-Vision is an end-to-end vision-based retrieval-augmented generation (RAG) system.

Users can upload and index documents (PDFs and images) and ask questions about the content, with the system providing answers along with relevant document fragments.

Retrieval uses the ColQwen or ColPali models, and the retrieved pages are passed to a vision language model (VLM), which generates the response.
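ColPali-style retrievers rank pages by late interaction: each query token embedding is compared with every image-patch embedding of a page, the best match per query token is kept, and those maxima are summed (often called MaxSim). The sketch below shows that scoring rule in plain Python. The embeddings are made-up two-dimensional vectors, and `maxsim_score` and `rank_pages` are hypothetical helper names, not part of Byaldi or localGPT-Vision.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """Sum, over query tokens, of the best similarity to any page patch."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: List[Vector],
               pages: List[List[Vector]],
               top_k: int = 1) -> List[int]:
    """Return indices of the top_k pages, highest MaxSim score first."""
    scored = [(maxsim_score(query_tokens, patches), idx)
              for idx, patches in enumerate(pages)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:top_k]]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]      # two query-token embeddings
    page0 = [[1.0, 0.0], [0.5, 0.5]]      # patch embeddings of page 0
    page1 = [[0.0, 1.0], [1.0, 0.0]]      # patch embeddings of page 1
    print(rank_pages(query, [page0, page1], top_k=2))
```

The top-ranked page images, rather than extracted text, are what then get handed to the VLM together with the user's question.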

The project is built on the Byaldi library and supports multiple VLMs, including Qwen2-VL-7B-Instruct, LLAMA-3.2-11B-Vision, Pixtral-12B-2409, Molmo-7B-O-0924, Google Gemini, and OpenAI GPT-4o.

https://github.com/PromtEngineer/localGPT-Vision

Project: Prompt Engineering Techniques

The Prompt Engineering project is a comprehensive collection of tutorials and implementations, covering prompt engineering techniques from basic concepts to advanced strategies.

The project aims to help users communicate with large language models effectively and get more out of them, a core skill in building AI applications.

Whether you are an AI beginner or an experienced practitioner, the project offers opportunities to learn, experiment, and innovate.

https://github.com/NirDiamant/Prompt_Engineering

Project: F5-TTS

F5-TTS is a flow-matching-based speech synthesis project designed to generate fluent and faithful speech.

The model combines a Diffusion Transformer with ConvNeXt V2, which speeds up both training and inference. At inference time it uses Sway Sampling, a flow-step sampling strategy that significantly improves performance.
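The inference-time flow-step sampling strategy is called Sway Sampling in the F5-TTS paper. As I recall it, uniformly spaced steps u in [0, 1] are remapped to t = u + s * (cos(pi/2 * u) - 1 + u), and a negative coefficient s concentrates steps early in the flow. The sketch below computes that schedule. Treat the formula and the default `s = -1.0` as assumptions to check against the repository; `sway_timesteps` is a hypothetical name.

```python
import math
from typing import List


def sway_timesteps(num_steps: int, s: float = -1.0) -> List[float]:
    """Map uniform steps u in [0, 1] to t = u + s * (cos(pi/2 * u) - 1 + u)."""
    us = [i / num_steps for i in range(num_steps + 1)]
    return [u + s * (math.cos(math.pi / 2 * u) - 1 + u) for u in us]


if __name__ == "__main__":
    uniform = sway_timesteps(8, s=0.0)   # s = 0 leaves the steps uniform
    swayed = sway_timesteps(8, s=-1.0)   # s < 0 packs more steps near t = 0
    for u, t in zip(uniform, swayed):
        print(f"u={u:.3f} -> t={t:.3f}")
```

With s = -1 the mapping reduces to t = 1 - cos(pi/2 * u), so the endpoints 0 and 1 are preserved while the intermediate steps shift toward the start of the trajectory.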

The project provides complete code and data processing scripts, supporting training and inference on various datasets.

https://github.com/SWivid/F5-TTS

Project: Agent S

Agent S is a new agent framework designed to operate a computer the way a human would, through its graphical interface.

It introduces an experience-augmented hierarchical planning method that draws on online web knowledge for up-to-date information and on narrative memory for high-level experience from past interactions.

By breaking complex tasks into manageable subtasks and using episodic memory for step-by-step guidance, Agent S continuously optimizes its actions and learns from experience, achieving adaptive and efficient task planning.

https://github.com/simular-ai/agent-s

