Today's Open Source (2024-10-28): Open Source Large Model Service Framework KAG
Explore innovative projects like KAG for knowledge-enhanced decision-making, NotebookLlama for PDF to podcast workflows, and more cutting-edge solutions in AI and open-source development.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: KAG
KAG (Knowledge Augmented Generation) is a framework built on the OpenSPG engine for building rigorous, knowledge-enhanced decision-making and information retrieval services.
By letting knowledge graphs and vector retrieval enhance each other, KAG addresses two weaknesses of RAG in knowledge reasoning: retrieved content that is only loosely relevant to the reasoning a question requires, and insensitivity to logic.
KAG performs well on multi-hop question-answering tasks and has been applied to professional-domain knowledge Q&A at Ant Group.
https://github.com/OpenSPG/KAG
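To make the idea of combining vector retrieval with a knowledge graph concrete, here is a minimal toy sketch. It is not KAG's API or implementation: the passages, the graph, the bag-of-words `embed` function, and every function name are hypothetical stand-ins chosen for illustration. Vector search finds a starting entity, and graph edges then pull in the passages needed for the later hops of a multi-hop question.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: one passage per entity.
PASSAGES = {
    "alice": "Alice is a researcher who works at Acme Labs",
    "acme": "Acme Labs is a company headquartered in Berlin",
    "berlin": "Berlin is the capital city of Germany",
}

# Hypothetical toy knowledge graph: entity -> list of (relation, entity) edges.
GRAPH = {
    "alice": [("works_at", "acme")],
    "acme": [("headquartered_in", "berlin")],
    "berlin": [],
}

def embed(text):
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def vector_search(query, k=1):
    """Return the k entities whose passages are most similar to the query."""
    q = embed(query)
    ranked = sorted(PASSAGES, key=lambda e: cosine(q, embed(PASSAGES[e])), reverse=True)
    return ranked[:k]

def expand(seeds, hops):
    """Follow graph edges outward from the seed entities for a fixed number of hops."""
    seen, frontier = list(seeds), list(seeds)
    for _ in range(hops):
        next_frontier = []
        for entity in frontier:
            for _relation, target in GRAPH.get(entity, []):
                if target not in seen:
                    seen.append(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return seen

def hybrid_retrieve(query, k=1, hops=2):
    """Vector search picks the entry points; the graph supplies the multi-hop context."""
    return [PASSAGES[e] for e in expand(vector_search(query, k), hops)]

question = "What country does the researcher Alice work in?"
for passage in hybrid_retrieve(question):
    print(passage)
```

On this toy data, similarity search alone surfaces only the passage about Alice, while the graph expansion also brings in the Acme Labs and Berlin passages that the question actually depends on.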
Project: NotebookLlama
NotebookLlama is an open-source project that guides users through building a PDF-to-podcast workflow via a series of tutorials and notebooks.
The project employs various large language models (LLMs) and text-to-speech (TTS) models to assist users in extracting text from PDF documents, generating podcast scripts, and converting them into audio podcasts.
The project assumes users have no prior knowledge of LLMs, prompts, or audio models and provides detailed guidance in each notebook.
Project: OmniParser
OmniParser is a comprehensive approach for parsing user interface screenshots into structured and easily understandable elements.
This parsing significantly improves GPT-4V's ability to generate actions that are accurately grounded in the corresponding regions of the interface.
The project provides icon detection and function description models and has achieved top performance on the Windows Agent Arena.
Project: Vulnhuntr
Vulnhuntr is a tool that leverages large language models (LLMs) and static code analysis to automatically discover remotely exploitable vulnerabilities.
It builds and analyzes complete code call chains from remote user input to server output, detecting complex multi-step security vulnerabilities that are beyond the reach of traditional static code analysis tools.
https://github.com/protectai/vulnhuntr
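The notion of a call chain from user input to a sensitive operation can be illustrated with a short sketch that uses only Python's standard `ast` module. This is not Vulnhuntr's code and it omits the LLM entirely. The sample source, the choice of `os.system` as the sink, and the helper names `build_call_graph` and `find_chains` are all hypothetical. The sketch also tracks call edges only, not data flow.

```python
import ast

# Hypothetical application code to analyze (never executed, only parsed).
SAMPLE = '''
import os

def handle_request(params):
    name = params["name"]
    return render(name)

def render(name):
    return run_report(name)

def run_report(name):
    return os.system("report " + name)
'''

def build_call_graph(source):
    """Map each top-level function name to the names of the callables it invokes."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = [
                ast.unparse(sub.func)  # e.g. "render" or "os.system"
                for sub in ast.walk(node)
                if isinstance(sub, ast.Call)
            ]
    return graph

def find_chains(graph, entry, sinks):
    """Depth-first search for call chains that lead from entry to any sink."""
    chains = []

    def walk(name, path):
        for callee in graph.get(name, []):
            if callee in sinks:
                chains.append(path + [callee])
            elif callee in graph and callee not in path:  # skip cycles
                walk(callee, path + [callee])

    walk(entry, [entry])
    return chains

graph = build_call_graph(SAMPLE)
for chain in find_chains(graph, "handle_request", {"os.system"}):
    print(" -> ".join(chain))
```

A chain found this way is only a candidate. Deciding whether the user-controlled value actually reaches the sink unsanitized is the step a tool like Vulnhuntr delegates to further analysis.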
Project: MaskGCT
MaskGCT is a fully non-autoregressive text-to-speech (TTS) model that eliminates the need for explicit alignment information between text and speech supervision, as well as phoneme-level duration prediction.
The model operates in two stages: the first stage uses text to predict semantic tokens extracted from a speech self-supervised learning (SSL) model, and the second stage predicts acoustic tokens conditioned on these semantic tokens.
MaskGCT follows a mask-and-predict learning paradigm, and experiments show it outperforms current zero-shot TTS systems in quality, similarity, and intelligibility.
https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct
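The mask-and-predict paradigm is commonly paired with iterative parallel decoding: start from a fully masked sequence, predict every masked position at once, keep the most confident predictions, and repeat. The sketch below shows that loop in plain Python. It is an illustration under assumptions, not MaskGCT's implementation: `toy_predictor` is a hypothetical stand-in for a trained model (it ignores context, which a real model would not), and the cosine schedule is one common choice rather than something confirmed by the source.

```python
import math

MASK = -1  # sentinel for a position that has not been decoded yet

def toy_predictor(tokens):
    """Hypothetical stand-in for a trained model.

    Returns a (token, confidence) pair for every position. A real model
    would condition on the text and on the tokens already unmasked.
    """
    return [((i * 3) % 7, 1.0 / (1 + i)) for i in range(len(tokens))]

def mask_and_predict(length, steps, predictor=toy_predictor):
    """Decode a sequence in a fixed number of parallel refinement steps."""
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        # Cosine schedule: how many positions stay masked after this step.
        keep_masked = int(length * math.cos(math.pi / 2 * step / steps))
        predictions = predictor(tokens)
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        # Commit the most confident predictions; the rest stay masked.
        masked.sort(key=lambda i: predictions[i][1], reverse=True)
        n_unmask = max(0, len(masked) - keep_masked)
        for i in masked[:n_unmask]:
            tokens[i] = predictions[i][0]
    return tokens

print(mask_and_predict(length=8, steps=4))
```

Because every step fills several positions at once, the number of model calls is set by `steps` rather than by the sequence length, which is the practical appeal of non-autoregressive decoding.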
Project: Bee Agent Framework
Bee Agent Framework is an open-source framework for building, deploying, and serving large-scale intelligent agent workflows.
The framework supports integration with various models, particularly IBM Granite and Llama 3.x models, and performance optimization for other popular large language models is in progress.
Its goal is to help developers adopt the latest open-source and proprietary models with minimal modifications.