Today's Open Source (2024-10-28): Open Source Large Model Service Framework KAG

Explore innovative projects like KAG for knowledge-enhanced decision-making, NotebookLlama for PDF to podcast workflows, and more cutting-edge solutions in AI and open-source development.

Oct 28, 2024

Here are some interesting AI open-source models and frameworks I wanted to share today:

Project: KAG

KAG is a knowledge-enhanced generation framework built on the OpenSPG engine, designed for building rigorous, knowledge-enhanced decision-making and information retrieval services.

By letting knowledge graphs and vector retrieval enhance each other, KAG addresses the weaknesses of RAG technology in knowledge reasoning, relevance, and logical sensitivity.

KAG performs well on multi-hop question-answering tasks and has been successfully applied to professional knowledge Q&A at Ant Group.

https://github.com/OpenSPG/KAG
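To make the idea of combining vector retrieval with a knowledge graph a little more concrete, here is a minimal toy sketch. It is not KAG's API or implementation: the passages, the hand-made embeddings, and the function names (vector_search, graph_expand, hybrid_retrieve) are all assumptions made up for illustration. Vector search picks the closest passage, then shared entities act as graph edges to pull in a second hop.

```python
import math

# Toy corpus (hypothetical): each passage has a hand-made embedding
# and the set of entities it mentions.
PASSAGES = {
    "p1": {"text": "Alice founded Acme.", "vec": [1.0, 0.0, 0.0],
           "entities": {"Alice", "Acme"}},
    "p2": {"text": "Acme is headquartered in Oslo.", "vec": [0.0, 1.0, 0.0],
           "entities": {"Acme", "Oslo"}},
    "p3": {"text": "Bob enjoys hiking.", "vec": [0.0, 0.0, 1.0],
           "entities": {"Bob"}},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k=1):
    """Return the ids of the k passages most similar to the query vector."""
    ranked = sorted(PASSAGES,
                    key=lambda pid: cosine(query_vec, PASSAGES[pid]["vec"]),
                    reverse=True)
    return ranked[:k]


def graph_expand(seed_ids, hops=1):
    """Follow shared entities from the seed passages for a number of hops."""
    found = list(seed_ids)
    frontier = set(seed_ids)
    for _ in range(hops):
        entities = set().union(*(PASSAGES[p]["entities"] for p in frontier))
        frontier = {pid for pid, p in PASSAGES.items()
                    if pid not in found and p["entities"] & entities}
        found.extend(sorted(frontier))
    return found


def hybrid_retrieve(query_vec, k=1, hops=1):
    """Vector search for seeds, then entity-based expansion."""
    return graph_expand(vector_search(query_vec, k), hops)


# "Where is the company Alice founded headquartered?" (hand-made query vector)
result = hybrid_retrieve([0.9, 0.1, 0.0])
print([PASSAGES[pid]["text"] for pid in result])
```

Vector search alone would return only the passage about Alice; the entity link through "Acme" is what brings in the passage that actually answers the question, which is the kind of multi-hop case described above.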

Project: NotebookLlama

NotebookLlama is an open-source project from Meta that guides users through building a PDF-to-podcast workflow with a series of tutorials and notebooks.

The project employs various large language models (LLMs) and text-to-speech (TTS) models to assist users in extracting text from PDF documents, generating podcast scripts, and converting them into audio podcasts.

The project assumes users have no foundational knowledge of LLMs, prompts, or audio models and provides detailed guidance in each notebook.
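The three stages described above (extract text, write a script, synthesize audio) can be sketched as a small pipeline. This is only an outline of the workflow's shape, not NotebookLlama's code: the PodcastPipeline class and the fake_* stand-ins are hypothetical, and in a real setup each callable would wrap a PDF parser, an LLM, and a TTS model respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PodcastPipeline:
    """Hypothetical three-stage pipeline: PDF -> script -> audio."""
    extract_text: Callable[[str], str]                     # PDF path -> raw text
    write_script: Callable[[str], List[Tuple[str, str]]]   # text -> (speaker, line) pairs
    synthesize: Callable[[str, str], bytes]                # (speaker, line) -> audio bytes

    def run(self, pdf_path: str) -> bytes:
        text = self.extract_text(pdf_path)
        script = self.write_script(text)
        # Concatenate the per-line audio segments in script order.
        return b"".join(self.synthesize(speaker, line) for speaker, line in script)


# Stand-ins so the sketch runs without any model or PDF library.
def fake_extract(path: str) -> str:
    return f"Contents of {path}"


def fake_script(text: str) -> List[Tuple[str, str]]:
    return [("Host", f"Today we discuss: {text}"), ("Guest", "Sounds great.")]


def fake_tts(speaker: str, line: str) -> bytes:
    return f"[{speaker}] {line}\n".encode("utf-8")


pipeline = PodcastPipeline(fake_extract, fake_script, fake_tts)
audio = pipeline.run("paper.pdf")
print(audio.decode("utf-8"))
```

Passing each stage in as a callable keeps the stages swappable, which fits a workflow where different LLMs and TTS models are tried at each step.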

Project: OmniParser

OmniParser, open-sourced by Microsoft, is a comprehensive method for parsing user interface screenshots into structured, easy-to-understand elements.

This significantly improves GPT-4V's ability to generate actions that are accurately grounded in the corresponding regions of the interface.

The project provides icon detection and function description models and has achieved top performance on the Windows Agent Arena.

Project: Vulnhuntr

Vulnhuntr is a tool that leverages large language models (LLMs) and static code analysis to automatically discover remotely exploitable vulnerabilities.

It builds and analyzes complete call chains from remote user input to server output, detecting complex multi-step security vulnerabilities that are beyond the reach of traditional static code analysis tools.

https://github.com/protectai/vulnhuntr
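As a rough illustration of what a "call chain from user input to a sensitive operation" means, here is a small sketch using Python's standard ast module. It covers only a toy version of the static-analysis half and is not how Vulnhuntr is implemented: the SOURCE snippet and the helper names build_call_graph and find_chain are assumptions for this example, and the LLM-driven analysis is left out entirely.

```python
import ast

# Hypothetical code under analysis: a request handler that ends up calling open().
SOURCE = '''
def handle_request(params):
    return render(load_file(params["path"]))

def load_file(path):
    return open(path).read()

def render(body):
    return "<p>" + body + "</p>"
'''


def build_call_graph(source):
    """Map each function name to the set of plain names it calls."""
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for sub in ast.walk(node):
                # Only direct calls like f(x); method calls such as x.read() are skipped.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    calls.add(sub.func.id)
            graph[node.name] = calls
    return graph


def find_chain(graph, start, sink, path=None):
    """Depth-first search for a call path from start to sink, or None."""
    path = (path or []) + [start]
    if start == sink:
        return path
    for callee in sorted(graph.get(start, ())):
        if callee not in path:  # avoid looping on recursive calls
            chain = find_chain(graph, callee, sink, path)
            if chain:
                return chain
    return None


graph = build_call_graph(SOURCE)
chain = find_chain(graph, "handle_request", "open")
print(" -> ".join(chain))
```

A chain like this only shows that user-controlled input can reach a file read through two hops; judging whether it is actually exploitable is the part the tool delegates to an LLM.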

Project: MaskGCT

MaskGCT is a fully non-autoregressive text-to-speech (TTS) model that eliminates the need for explicit alignment information between text and speech supervision, as well as phoneme-level duration prediction.

The model operates in two stages: the first stage uses text to predict semantic tokens extracted from a speech self-supervised learning (SSL) model, and the second stage predicts acoustic tokens conditioned on these semantic tokens.

MaskGCT follows a mask-and-predict learning paradigm, and experiments show it outperforms current zero-shot TTS systems in quality, similarity, and intelligibility.

https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct
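For readers new to mask-and-predict, the following is a generic toy decoding loop in that style, not MaskGCT's actual code. Everything here is an assumption for illustration: toy_predict stands in for a trained model with made-up confidences, and the linear unmasking schedule is just one simple choice. The point is that all positions are predicted in parallel, the most confident ones are kept, and the rest stay masked for the next round.

```python
MASK = None  # placeholder for a masked position


def toy_predict(tokens):
    """Stand-in for a trained model: returns (token, confidence) per position.

    The target sequence and the confidence formula are invented; confidence
    rises when neighbouring positions are already filled in.
    """
    target = [7, 3, 9, 1, 4, 2]
    out = []
    for i in range(len(tokens)):
        neighbours = sum(1 for j in (i - 1, i + 1)
                         if 0 <= j < len(tokens) and tokens[j] is not MASK)
        out.append((target[i], 0.5 + 0.2 * neighbours + 0.01 * i))
    return out


def mask_predict_decode(length, steps, predict):
    """Fill a fully masked sequence over a fixed number of parallel steps."""
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        preds = predict(tokens)  # predictions for every position at once
        # Linear schedule: how many positions should be filled after this step.
        target_filled = (length * step + steps - 1) // steps
        n_new = target_filled - (length - len(masked))
        # Keep only the most confident predictions among masked positions.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:n_new]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


decoded = mask_predict_decode(length=6, steps=3, predict=toy_predict)
print(decoded)
```

Because every step fills several positions at once, the number of model calls is set by the step count rather than the sequence length, which is the practical appeal of non-autoregressive generation.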

Project: Bee Agent Framework

Bee Agent Framework is an open-source framework for building, deploying, and serving large-scale intelligent agent workflows.

The framework supports integration with various models, particularly IBM Granite and Llama 3.x models, and work is under way to optimize performance with other popular large language models.

Its goal is to help developers adopt the latest open-source and proprietary models with minimal modifications.

https://github.com/i-am-bee/bee-agent-framework
