Today's Open Source (2024-11-13): LLaVA-KD Knowledge Distillation Framework
Explore innovative open-source projects including LLaVA-KD for knowledge distillation, SVDQuant for efficient quantization, WhoDB for natural language database management, and more.
Here are some interesting open-source AI models and frameworks I wanted to share today:
Project: LLaVA-KD
LLaVA-KD is a knowledge distillation framework for multimodal large language models (MLLMs), designed to transfer the capabilities of large models to smaller ones, reducing computational requirements.
It improves the performance of smaller models through multimodal distillation and relation distillation, combined with a three-stage training scheme, and leaves the model architecture unchanged.
https://github.com/Fantasyele/LLaVA-KD
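To make the two distillation ideas a bit more concrete, here is a minimal sketch in plain Python. It is not the project's code: I am assuming that multimodal distillation can be approximated by a KL divergence between teacher and student output distributions, and that relation distillation can be approximated by matching the pairwise cosine similarities of visual token features. All function names (`response_distillation_loss`, `relation_distillation_loss`, and the helpers) are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def response_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over token positions (assumed formulation)."""
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between token feature vectors."""
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
         for fj, nj in zip(features, norms)]
        for fi, ni in zip(features, norms)
    ]


def relation_distillation_loss(teacher_feats, student_feats):
    """Mean squared difference between the two token-to-token similarity maps.

    The similarity maps are n x n for n tokens, so the teacher and student
    may use different hidden sizes.
    """
    t = cosine_similarity_matrix(teacher_feats)
    s = cosine_similarity_matrix(student_feats)
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

Because the relation loss compares similarity maps rather than raw features, a student with a narrower hidden size can still be matched against a wider teacher, which fits the "without altering the model architecture" point above.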
Project: SVDQuant
SVDQuant is a post-training quantization technique that quantizes both weights and activations to 4 bits while maintaining visual fidelity.
On the 12B FLUX.1-dev model, it reduces memory usage by 3.6x and delivers an 8.7x speedup over the 16-bit model on a 16GB laptop RTX 4090 GPU.
SVDQuant absorbs outliers through low-rank decomposition, which improves the visual quality of the quantized PixArt-Σ model.
https://github.com/mit-han-lab/nunchaku
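The outlier-absorbing idea behind SVDQuant can be illustrated with a toy example. This is a heavily simplified sketch, not the nunchaku implementation: I am assuming a rank-1 branch found by power iteration (rather than a full SVD), weights only (no activations), and plain symmetric per-tensor 4-bit rounding. The function names and the toy matrix are made up for illustration.

```python
import math


def quantize_4bit(matrix):
    """Symmetric per-tensor 4-bit quantization; returns the dequantized matrix."""
    max_abs = max(abs(x) for row in matrix for x in row)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    return [[max(-8, min(7, round(x / scale))) * scale for x in row]
            for row in matrix]


def top_right_singular_vector(matrix, iterations=50):
    """Power iteration on W^T W, started from a uniform unit vector."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        wv = [sum(w * x for w, x in zip(row, v)) for row in matrix]
        wtwv = [sum(matrix[i][j] * wv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in wtwv))
        if norm == 0:
            break
        v = [x / norm for x in wtwv]
    return v


def low_rank_plus_quantized(matrix):
    """Keep a rank-1 branch in full precision and quantize only the residual."""
    v = top_right_singular_vector(matrix)
    wv = [sum(w * x for w, x in zip(row, v)) for row in matrix]  # equals sigma * u
    low_rank = [[a * b for b in v] for a in wv]
    residual = [[w - l for w, l in zip(rw, rl)] for rw, rl in zip(matrix, low_rank)]
    quantized = quantize_4bit(residual)
    return [[l + q for l, q in zip(rl, rq)] for rl, rq in zip(low_rank, quantized)]


def frobenius_error(a, b):
    """Frobenius norm of the difference between two matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


# Toy weights: one row of large "outlier" values dominates the value range.
weights = [
    [9.8, 9.9, 10.0, 10.1],
    [0.2, -0.2, -0.1, 0.0],
    [0.1, 0.2, -0.2, -0.1],
    [0.0, 0.1, 0.2, -0.2],
]
direct_error = frobenius_error(weights, quantize_4bit(weights))
low_rank_error = frobenius_error(weights, low_rank_plus_quantized(weights))
```

In this toy case the outlier row forces a coarse quantization step when the matrix is quantized directly, so the small entries all round to zero. Once the rank-1 branch absorbs that row, the residual has a much smaller range and the 4-bit grid can represent it far more precisely.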
Project: WhoDB
WhoDB is a lightweight yet powerful database management tool designed to simplify everyday database tasks.
It pairs the simplicity of Adminer with an improved user experience and better performance, and it is written in Go for speed and efficiency.
WhoDB supports natural language interaction with data, integrating Ollama, ChatGPT, and Anthropic, allowing users to perform queries and manage data through conversation rather than complex SQL.
https://github.com/clidey/whodb
Project: Lumen
Lumen is a command-line tool that uses AI to generate Git commit messages and summarize Git changes, with no API key required.
It supports multiple AI providers and can generate commit messages for staged changes, summarize specific Git commits or diffs, and answer questions about specific changes.
Lumen offers multiple output formats and supports various operating systems.
https://github.com/jnsahaj/lumen
Project: FreeVideoLLM
FreeVideoLLM is an efficient, training-free video language model that understands video content through prompt-guided visual perception.
The project reuses pre-trained LLaVA model weights, so video inference and evaluation need no additional training.
It supports processing and inference on a variety of video question-answering datasets, which makes it suitable for multimodal video understanding tasks.
https://github.com/contrastive/FreeVideoLLM
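One way to picture prompt-guided perception is frame selection: score each frame against the question and keep only the most relevant ones. The sketch below is my own illustration under that assumption, not the project's code. It presumes that frame and prompt embeddings already exist in a shared space, and `select_frames` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_frames(frame_embeddings, prompt_embedding, k):
    """Return indices of the k frames most similar to the prompt, in temporal order."""
    scores = [cosine(frame, prompt_embedding) for frame in frame_embeddings]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

Passing fewer frames to the language model cuts the number of visual tokens, which is where the efficiency gain would come from in a setup like this.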
Project: BEN
BEN is a deep learning model designed to automatically remove backgrounds from images, generating masks and foreground images.
The model supports CUDA acceleration and provides a simple API for easy integration.
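How a predicted mask turns into a foreground image can be shown in a few lines. This is a generic sketch, not BEN's API: `apply_mask` is a hypothetical helper, and I am assuming the mask is a grid of values in [0, 1] with the same shape as the image.

```python
def apply_mask(pixels, mask):
    """Attach the mask as an alpha channel to produce an RGBA foreground image.

    pixels: rows of (r, g, b) tuples with values 0-255.
    mask:   rows of floats in [0, 1], where 1 means foreground.
    """
    if len(pixels) != len(mask):
        raise ValueError("image and mask must have the same height")
    foreground = []
    for pixel_row, mask_row in zip(pixels, mask):
        if len(pixel_row) != len(mask_row):
            raise ValueError("image and mask must have the same width")
        foreground.append([
            (r, g, b, round(max(0.0, min(1.0, m)) * 255))
            for (r, g, b), m in zip(pixel_row, mask_row)
        ])
    return foreground
```

Background pixels end up fully transparent (alpha 0), so the result can be composited onto any new background.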