Open Source Today (2024-08-28): CogVideoX-5b on RTX 3060; Apple's Training-Free Multimodal Model
Discover top AI open-source projects like Zhipu's CogVideoX-5b and Apple's SlowFast-LLaVA, perfect for video generation and multimodal understanding.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: CogVideoX/CogVideoX-5b
Zhipu has open-sourced CogVideoX-5b, a new and larger model in its CogVideoX series that delivers higher video-generation quality and better visual effects than the earlier CogVideoX-2B.
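CogVideoX-5B needs only 18GB of VRAM for inference at FP16 precision and 40GB for fine-tuning. Because 18GB exceeds the 12GB on an RTX 3060, running the model on that card relies on memory optimizations such as the CPU offloading available in Hugging Face diffusers.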
CogVideoX is Zhipu's family of open-source video-generation models and shares its origins with Qingying, the video-generation product Zhipu released earlier. The series includes models of several sizes; the first to be open-sourced was CogVideoX-2B.
https://github.com/THUDM/CogVideo
https://huggingface.co/THUDM/CogVideoX-5b
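As a rough sanity check on these VRAM figures, the sketch below estimates how much memory the model weights alone occupy at a given precision. It assumes about 5 billion parameters, taken from the "5b" in the name, and it ignores the text encoder, the VAE, and activations. The result is therefore a lower bound, not a measurement, and the helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Bytes per parameter for common inference precisions
PRECISION_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

# Assumption: ~5e9 parameters, inferred from the model name; other components are ignored
for name, nbytes in PRECISION_BYTES.items():
    print(f"{name}: {weight_memory_gb(5e9, nbytes):.1f} GB")
```

Under this assumption the FP16 weights alone come to about 10GB. That already approaches a 12GB card's capacity before activations and the rest of the pipeline are counted, which is presumably why offloading matters on smaller GPUs.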
Project: SlowFast-LLaVA
SlowFast-LLaVA is a training-free multimodal large language model from Apple, focused on video understanding and reasoning.
It uses a two-stream design. A Slow pathway keeps detailed spatial semantics from a small number of frames, and a Fast pathway covers many frames at a coarse spatial resolution to capture long-range temporal context. Together they stay within the token budget of typical LLMs.
Without any fine-tuning on video data, it matches or even outperforms state-of-the-art video LLMs that are fine-tuned on video datasets, across a range of video QA tasks and benchmarks.
https://github.com/apple/ml-slowfast-llava
https://arxiv.org/abs/2407.15841
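SlowFast-LLaVA's name comes from its two-stream input design. A Slow pathway samples a few frames and keeps many spatial tokens per frame, while a Fast pathway takes many frames and pools each one aggressively. The NumPy sketch below shows how this keeps the total token count small. It runs on random stand-in features, and the frame count, grid size, stride, pooling factors, and helper names are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

def avg_pool_spatial(x: np.ndarray, k: int) -> np.ndarray:
    """Average-pool the H and W axes of x (shape T x H x W x C) by a factor k."""
    t, h, w, c = x.shape
    assert h % k == 0 and w % k == 0, "grid size must be divisible by k"
    return x.reshape(t, h // k, k, w // k, k, c).mean(axis=(2, 4))

def slowfast_tokens(feats: np.ndarray, slow_stride: int, slow_pool: int, fast_pool: int) -> np.ndarray:
    """Build one token sequence from per-frame patch features (T x H x W x C)."""
    c = feats.shape[-1]
    # Slow pathway: a few frames, light pooling -> fine spatial detail
    slow = avg_pool_spatial(feats[::slow_stride], slow_pool)
    # Fast pathway: every frame, heavy pooling -> long-range temporal context
    fast = avg_pool_spatial(feats, fast_pool)
    # Flatten both pathways into tokens and concatenate them for the LLM
    return np.concatenate([slow.reshape(-1, c), fast.reshape(-1, c)], axis=0)

# Hypothetical input: 48 frames, each a 24x24 grid of 8-dim patch features
feats = np.random.rand(48, 24, 24, 8)
tokens = slowfast_tokens(feats, slow_stride=6, slow_pool=2, fast_pool=6)
print(tokens.shape)  # (1920, 8)
print(48 * 24 * 24)  # 27648 tokens if every patch of every frame were kept
```

In this toy setting, keeping every patch of all 48 frames would cost 27,648 tokens. The two pathways together emit 1,920 (8 frames x 144 tokens plus 48 frames x 16 tokens) while still looking at every frame.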
Project: Kotaemon
Kotaemon is an open-source RAG development tool. For end users, it provides a clean, simple UI for RAG-based question answering over their documents and supports multiple LLM API providers (e.g., OpenAI, Cohere) as well as local LLMs.
For developers, it offers a framework for building custom RAG-based document Q&A pipelines, along with a Gradio-based UI for customizing those pipelines and seeing them in action.
https://github.com/Cinnamon/kotaemon
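To make "RAG-based document Q&A" concrete, here is a minimal, library-free sketch of the two core steps such a pipeline performs: retrieving the document chunks most similar to a question, then packing them into a prompt for an LLM. This is a generic illustration, not Kotaemon's API. The bag-of-words similarity stands in for the embedding models a real pipeline would use, and all helper names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Lowercased word counts; a stand-in for a real embedding model
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the question and keep the top k
    q = bag_of_words(question)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    # Ground the LLM's answer in the retrieved chunks
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}\nAnswer:"

chunks = [
    "The office is closed on weekends.",
    "The invoice is due on March 3.",
    "Payments are accepted by bank transfer.",
]
top = retrieve("When is the invoice due?", chunks, k=1)
print(top)  # ['The invoice is due on March 3.']
print(build_prompt("When is the invoice due?", top))
```

The resulting prompt would then be sent to whichever LLM provider or local model the pipeline is configured with.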
Project: RAGLAB
RAGLAB is a modular, research-focused open-source framework dedicated to RAG algorithms.
It includes reproductions of 6 existing RAG algorithms and a comprehensive evaluation system with 10 benchmark datasets, facilitating fair comparisons and efficient development of new algorithms, datasets, and metrics.
https://github.com/fate-ubw/RAGLAB
https://arxiv.org/abs/2408.11381
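A fair comparison comes down to running every algorithm on the same data and scoring it with the same metric. The sketch below illustrates that idea with a shared callable interface and a SQuAD-style normalized exact-match score. The function names and the two toy "algorithms" are hypothetical placeholders and are not RAGLAB's API.

```python
import re
import string
from typing import Callable

PUNCT = set(string.punctuation)

def normalize(text: str) -> str:
    # SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def evaluate(algorithms: dict[str, Callable[[str], str]], dataset: list[tuple[str, str]]) -> dict[str, float]:
    # Every algorithm sees the same questions and is scored by the same metric
    return {
        name: sum(exact_match(answer(q), gold) for q, gold in dataset) / len(dataset)
        for name, answer in algorithms.items()
    }

dataset = [("Capital of France?", "Paris"), ("Largest planet?", "Jupiter")]
algorithms = {
    "always_paris": lambda q: "Paris.",
    "lookup": lambda q: {"Capital of France?": "the Paris", "Largest planet?": "Jupiter"}[q],
}
print(evaluate(algorithms, dataset))  # {'always_paris': 0.5, 'lookup': 1.0}
```

Adding a new algorithm, dataset, or metric then means plugging one more entry into the same harness rather than writing a separate evaluation script.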
Project: RAGChecker
RAGChecker is an advanced automated evaluation framework for assessing and diagnosing Retrieval-Augmented Generation (RAG) systems.
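It offers a comprehensive set of metrics, computed from fine-grained claim-level entailment checks and covering both the retriever and the generator, plus tools for in-depth performance analysis that help developers and researchers accurately evaluate, diagnose, and improve their RAG systems.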
https://github.com/amazon-science/ragchecker
https://arxiv.org/abs/2408.08067
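RAGChecker's metrics are built on claim-level checks: a response is broken into individual claims, and each claim is tested for entailment against a reference. The sketch below computes claim-level precision, recall, and F1. It makes two simplifying assumptions: the claims are already extracted, and entailment means an identical normalized claim on the other side. RAGChecker itself relies on LLM-based claim extraction and checking, and the function name here is hypothetical, not its API.

```python
def claim_metrics(response_claims: set[str], gt_claims: set[str]) -> dict[str, float]:
    """Claim-level precision, recall, and F1 for one RAG response."""
    # Simplification: a claim counts as entailed if the identical claim appears on the
    # other side; RAGChecker uses an LLM-based checker instead
    correct = response_claims & gt_claims
    precision = len(correct) / len(response_claims) if response_claims else 0.0
    recall = len(correct) / len(gt_claims) if gt_claims else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

response = {
    "the eiffel tower is in paris",
    "it was completed in 1889",
    "it is 500 m tall",  # an incorrect claim
}
ground_truth = {
    "the eiffel tower is in paris",
    "it was completed in 1889",
    "it is made of wrought iron",
    "it was built for the 1889 world's fair",
}
print({k: round(v, 3) for k, v in claim_metrics(response, ground_truth).items()})
# {'precision': 0.667, 'recall': 0.5, 'f1': 0.571}
```

Precision penalizes the incorrect claim in the response, while recall penalizes the ground-truth claims the response left out.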
Project: Llama Stack
Llama Stack defines and standardizes the building blocks needed to bring generative AI applications to market.
These building blocks span the entire development lifecycle: from model training and fine-tuning, through product evaluation, to building and running AI agents in production.