Today's Open Source (2024-10-29): Meta Open-Sources LongVU Large Model
LongVU enhances long video comprehension with spatiotemporal compression. CoI-Agent revolutionizes research via LLMs, and x.infer simplifies CV inference for 1,000+ models.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: LongVU
The LongVU project aims to enhance long video language comprehension through spatiotemporal adaptive compression technology.
This project integrates advanced visual encoders and language models, effectively processing and understanding complex information in long videos.
LongVU provides multiple resource versions, supporting both local deployment and online demos, making it suitable for a wide range of applications requiring video and language data processing.
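The core intuition behind temporal compression is easy to sketch: adjacent video frames are often nearly identical, so frames whose features closely match the last kept frame can be dropped. The snippet below is an illustrative toy, not LongVU's actual pipeline (the function names and threshold are assumptions for the example):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def compress_frames(features, threshold=0.95):
    """Keep a frame only if it differs enough from the last kept frame."""
    kept = [0]  # always keep the first frame
    for i in range(1, len(features)):
        if cosine(features[i], features[kept[-1]]) < threshold:
            kept.append(i)
    return kept

# Frames 1 and 3 are near-duplicates of their predecessors and get dropped.
frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]]
print(compress_frames(frames))  # -> [0, 2]
```

LongVU's actual method adaptively compresses along both spatial and temporal axes; this sketch only shows the temporal half of that idea.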
Project: CoI-Agent
Chain of Ideas (CoI) Agent is a project designed to revolutionize research and idea development through large language model (LLM) agents.
The project offers a systematic approach to generating and developing research ideas, using advanced natural language processing and machine learning models to help researchers explore and innovate more efficiently.
https://github.com/DAMO-NLP-SG/CoI-Agent
Project: AgenticIR
This project tackles complex image restoration, using an intelligent agent system for tasks such as deblurring, dehazing, and image enhancement.
By leveraging learning and experience, the system effectively restores real-world image quality.
https://github.com/Kaiwen-Zhu/AgenticIR
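The agent idea here can be illustrated with a toy dispatcher: detect which degradations an image suffers from, then schedule the matching restoration tools. The tool names and planning logic below are illustrative assumptions, not AgenticIR's real implementation:

```python
# Hypothetical mapping from detected degradation to restoration tool.
TOOLS = {
    "blur": "deblur",
    "haze": "dehaze",
    "low_light": "enhance",
}

def plan(degradations):
    """Return the ordered list of restoration tools for the detected degradations."""
    return [TOOLS[d] for d in degradations if d in TOOLS]

print(plan(["haze", "blur"]))  # -> ['dehaze', 'deblur']
```

The real system goes further, using accumulated experience to decide the order in which tools are applied and to retry when a step degrades quality.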
Project: x.infer
x.infer is a framework-agnostic computer vision inference library that lets you run inference on a wide range of models with just a few lines of Python.
It supports various frameworks and over 1,000 models, providing a unified interface and modular design, allowing users to easily integrate and replace models.
x.infer also supports interactive interfaces through Gradio.
https://github.com/dnth/x.infer
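The "unified interface" design that makes this possible is essentially a model registry plus a common `infer` contract. The sketch below shows that pattern in miniature; the names (`register`, `create_model`, the dummy model) are illustrative and not x.infer's actual API:

```python
_REGISTRY = {}

def register(name):
    # Decorator that records a model class under a string identifier.
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco

@register("dummy-classifier")
class DummyClassifier:
    def infer(self, image):
        # A real adapter would wrap a framework-specific model here.
        return {"label": "cat", "score": 0.9}

def create_model(name):
    """Look up a registered model by name and instantiate it."""
    return _REGISTRY[name]()

model = create_model("dummy-classifier")
print(model.infer("img.jpg"))  # -> {'label': 'cat', 'score': 0.9}
```

Because every model sits behind the same interface, swapping one of the 1,000+ supported models for another only requires changing the name passed to the factory.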
Project: PyramidDrop
PyramidDrop is a project aimed at accelerating large vision-language models by reducing visual redundancy.
The core idea is to drop visual tokens progressively across layer stages, exploiting the fact that redundancy among image tokens grows in deeper layers, which improves the efficiency of both training and inference.
PyramidDrop accelerates models during training and can also be used as a plug-and-play strategy for inference acceleration, offering both high performance and low inference costs.
https://github.com/Cooperx521/PyramidDrop
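The staged-dropping idea can be sketched as follows: at each stage, keep only the top-ranked fraction of visual tokens by some importance score, so that deeper layers process progressively fewer tokens. This is a simplified illustration under assumed parameters, not the paper's exact ranking criterion:

```python
def pyramid_drop(tokens, scores, stages=3, keep_ratio=0.5):
    """At each stage, keep the top keep_ratio fraction of tokens by score."""
    for _ in range(stages):
        k = max(1, int(len(tokens) * keep_ratio))
        # Rank token indices by importance, keep the top-k, restore order.
        ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
        ranked.sort()
        tokens = [tokens[i] for i in ranked]
        scores = [scores[i] for i in ranked]
    return tokens

# 8 tokens shrink to 4, then 2, then 1 across the three stages.
toks = list(range(8))
scs = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
print(pyramid_drop(toks, scs))  # -> [0]
```

With a 0.5 keep ratio per stage, the attention cost over visual tokens shrinks geometrically with depth, which is where the training and inference speedups come from.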
Project: LLaVA-MoD
LLaVA-MoD is an efficient framework designed to train small-scale multimodal language models by distilling knowledge from large-scale multimodal language models.
The project optimizes the network by integrating a sparse Mixture-of-Experts (MoE) architecture and adopts a two-stage knowledge transfer strategy: imitation distillation followed by preference distillation.
Experiments show that LLaVA-MoD outperforms existing models on multimodal benchmarks while activating fewer parameters and incurring lower computational cost.
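The imitation-distillation stage is commonly formulated as minimizing the KL divergence between the teacher's and student's next-token distributions; the sketch below illustrates that objective in pure Python (the exact loss used by LLaVA-MoD may differ in details such as temperature scaling):

```python
from math import log, exp

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q): how much the student distribution q diverges from teacher p."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([2.0, 1.0, 0.1])   # large-model output distribution
student = softmax([1.5, 1.2, 0.3])   # small-model output distribution
loss = kl_div(teacher, student)
print(loss > 0)  # -> True: the student has not yet matched the teacher
```

Minimizing this loss over the training corpus pushes the small model to imitate the large one token by token; the preference-distillation stage then refines which responses the student favors.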