Today's Open Source (2024-09-23): Multimodal Large Language Model Oryx
Discover innovative projects like Oryx MLLM, AgentTorch, MAgICoRe, OneGen, and VPTQ that push the boundaries of multimodal AI, visual understanding, and large-scale simulation.
Here are some interesting open-source AI models and frameworks I wanted to share today:
Project: Oryx
Oryx is a multimodal large language model (MLLM) designed to overcome the limitations of current models when handling visual data of different resolutions and durations.
Oryx introduces two key innovations to dynamically process visual input of any resolution and duration. First, the pre-trained OryxViT model encodes images at arbitrary resolutions into visual representations suitable for LLMs. Second, a dynamic compression module compresses visual tokens by factors from 1x to 16x as needed.
https://github.com/Oryx-mllm/Oryx
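To make the compression idea concrete, here is a minimal sketch of ratio-based token pooling in PyTorch. It is my own illustration, not Oryx's implementation: the pooling scheme, the token budget, and all function names are assumptions.

```python
import torch
import torch.nn.functional as F

def compress_visual_tokens(tokens: torch.Tensor, ratio: int) -> torch.Tensor:
    """Downsample a (batch, seq_len, dim) sequence of visual tokens by an
    integer ratio using average pooling along the sequence dimension."""
    if ratio == 1:
        return tokens
    b, n, d = tokens.shape
    pad = (-n) % ratio                     # pad so seq_len divides evenly
    if pad:
        tokens = F.pad(tokens, (0, 0, 0, pad))
    return tokens.reshape(b, -1, ratio, d).mean(dim=2)

def pick_ratio(num_tokens: int, budget: int = 4096) -> int:
    """Choose the smallest supported ratio (1x-16x) that fits the budget."""
    for r in (1, 2, 4, 8, 16):
        if num_tokens // r <= budget:
            return r
    return 16

# Example: a long video clip encoded into 32k visual tokens
video_tokens = torch.randn(1, 32768, 1024)
r = pick_ratio(video_tokens.shape[1])                     # -> 8
print(r, compress_visual_tokens(video_tokens, r).shape)   # (1, 4096, 1024)
```

The point of a scheme like this is that a short clip can keep full token fidelity (1x) while an hour-long video gets squeezed toward 16x, trading per-frame detail for usable context length.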
Project: AgentTorch
AgentTorch is an open-source platform designed to tackle challenges in computational efficiency and behavior modeling for large-scale agent simulations.
It optimizes GPU usage, making it possible to efficiently simulate the behaviors of entire cities or countries.
AgentTorch’s design focuses on scalability, differentiability, composability, and generalization.
It helps capture agents' adaptive behavior in complex settings such as a pandemic. Researchers can simulate millions of agents and analyze their decisions across social and economic scenarios, providing insights for policymaking.
https://github.com/AgentTorch/AgentTorch
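To see why tensorizing agents matters at this scale, here is a toy epidemic simulation in which every agent attribute lives in one tensor, so a step over millions of agents is a handful of vectorized GPU ops instead of a Python loop. This is only a sketch of the vectorization idea under assumed toy dynamics, not the AgentTorch API, and it omits the differentiability machinery the platform provides.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n_agents = 2_000_000

# Toy epidemic model: 0 = susceptible, 1 = infected, 2 = recovered
state = torch.zeros(n_agents, dtype=torch.int8, device=device)
state[:2000] = 1                     # seed initial infections
beta, gamma = 0.25, 0.10             # assumed transmission / recovery rates

def step(state: torch.Tensor) -> torch.Tensor:
    """One simulation step over ALL agents as pure tensor ops."""
    infected_frac = (state == 1).float().mean()
    rand = torch.rand(n_agents, device=device)
    new_infections = (state == 0) & (rand < beta * infected_frac)
    recoveries = (state == 1) & (rand < gamma)
    state = state.clone()
    state[new_infections] = 1
    state[recoveries] = 2
    return state

for t in range(100):
    state = step(state)
print("infected:", int((state == 1).sum()), "recovered:", int((state == 2).sum()))
```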
Project: MAgICoRe
MAgICoRe is a multi-agent reasoning framework designed to solve complex reasoning tasks through a step-by-step refinement process.
Its key idea is to generate multiple reasoning chains and refine them for difficult cases, improving accuracy and efficiency.
MAgICoRe introduces a difficulty-aware mechanism that splits problems into "easy" and "hard" categories: easy problems are answered by aggregating multiple chains, while hard ones receive targeted, step-by-step refinement. With three collaborating agents (a solver, a reviewer, and a refiner), the model can better localize and correct errors, improving answer quality.
https://github.com/dinobby/MAgICoRe
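Here is a rough sketch of how such difficulty-aware routing could look. The control flow follows my reading of the idea, not the MAgICoRe codebase, and the llm_* helpers are hypothetical stubs standing in for real model calls.

```python
from collections import Counter

def llm_generate_chain(question: str) -> str:
    return "step 1 ... step n -> 42"      # stub: one sampled reasoning chain

def llm_review(question: str, chain: str) -> str:
    return "ok"                            # stub: step-wise reviewer feedback

def llm_refine(question: str, chain: str, feedback: str) -> str:
    return chain                           # stub: targeted correction

def extract_answer(chain: str) -> str:
    return chain.rsplit("->", 1)[-1].strip()

def solve(question: str, k: int = 8, easy_threshold: float = 0.7) -> str:
    chains = [llm_generate_chain(question) for _ in range(k)]
    answers = [extract_answer(c) for c in chains]
    best, votes = Counter(answers).most_common(1)[0]
    if votes / k >= easy_threshold:        # "easy": high agreement, aggregate
        return best
    chain = chains[answers.index(best)]    # "hard": escalate to refinement
    for _ in range(3):                     # bounded refinement loop
        feedback = llm_review(question, chain)
        if feedback == "ok":
            break
        chain = llm_refine(question, chain, feedback)
    return extract_answer(chain)

print(solve("What is 6 * 7?"))             # -> "42" with these stubs
```

The efficiency intuition: most problems exit via the cheap aggregation path, and only genuinely hard ones pay for the multi-agent refinement loop.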
Project: OneGen
OneGen is a framework for unified one-pass generation and retrieval for large language models (LLMs).
Its main idea is to fold retrieval into the autoregressive generation process itself, so that both tasks share a single context and forward pass. This simplifies deployment and significantly lowers inference cost.
https://github.com/zjunlp/onegen
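The sketch below shows one way single-pass generation plus retrieval can work: when the model emits a special retrieval token, the hidden state at that position doubles as the query embedding. The tiny stand-in model, the token id, and all names here are assumptions, not the zjunlp/onegen API.

```python
import torch

HIDDEN, VOCAB, RQ_TOKEN = 768, 32001, 32000    # [RQ] = assumed retrieval token
doc_embeddings = torch.randn(10_000, HIDDEN)   # precomputed corpus embeddings

class TinyLM(torch.nn.Module):                 # stand-in for a real decoder LM
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(VOCAB, HIDDEN)
        self.head = torch.nn.Linear(HIDDEN, VOCAB)

    def forward(self, ids):
        h = self.embed(ids).mean(dim=0)        # toy "context" representation
        return h, self.head(h)                 # last hidden state + logits

def generate_with_retrieval(model, ids, max_new_tokens=32, top_k=3):
    retrieved = []
    for _ in range(max_new_tokens):
        h, logits = model(ids)
        next_id = int(logits.argmax())
        if next_id == RQ_TOKEN:
            scores = doc_embeddings @ h        # hidden state as query embedding
            retrieved.extend(scores.topk(top_k).indices.tolist())
        ids = torch.cat([ids, torch.tensor([next_id])])
    return ids, retrieved

model = TinyLM()
out_ids, docs = generate_with_retrieval(model, torch.tensor([1, 2, 3]))
```

Because the query embedding falls out of the same forward pass that produces the text, there is no separate retriever model to host, which is where the deployment and inference savings come from.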
Project: VPTQ
VPTQ is a novel post-training quantization method that uses vector quantization to achieve high accuracy at extremely low bit-widths (below 2 bits).
It can compress models as large as 70B, and even 405B, parameters to 1-2 bits without retraining, while maintaining high accuracy. This reduces memory footprint, storage cost, and memory bandwidth requirements during inference.
https://github.com/microsoft/VPTQ
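As a toy illustration of the underlying mechanism, the snippet below vector-quantizes a weight matrix against a k-means codebook: weights are grouped into short vectors, and each vector is replaced by an 8-bit codebook index, i.e. about 1 bit per weight. VPTQ's actual algorithm is considerably more sophisticated; this only shows where the extreme compression ratio comes from.

```python
import math
import torch

def vector_quantize(W: torch.Tensor, vec_len: int = 8,
                    codebook_size: int = 256, iters: int = 20):
    """Toy vector quantization: k-means over weight vectors of length vec_len."""
    vecs = W.reshape(-1, vec_len)                        # (num_vecs, vec_len)
    centroids = vecs[torch.randperm(len(vecs))[:codebook_size]].clone()
    for _ in range(iters):
        idx = torch.cdist(vecs, centroids).argmin(dim=1)  # assign step
        for k in range(codebook_size):                    # update step
            members = vecs[idx == k]
            if len(members):
                centroids[k] = members.mean(dim=0)
    return centroids, idx

W = torch.randn(1024, 1024)
codebook, indices = vector_quantize(W)
W_hat = codebook[indices].reshape(W.shape)                # dequantized weights

# 8-bit index per 8 weights -> 1 bit/weight (plus the small shared codebook)
bits_per_weight = math.log2(256) / 8
mse = float(((W - W_hat) ** 2).mean())
print(f"{bits_per_weight:.1f} bits/weight, reconstruction mse={mse:.4f}")
```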