Today's Open Source (2024-10-24): The Largest Open Source Video Generation Model, Mochi 1

Discover exciting open-source AI models like Mochi 1, Video-XL, and RagVL. Explore advanced video generation, long video understanding, and more!

Oct 24, 2024

Here are some interesting AI open-source models and frameworks I wanted to share today:

Project: Mochi 1

Mochi 1 is an advanced open-source video generation model with high-fidelity motion and strong prompt adherence.

The model marks a significant step forward for open video generation systems and is released under the Apache 2.0 license.

Built on a novel Asymmetric Diffusion Transformer (AsymmDiT) architecture, it is the largest openly released video generation model to date. Users can generate videos through the Gradio UI or the command-line interface.

https://github.com/genmoai/models

Project: Video-XL

Video-XL is a visual language model specifically designed for hour-long video understanding.

The project performs well on multiple benchmarks and can process the visual context of long videos, which makes it suitable for practical scenarios such as movie summarization, anomaly detection in surveillance footage, and advertisement recognition.

https://github.com/VectorSpaceLab/Video-XL

Project: agent.exe

agent.exe is a simple Electron application that lets Claude 3.5 Sonnet directly control the local computer. Users provide an API key, and the AI can then perform tasks on their computer.

The project supports macOS and, in theory, Windows and Linux.

The aim is to showcase Claude's computer use capabilities.

https://github.com/corbt/agent.exe

Project: Agent2sim

The Agent-to-Sim project aims to learn interactive behaviors from everyday videos. By employing 4D reconstruction technology, it can extract and simulate complex interactive behaviors from videos.

The project provides a set of tools and methods to assist researchers and developers in behavior modeling and simulation without the need for extensive annotated data.

https://github.com/facebookresearch/agent2sim

Project: RagVL

RagVL is a multimodal retrieval-augmented generation project that enhances multimodal generation through knowledge-augmented re-ranking and noise-injection training.

The project offers an official PyTorch implementation, aiming to improve performance in generation tasks through enhanced re-ranking methods.

https://github.com/IDEA-FinAI/RagVL
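
To make the re-ranking idea in RagVL's description more concrete, here is a minimal sketch of a generic two-stage "retrieve, then re-rank" pipeline. It does not reproduce RagVL's method or use its code: the function names (`retrieve_then_rerank`, `shared_words`, `overlap_ratio`) and the toy word-overlap scores are hypothetical stand-ins for a real retriever and a learned re-ranker.

```python
from typing import Callable, List


def retrieve_then_rerank(
    query: str,
    candidates: List[str],
    coarse_score: Callable[[str, str], float],
    fine_score: Callable[[str, str], float],
    top_n: int = 3,
    top_k: int = 1,
) -> List[str]:
    """Generic two-stage retrieval (illustrative only, not RagVL's implementation)."""
    # Stage 1: a cheap scorer keeps the top_n candidates.
    shortlist = sorted(
        candidates, key=lambda c: coarse_score(query, c), reverse=True
    )[:top_n]
    # Stage 2: a more careful scorer re-orders only the shortlist.
    reranked = sorted(
        shortlist, key=lambda c: fine_score(query, c), reverse=True
    )
    return reranked[:top_k]


def shared_words(query: str, candidate: str) -> float:
    # Hypothetical coarse score: number of distinct words shared with the query.
    return float(len(set(query.lower().split()) & set(candidate.lower().split())))


def overlap_ratio(query: str, candidate: str) -> float:
    # Hypothetical fine score: share of the candidate's distinct words
    # that also appear in the query, which penalizes long, noisy candidates.
    words = set(candidate.lower().split())
    return len(set(query.lower().split()) & words) / max(len(words), 1)


# Toy captions standing in for retrieved multimodal documents.
captions = [
    "a giant panda eating bamboo shoots in a forest clearing while another panda sleeps",
    "panda eating bamboo",
    "bamboo flooring",
    "city skyline at night",
]

result = retrieve_then_rerank(
    "panda eating bamboo", captions, shared_words, overlap_ratio
)
print(result)
```

In a real system the second stage would typically be a trained model rather than a word-overlap heuristic; the point here is only that the re-ranker sees a short list and can afford a more expensive relevance judgment.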

Project: self-llm

The "Open Source Large Model Usage Guide" is a tutorial project designed for beginners in China, providing comprehensive guidance on using open-source large models on the Linux platform.

The project covers environment configuration, local deployment, and efficient fine-tuning, helping students and researchers make better use of open-source large models. By simplifying deployment and application, it aims to make large models accessible to more people.

https://github.com/datawhalechina/self-llm
