Today's Open Source (2024-10-11): Peking University & Kuaishou's Pyramidal Flow Matching for Efficient 10-Second Video Generation.
Discover exciting open-source AI projects: from Pyramidal Flow Matching for efficient video generation to Aria's multimodal capabilities.
Here are some interesting open-source AI models and frameworks I wanted to share today:
Project: Pyramidal Flow Matching
Peking University, in collaboration with Kuaishou, has open-sourced Pyramidal Flow Matching, an efficient autoregressive video generation method based on flow matching.
The model is trained solely on open-source datasets and can generate high-quality 10-second videos at 768p resolution and 24 FPS. It also natively supports image-to-video generation.
https://github.com/jy0205/Pyramid-Flow
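For readers unfamiliar with flow matching, the following is a minimal sketch of the generic idea only, assuming the common straight-line path between a noise sample and a data sample. It does not reproduce the pyramidal or autoregressive parts of the project, and the function names are hypothetical rather than taken from the Pyramid-Flow repository.

```python
def flow_matching_pair(x0, x1, t):
    """Build one training example for the linear path x_t = (1 - t) * x0 + t * x1.

    x0 is a noise sample, x1 is a data sample, t is in [0, 1].
    Returns the interpolated point and the velocity a model would regress to.
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: an "oracle" velocity field that already knows the data point.
noise = [0.0, 0.0]
data = [1.0, 2.0]
oracle = lambda x, t: [d - n for n, d in zip(noise, data)]
print(euler_sample(oracle, noise))
```

In a real system the oracle is replaced by a neural network trained to predict the target velocity, and sampling integrates that learned field starting from noise.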
Project: Aria
Aria is a multimodal-native mixture-of-experts (MoE) model that performs well across multimodal, language, and coding tasks, particularly video and document understanding.
It supports multimodal inputs of up to 64K tokens and can caption a 256-frame video in 10 seconds. Aria is designed to be lightweight and fast, efficiently encoding visual inputs of varying sizes and aspect ratios.
https://huggingface.co/rhymes-ai/Aria
Project: AWT
AWT is a framework for adapting pre-trained vision-language models (VLMs) to downstream tasks.
It improves the zero-shot capabilities of VLMs through augmentation, weighting, and transportation, and extends to few-shot learning with a multimodal adapter.
AWT reports state-of-the-art results on zero-shot and few-shot image and video tasks.
https://github.com/MCG-NJU/AWT
Project: Swiftide
Swiftide is a Rust-native library for building large language model (LLM) applications.
It implements retrieval-augmented generation (RAG) by quickly ingesting, transforming, and indexing large amounts of data, then querying that data and injecting the results into prompts.
Swiftide aims to provide a fast, user-friendly, reliable, and easily extensible RAG library, enabling developers to quickly build AI applications from idea to production.
https://github.com/bosun-ai/swiftide
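The ingest, transform, index, query, and inject steps described for Swiftide can be illustrated with a small language-agnostic sketch. This is a toy in Python using only the standard library, not Swiftide's Rust API: every name here (ToyIndex, chunk, embed, build_prompt) is hypothetical, and a bag-of-words vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, max_words=40):
    """Transform step: split a document into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory index holding (chunk text, vector) pairs."""

    def __init__(self):
        self.entries = []

    def ingest(self, documents, max_words=40):
        """Ingest step: chunk each document and store its vector."""
        for doc in documents:
            for piece in chunk(doc, max_words):
                self.entries.append((piece, embed(piece)))

    def query(self, question, top_k=2):
        """Query step: return the top_k chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Inject step: place the retrieved chunks into the prompt sent to an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


index = ToyIndex()
index.ingest([
    "Swiftide is a Rust library for building LLM applications.",
    "FineVideo is a dataset of annotated videos.",
])
question = "Which library is written in Rust?"
prompt = build_prompt(question, index.query(question, top_k=1))
print(prompt)
```

A production pipeline would swap the term-count vectors for learned embeddings and the list scan for a vector store, but the order of the steps stays the same.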
Project: PMRF
PMRF (Posterior-Mean Rectified Flow) is a novel photo-realistic image restoration algorithm.
It approximates the optimal estimator that minimizes mean squared error (MSE) under a perfect perceptual quality constraint, meaning that restored images should follow the same distribution as real images.
The project offers quantitative and visual comparison results across different test sets.
https://github.com/ohayonguy/PMRF
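The objective described for PMRF can be written out as a constrained optimization problem. The notation below is an assumption made for illustration: X is the ground-truth image, Y the degraded input, and \hat{X} the restored output.

```latex
\min_{p_{\hat{X} \mid Y}} \; \mathbb{E}\!\left[ \lVert X - \hat{X} \rVert_2^2 \right]
\quad \text{subject to} \quad p_{\hat{X}} = p_{X}
```

The constraint is what "perfect perceptual quality" means here: the distribution of restored images must match the distribution of real images, and among all estimators that satisfy it, the one with the lowest MSE is sought.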
Project: FineVideo
FineVideo is a dataset of over 43,000 videos totaling 3,400 hours, accompanied by detailed descriptions, narrative details, scene segmentation, and question-answer pairs.
This project provides a comprehensive codebase for video collection and annotation, supporting large-scale data processing and distributed computing.