Today's Open Source (2024-11-06): Tencent Hunyuan3D-1.0
Explore cutting-edge open-source projects such as Tencent's Hunyuan3D-1.0 for text-to-3D and image-to-3D generation, veRL for reinforcement learning training of LLMs, and other AI tools.
Here are some interesting open-source AI models and frameworks I wanted to share today:
Project: Hunyuan3D-1.0
Hunyuan3D-1.0 is a unified framework that supports text-to-3D and image-to-3D generation.
This project addresses the slow generation speed and poor generalization capability of existing 3D diffusion models through a two-stage approach.
The first stage uses a multi-view diffusion model to quickly generate multi-view RGB images. The second stage uses a feed-forward reconstruction model to rapidly reconstruct 3D assets.
The framework incorporates the text-to-image model Hunyuan-DiT, so it can generate 3D assets from either a text prompt or an input image.
It comes in a lite and a standard version. The standard version has three times as many parameters as the lite version, which lets users trade generation speed against quality.
https://github.com/Tencent/Hunyuan3D-1
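The two-stage pipeline described above can be sketched in Python as follows. Every function here is a hypothetical stub, not Hunyuan3D-1.0's real API, and the default of six views is an assumption. The point is the data flow. Text input is first turned into an image (assumed here to be Hunyuan-DiT's role). Stage 1 then expands that image into multiple views, and stage 2 reconstructs a 3D asset from those views in a single forward pass.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    # Placeholder for an RGB image; a real pipeline would hold pixel data.
    description: str


@dataclass
class Mesh:
    # Placeholder 3D asset that records the views it was built from.
    source_views: List[Image]


def text_to_image(prompt: str) -> Image:
    # Hypothetical stand-in for a text-to-image model such as Hunyuan-DiT.
    return Image(description=f"render of '{prompt}'")


def multiview_diffusion(cond: Image, num_views: int = 6) -> List[Image]:
    # Stage 1 (hypothetical stub): a multi-view diffusion model would
    # synthesize consistent RGB views of the object from one input image.
    # num_views=6 is an assumed default for illustration.
    return [Image(description=f"{cond.description} @ view {i}") for i in range(num_views)]


def feed_forward_reconstruct(views: List[Image]) -> Mesh:
    # Stage 2 (hypothetical stub): a feed-forward model maps the views to a
    # 3D asset in one pass, with no per-object optimization loop.
    return Mesh(source_views=views)


def generate_3d(prompt: Optional[str] = None, image: Optional[Image] = None) -> Mesh:
    # Text input is converted to an image first, so text-to-3D and
    # image-to-3D share the same two stages.
    if image is None:
        if prompt is None:
            raise ValueError("provide either a text prompt or an image")
        image = text_to_image(prompt)
    views = multiview_diffusion(image)
    return feed_forward_reconstruct(views)


mesh = generate_3d(prompt="a wooden chair")
print(len(mesh.source_views))  # 6
```

Routing text through an image keeps a single 3D back end for both input types, which matches the unified text-to-3D and image-to-3D framing above.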
Project: veRL
veRL is a flexible, efficient, and production-ready reinforcement learning (RL) training framework designed for large language models (LLMs).
It is the open-source implementation of the HybridFlow paper, supports a variety of RL algorithms, and integrates with existing LLM training and inference infrastructure.
veRL uses a modular API and supports flexible device mapping, placing models on different sets of GPUs so that resources are used efficiently and training scales across clusters of different sizes.
https://github.com/volcengine/veRL
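To make the idea of device mapping concrete, here is a minimal sketch in plain Python. It does not use veRL's API. The function map_roles_to_gpus, the pool names, and the role names (actor, critic, ref, reward, as in a typical PPO-style setup) are all hypothetical. The sketch shows two placements on an 8-GPU node: every model colocated on all GPUs, or the models split across two disjoint GPU pools.

```python
from typing import Dict, List


def map_roles_to_gpus(pools: Dict[str, int], placement: Dict[str, str],
                      total_gpus: int) -> Dict[str, List[int]]:
    """Hypothetical helper: give each model role the GPU ids of its pool."""
    if sum(pools.values()) > total_gpus:
        raise ValueError("resource pools request more GPUs than the cluster has")
    # Carve the cluster into contiguous, non-overlapping GPU ranges, one per pool
    # (dicts keep insertion order, so pools are laid out in the order given).
    pool_gpus: Dict[str, List[int]] = {}
    next_gpu = 0
    for name, size in pools.items():
        pool_gpus[name] = list(range(next_gpu, next_gpu + size))
        next_gpu += size
    # Roles placed on the same pool are colocated and share its GPUs.
    return {role: pool_gpus[pool] for role, pool in placement.items()}


# Colocated: every role time-shares all 8 GPUs.
colocated = map_roles_to_gpus(
    {"all": 8},
    {"actor": "all", "critic": "all", "ref": "all", "reward": "all"},
    total_gpus=8,
)

# Split: actor and reference policy on 6 GPUs, critic and reward model on 2.
split = map_roles_to_gpus(
    {"policy": 6, "value": 2},
    {"actor": "policy", "ref": "policy", "critic": "value", "reward": "value"},
    total_gpus=8,
)
print(split["critic"])  # [6, 7]
```

Colocation typically avoids idle GPUs when models run one after another, while splitting lets different models run concurrently on separate hardware. Which placement works better depends on model sizes and the workload.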
Project: PromptFix
PromptFix is an image processing tool based on diffusion models, designed to fix degraded images and remove unwanted elements based on user instructions.
It supports various tasks, including colorization, object removal, dehazing, deblurring, watermark removal, snow removal, and low-light enhancement.
The project uses a 20-step denoising process that corrects image defects while preserving the original structure, and it adapts well to different aspect ratios.
https://github.com/yeates/PromptFix
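The source does not describe the sampler itself, so the sketch below is a generic 20-step DDIM-style denoising loop, not PromptFix's actual procedure. The linear beta schedule, the deterministic DDIM update (eta = 0), and the oracle noise predictor are all assumptions for illustration. In PromptFix, a trained diffusion model conditioned on the input image and the user's instruction would take the oracle's place. Because the loop only does elementwise arithmetic, it works on arrays of any shape, including non-square ones.

```python
import numpy as np


def make_alpha_bars(num_train_steps: int = 1000) -> np.ndarray:
    # Cumulative signal fractions from a linear beta schedule (assumed choice).
    betas = np.linspace(1e-4, 0.02, num_train_steps)
    return np.cumprod(1.0 - betas)


def ddim_sample(x_T, predict_noise, alpha_bars, num_steps=20):
    # Pick num_steps evenly spaced training timesteps, from noisiest to cleanest.
    timesteps = np.linspace(len(alpha_bars) - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        eps = predict_noise(x, t)
        # Estimate the clean image from the current noisy sample.
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Deterministic DDIM step to the next timestep; after the last step
        # a_prev = 1, so the result is the clean estimate itself.
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x


rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
clean = rng.uniform(-1.0, 1.0, size=(4, 6))  # tiny non-square "image"


def oracle(x, t):
    # Demonstration-only predictor that knows the clean target, so the
    # sampler should recover it exactly; a real model is a trained network.
    return (x - np.sqrt(alpha_bars[t]) * clean) / np.sqrt(1.0 - alpha_bars[t])


restored = ddim_sample(rng.standard_normal(clean.shape), oracle, alpha_bars, num_steps=20)
print(np.allclose(restored, clean))  # True
```

Using 20 sampling steps instead of the full training schedule is a common speed and quality trade-off in diffusion sampling; the step count is the num_steps argument here.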
Project: Cofounder
Cofounder is an open-source project for generating full-stack web applications that combines generative UI with a modular design system.
It is still in an early and unstable preview stage. It aims to create backends, databases, and stateful web applications with the help of an AI-guided prototype designer.
https://github.com/raidendotai/cofounder
Project: R-CoT
The R-CoT project aims to improve the geometric reasoning of large multimodal models (LMMs) by generating geometry problems with a reverse chain-of-thought (R-CoT) approach.
The project releases the GeoMM dataset, model weights, and training and evaluation code for several model versions, helping researchers study geometric problem solving in greater depth.
https://github.com/dle666/r-cot
Project: OuteTTS
OuteTTS is an experimental text-to-speech (TTS) model that uses pure language modeling techniques to generate speech.
It supports speech generation through Hugging Face or GGUF model interfaces and offers voice cloning, which lets users create custom speakers from audio files.
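Generating speech with pure language modeling means the model predicts audio tokens the same way an LLM predicts text tokens. The toy sketch below illustrates that loop with greedy decoding. The token strings, the prompt layout, and the lookup-table "model" are hypothetical and do not reflect OuteTTS's actual tokenizer or prompt format. In a real system of this kind, the generated audio codes would be passed to an audio codec decoder to produce a waveform.

```python
from typing import Callable, Dict, List


def greedy_generate(prompt: List[str], next_token: Callable[[List[str]], str],
                    eos: str = "<eos>", max_new_tokens: int = 50) -> List[str]:
    # Autoregressive loop: each new token is predicted from the whole sequence so far.
    seq = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(seq)
        if tok == eos:
            break
        seq.append(tok)
    return seq[len(prompt):]


# Hypothetical toy "language model": a lookup on the last token that stands in
# for a trained transformer's most likely next token.
TOY_TABLE: Dict[str, str] = {
    "<audio_start>": "<a_17>",
    "<a_17>": "<a_402>",
    "<a_402>": "<a_9>",
    "<a_9>": "<eos>",
}


def toy_next_token(seq: List[str]) -> str:
    return TOY_TABLE.get(seq[-1], "<eos>")


# Text tokens condition the model; generation continues in the audio-token space.
prompt = ["<text>", "hello", "world", "</text>", "<audio_start>"]
generated = greedy_generate(prompt, toy_next_token)

# Strip the hypothetical "<a_N>" wrapper to get integer codec codes.
audio_codes = [int(tok[3:-1]) for tok in generated if tok.startswith("<a_")]
print(audio_codes)  # [17, 402, 9]
```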