Open Source Today (2024-09-14): Tencent Unveils GameGen-O, First Open-World Game Model
Tencent's GameGen-O is the first diffusion transformer model for generating open-world video games. It supports interactive control via text, action, and video prompts.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: Tencent GameGen-O
Tencent released GameGen-O, the first diffusion transformer model designed for generating open-world video games. It was trained on the team's proprietary OGameData dataset, with GPT-4o used for data annotation, and its design builds on the Latte and OpenSora V1.2 frameworks.
GameGen-O simulates a wide range of game engine features, including innovative characters, dynamic environments, complex actions, and diverse events, enabling high-quality open-world generation.
It also supports interactive control: users can steer the generated game content with text, action, and video prompts.
https://github.com/GameGen-O/GameGen-O/
Project: GenAI_Agents
GenAI_Agents is a comprehensive resource offering tutorials and implementations of generative AI agents, from basic to advanced.
It aims to help users build intelligent, interactive AI systems, from simple chatbots to complex multi-agent setups.
https://github.com/NirDiamant/GenAI_Agents
Project: AI Youtube Shorts Generator
AI Youtube Shorts Generator is a Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, pick out the most interesting segments, and reformat them as short-form clips.
The tool is currently at version 0.1, so expect some bugs.
https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator
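One step in turning a landscape video into a Short is cropping each frame to a vertical 9:16 box. The sketch below computes a plain centered crop and formats it as an FFmpeg `crop=w:h:x:y` filter. It is an illustration under assumptions, not the project's code: the function names are hypothetical, and the tool itself may position its crop differently (for example, by tracking a subject with OpenCV).

```python
def vertical_crop_box(width: int, height: int,
                      aspect_w: int = 9, aspect_h: int = 16) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of the largest centered crop with ratio aspect_w:aspect_h."""
    # Width a full-height crop would need at the target ratio.
    crop_w = height * aspect_w // aspect_h
    if crop_w <= width:
        crop_h = height
    else:
        # Frame is already narrower than the target ratio: keep full width, trim height.
        crop_w = width
        crop_h = width * aspect_h // aspect_w
    # Round down to even sizes; H.264 with 4:2:0 chroma subsampling needs even dimensions.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    # Center the box inside the original frame.
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h


def ffmpeg_crop_filter(width: int, height: int) -> str:
    """Format the centered crop box as an FFmpeg `crop=w:h:x:y` video filter."""
    x, y, w, h = vertical_crop_box(width, height)
    return f"crop={w}:{h}:{x}:{y}"


if __name__ == "__main__":
    print(ffmpeg_crop_filter(1920, 1080))  # crop=606:1080:657:0
```

For a 1080p source, the resulting string could be passed to FFmpeg as `ffmpeg -i input.mp4 -vf "crop=606:1080:657:0" output.mp4`.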
Project: AddressCLIP
Developed by the Chinese Academy of Sciences and Alibaba Cloud, AddressCLIP is a large model for street-level geolocation from a single photo.
It is a vision-language model that determines where in a city a street-view photo was taken.
The project proposes an end-to-end framework that solves image address localization through image-text alignment and image-geo matching. The authors also built three image address localization datasets, on which the model performs strongly.
https://github.com/xsx1001/AddressCLIP
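Image-text alignment means an image embedding can be compared against embeddings of candidate address strings, with the closest one taken as the prediction. The sketch below shows that generic CLIP-style scoring (cosine similarity, temperature-scaled softmax) on hand-written 3-dimensional toy vectors standing in for encoder outputs. It is not AddressCLIP's code: the embeddings, address names, and the 0.07 temperature are illustrative assumptions, and the image-geo matching component is left out entirely.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_addresses(image_emb: list[float],
                   address_embs: dict[str, list[float]],
                   temperature: float = 0.07) -> list[tuple[str, float]]:
    """Score every candidate address against one image; return (address, probability) best-first."""
    names = list(address_embs)
    logits = [cosine(image_emb, address_embs[n]) / temperature for n in names]
    # Softmax over the candidates, subtracting the max logit for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for real image/text encoder outputs.
    image = [0.9, 0.1, 0.0]
    candidates = {
        "Main St & 1st Ave": [1.0, 0.0, 0.0],
        "Harbor Rd": [0.0, 1.0, 0.0],
        "Park Ln": [0.0, 0.0, 1.0],
    }
    for address, p in rank_addresses(image, candidates):
        print(f"{address}: {p:.3f}")
```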
Project: FLUX-Controlnet-Inpainting
Released by Alibaba's Alimama creative team, FLUX-Controlnet-Inpainting is an image inpainting project.
It provides an inpainting ControlNet checkpoint for the FLUX.1-dev model, intended for image inpainting and content generation and optimized for inference at 768x768 resolution.
The checkpoint is still an alpha release, with further updates planned.
https://github.com/alimama-creative/FLUX-Controlnet-Inpainting
Project: doc-comments-ai
doc-comments-ai is a tool that uses large language models (LLMs) to generate code documentation.
With just a few terminal commands, users can generate documentation using OpenAI models or fully local LLMs.
It integrates LangChain, tree-sitter, llama.cpp, and Ollama, supports multiple programming languages, and can run entirely locally so that source code is never sent to an external service.
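Before an LLM can write doc comments, a tool of this kind has to find the functions that lack them and extract their source. doc-comments-ai does this parsing with tree-sitter across many languages; the sketch below swaps in Python's standard-library `ast` module and handles Python only. It stops at building prompts: the function name and the prompt wording are hypothetical, and sending the prompts to OpenAI or a local model via Ollama is left out.

```python
import ast


def docstring_prompts(source: str) -> list[tuple[str, str]]:
    """Return (function name, LLM prompt) pairs for every function lacking a docstring."""
    prompts = []
    for node in ast.walk(ast.parse(source)):
        # Covers plain functions, async functions, and methods nested in classes.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and ast.get_docstring(node) is None:
            # Recover the exact source text of the function from its AST position.
            code = ast.get_source_segment(source, node)
            prompt = f"Write a concise docstring for this Python function:\n\n{code}"
            prompts.append((node.name, prompt))
    return prompts


if __name__ == "__main__":
    sample = (
        "def add(a, b):\n"
        "    return a + b\n"
        "\n"
        "def sub(a, b):\n"
        '    """Subtract b from a."""\n'
        "    return a - b\n"
    )
    for name, prompt in docstring_prompts(sample):
        print(f"--- {name} ---\n{prompt}\n")
```

Walking the syntax tree rather than matching text with regular expressions keeps decorators, nested functions, and methods from being missed or double-counted.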