Today's Open Source (2024-10-12): SJTU Releases libcom, All-in-One Image Composition Toolbox!
Explore top AI open-source projects like libcom for image composition, MLE-Bench for benchmarking AI agents, and Swarm for multi-agent coordination.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: libcom
libcom is an image composition toolbox. It inserts a foreground object into a background image and produces a realistic composite by resolving inconsistencies in appearance, geometry, and semantics between the two.
libcom covers various tasks related to image composition, including image blending, standard/artistic image harmonization, shadow generation, object placement, generative composition, quality evaluation, and more.
For each task, libcom integrates one or two methods chosen to balance efficiency and effectiveness, and the toolbox is updated as better approaches become available.
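To make the simplest of these tasks concrete, here is a minimal sketch of naive image blending: pasting a foreground onto a background with a per-pixel alpha mask. It is not libcom's API. The function name `paste_foreground` and the nested-list image format are assumptions made up for illustration, and the methods libcom integrates are far more capable than this baseline.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel, each channel in 0..255
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[float]]       # per-pixel opacity of the foreground, 0.0..1.0


def paste_foreground(background: Image, foreground: Image, mask: Mask,
                     top: int, left: int) -> Image:
    """Alpha-blend `foreground` onto a copy of `background` at (top, left).

    Hypothetical helper for illustration only; it is not part of libcom.
    """
    result = [list(row) for row in background]  # leave the input untouched
    for y, fg_row in enumerate(foreground):
        for x, fg_pixel in enumerate(fg_row):
            by, bx = top + y, left + x
            # Skip foreground pixels that fall outside the background.
            if not (0 <= by < len(result) and 0 <= bx < len(result[by])):
                continue
            alpha = mask[y][x]
            bg_pixel = result[by][bx]
            # Standard "over" blend: alpha * fg + (1 - alpha) * bg per channel.
            result[by][bx] = tuple(
                round(alpha * f + (1.0 - alpha) * b)
                for f, b in zip(fg_pixel, bg_pixel)
            )
    return result


background = [[(0, 0, 0), (0, 0, 0)],
              [(0, 0, 0), (0, 0, 0)]]
foreground = [[(200, 100, 50)]]
composite = paste_foreground(background, foreground, [[0.5]], top=0, left=1)
```

A blend like this only mixes colors. It does nothing about mismatched lighting, missing shadows, or implausible placement, which are the inconsistencies the harmonization, shadow generation, and object placement tasks target.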
https://github.com/bcmi/libcom
Project: MLE-Bench
MLE-Bench is a benchmark for evaluating how well AI agents perform machine learning engineering.
The project provides the tools and datasets needed to test AI systems on machine learning tasks.
It draws on 75 Kaggle competition datasets to evaluate agents' engineering abilities.
https://github.com/openai/mle-bench/
Project: LLM Evaluation Guidebook
The LLM Evaluation Guidebook is a guide on how to evaluate large language models (LLMs).
It offers multiple methods for assessing models, guides users in designing their own evaluations, and shares tips and tricks from practical experience.
Whether you're a developer of production models, a researcher, or an enthusiast, you'll find the information you need.
https://github.com/huggingface/evaluation-guidebook
Project: lm.rs
lm.rs is a minimal language model inference project written in Rust, designed to run locally on CPUs.
Inspired by Karpathy’s llama2.c and llm.c, it aims to perform complete language model inference without relying on machine learning libraries.
Currently, it supports the multimodal Phi-3.5-vision model and the text-only Phi-3.5-mini model, as well as Llama 3.2 and Gemma 2 models. It also provides quantized models for optimized performance.
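For readers new to quantization, the sketch below shows one common scheme: symmetric 8-bit quantization with a single scale per block of weights. This is a generic illustration in Python, not lm.rs code, and it is an assumption that the project's quantized formats resemble it. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # All zeros (or empty input): any scale works, so use 1.0.
        return [0] * len(values), 1.0
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [1.0, -0.5, 0.25, 0.0]
q_weights, q_scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, q_scale)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per value. That trade is the usual reason quantized models run faster and fit in less memory on CPUs.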
https://github.com/samuel-vitorino/lm.rs
Project: CogVideoX Factory
CogVideoX Factory is a memory-optimized fine-tuning script library for the video generation model CogVideoX.
The project uses TorchAO and DeepSpeed to support fine-tuning video models on GPUs with 24 GB of memory.
Users can leverage this project to perform text-to-video and image-to-video generation tasks, with various fine-tuning and training scripts available to accommodate different generation needs.
https://github.com/a-r-r-o-w/cogvideox-factory
Project: Swarm
Swarm is an experimental framework managed by the OpenAI Solutions team, designed to build, coordinate, and deploy multi-agent systems.
The framework implements lightweight, highly controllable, and easily testable agent coordination and execution through primitives for agent orchestration and handoffs.
Swarm primarily demonstrates the handoff and routine patterns explored in "Orchestrating Agents: Handoffs & Routines," and it suits scenarios that involve a large number of independent functions and instructions.
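The handoff idea can be sketched in a few lines of plain Python. This is an illustration of the pattern only and does not use the Swarm library: the `Agent` dataclass, the `run` loop, and the keyword-based `respond` functions are all hypothetical stand-ins, with `respond` taking the place of a model call. The one convention borrowed here is that returning another agent means "hand the conversation to that agent."

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Agent:
    name: str
    instructions: str
    # Stand-in for a model call: returns a reply string, or another Agent
    # to signal a handoff. Hypothetical, for illustration only.
    respond: Callable[[str], Union[str, "Agent"]]


def run(agent: Agent, message: str, max_handoffs: int = 5) -> Tuple[str, List[str]]:
    """Follow handoffs until some agent returns a reply.

    Returns the reply and the names of the agents visited, in order.
    """
    trail = [agent.name]
    for _ in range(max_handoffs + 1):
        result = agent.respond(message)
        if isinstance(result, Agent):
            agent = result          # handoff: the next agent takes over
            trail.append(agent.name)
            continue
        return result, trail
    raise RuntimeError("too many handoffs")


refunds = Agent("Refunds", "Handle refund requests.",
                lambda msg: "Refund started.")


def triage_respond(msg: str) -> Union[str, Agent]:
    # A real system would let the model decide; a keyword check keeps the
    # sketch deterministic.
    if "refund" in msg.lower():
        return refunds
    return "How can I help?"


triage = Agent("Triage", "Route the user to the right agent.", triage_respond)

reply, visited = run(triage, "I want a refund")
```

Because each agent is just data plus a function, the routing logic can be unit-tested without any network calls, which is the kind of controllability and testability the framework aims for.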