Today's Open Source (2024-10-30): SD 3.5 Medium Open Source Release
Explore SD 3.5 Medium's text-to-image generation, Eliza's multi-agent framework, and cutting-edge AI orchestration tools like Dynamiq and HyperCloning.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: SD 3.5 Medium
Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) model focused on text-to-image generation.
This model shows significant improvements in image quality, typography, complex prompt comprehension, and resource efficiency.
With 2.5 billion parameters, it uses an improved MMDiT-X architecture and training methods to generate images at resolutions from 0.25 to 2 megapixels.
https://github.com/Stability-AI/sd3.5
https://huggingface.co/stabilityai/stable-diffusion-3.5-medium
Project: Eliza
Eliza is a multi-agent simulation framework supporting conversational agents on Twitter and Discord platforms.
The project allows users to add multiple unique characters and provides full Discord and Twitter connectors, including support for Discord voice channels.
Eliza features RAG memory for conversations and documents, enabling it to read links and PDFs, transcribe audio and video, and summarize conversations.
Eliza is highly extensible: users can create their own actions and clients to expand its functionality.
By default, it supports the Nous Hermes Llama 3.1B model for local inference and OpenAI for cloud inference.
https://github.com/ai16z/eliza
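Characters in Eliza are defined as JSON files. The sketch below is a hypothetical minimal character; the field names are recalled from the example characters bundled with the repository and should be checked against the current schema:

```json
{
  "name": "TechBot",
  "clients": ["discord", "twitter"],
  "bio": ["An upbeat open-source enthusiast who tracks new AI releases."],
  "lore": ["Has strong opinions about diffusion models."],
  "style": {
    "all": ["concise", "friendly"],
    "chat": ["asks follow-up questions"],
    "post": ["short, punchy takes"]
  }
}
```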
Project: Dynamiq
Dynamiq is an orchestration framework for agents and large language model (LLM) applications, aimed at simplifying the development of AI-powered applications.
It focuses on retrieval-augmented generation (RAG) and orchestration of LLM agents, providing an all-in-one generative AI solution.
https://github.com/dynamiq-ai/dynamiq
Project: VLMEvalKit
VLMEvalKit is an open-source evaluation toolkit designed for large vision-language models (LVLMs).
It supports the evaluation of around 100 vision-language models and covers over 40 benchmarks.
Using generation-based evaluation, the toolkit removes the heavy data-preparation work otherwise spread across multiple repositories, and reports results obtained via both exact matching and LLM-based answer extraction.
https://github.com/open-compass/VLMEvalKit
Project: MMIE
MMIE is a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in large vision-language models (LVLMs).
It provides a robust framework for assessing these capabilities across diverse domains, backed by reliable automated metrics.
https://github.com/Lillianwei-h/MMIE
Project: HyperCloning
HyperCloning is a project for accelerating the pre-training of large language models by initializing them from smaller pre-trained models.
The expanded network preserves the function of the small model, transferring its knowledge so that the larger model starts from higher accuracy and reaches its final quality in fewer training steps.
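The core idea of function-preserving expansion can be illustrated with a single linear layer: tiling a small weight matrix and rescaling reproduces the small layer's outputs (duplicated) when the input is likewise duplicated. This is a simplified numpy sketch of the principle, not the project's actual implementation:

```python
import numpy as np


def clone_linear(w_small: np.ndarray, factor: int = 2) -> np.ndarray:
    """Expand a (d_out, d_in) weight to (factor*d_out, factor*d_in).

    Tiling the weight and dividing by `factor` keeps the layer
    function-preserving: feeding the expanded layer an input that is the
    original input duplicated `factor` times yields the original output
    duplicated `factor` times.
    """
    return np.tile(w_small, (factor, factor)) / factor


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))   # small layer: 3 -> 4
x = rng.standard_normal(3)

w_big = clone_linear(w)           # expanded layer: 6 -> 8
x_big = np.tile(x, 2)             # duplicated input

y_small = w @ x
y_big = w_big @ x_big             # equals y_small duplicated twice

assert np.allclose(y_big, np.tile(y_small, 2))
```

Because the expanded model computes the same function as the small one at initialization, training continues from the small model's accuracy rather than from scratch.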