Open Source Today (2024-09-12): Mistral AI Unveils Pixtral 12B
Explore top AI projects like Pixtral 12B, LLaMA-Omni, Solar Pro, SciAgents, and more. Discover multimodal models and advanced language processing tools.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: Pixtral
Pixtral 12B is a multimodal model released by Mistral AI. It supports both image and text processing with 12 billion parameters, around 24GB in size.
The model is based on the Nemo 12B text model and can handle images of any size with a 128k context window.
Users can send images, either inline or as URLs, alongside text in a single prompt. The model's checkpoint is a community upload to Hugging Face and supports various machine learning and deep learning tasks.
It has a vocabulary of 131,072 tokens for detailed language understanding and generation.
https://huggingface.co/mistral-community/pixtral-12b-240910
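Mixing images and text in one prompt is usually done with an OpenAI-style chat payload, as accepted by common serving stacks such as vLLM. A minimal sketch, with the caveat that the exact schema depends on how you serve Pixtral, so treat the field names below as assumptions:

```python
# Minimal sketch of an OpenAI-style multimodal chat message combining text
# and an image URL. The exact schema Pixtral expects depends on the serving
# stack (e.g. vLLM's OpenAI-compatible server); field names are assumptions.

def build_pixtral_message(text: str, image_url: str) -> dict:
    """Combine a text prompt and an image URL into one chat message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_pixtral_message(
    "Describe this chart in two sentences.",
    "https://example.com/chart.png",
)
print(msg["content"][0]["text"])  # the text part of the mixed prompt
```

The resulting message would be passed as one element of the `messages` list in a chat-completion request.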
Project: LLaMA-Omni
LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model based on Llama-3.1-8B-Instruct, designed to achieve GPT-4o level speech capabilities.
This model supports low-latency voice interaction, generating both text and voice responses, and is suitable for various speech command scenarios.
https://github.com/ictnlp/LLaMA-Omni
Project: Solar Pro (Preview) Instruct - 22B
Solar Pro Preview is an advanced large language model (LLM) with 22 billion parameters, optimized to run on a single GPU.
Its performance surpasses that of many models with fewer than 30 billion parameters and rivals models more than three times its size, such as Llama 3.1 (70 billion).
Solar Pro Preview uses an enhanced depth up-scaling method to grow the 14B-parameter Phi-3-medium model to 22 billion parameters, optimizing performance on GPUs with 80GB of VRAM.
https://huggingface.co/upstage/solar-pro-preview-instruct
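Upstage's original depth up-scaling recipe (from the SOLAR 10.7B work) deepens a model by stacking two copies of its layer stack, trimming the overlap, then continuing pretraining. Solar Pro's "enhanced" variant is not fully documented, so the following is a conceptual sketch of the base idea only, with hypothetical layer counts:

```python
# Illustrative sketch of depth up-scaling: concatenate two copies of a
# model's layer stack, dropping the last m layers of the first copy and
# the first m layers of the second. Layer counts are hypothetical; the
# exact Solar Pro recipe is an "enhanced" variant that is not public.

def depth_upscale(layers: list, m: int) -> list:
    """Return a deeper stack: layers[:-m] followed by layers[m:]."""
    return layers[:-m] + layers[m:]

base = list(range(32))          # a 32-layer base model (toy stand-in)
deep = depth_upscale(base, 8)   # 24 + 24 = 48 layers after up-scaling
print(len(deep))                # → 48
```

The duplicated middle layers start from the same weights, which is why a continued-pretraining phase is needed to heal the seams.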
Project: SciAgents
SciAgents is a project aimed at automating scientific discoveries through multi-agent intelligent graph reasoning.
It uses a large ontological knowledge graph to connect and organize scientific concepts, combined with large language models, data-retrieval tools, and multi-agent systems with in-situ learning abilities.
In bio-inspired materials, SciAgents uncovers cross-disciplinary relationships between concepts previously considered unrelated, exceeding traditional human-driven research in scale, precision, and exploratory power.
The framework autonomously generates and refines research hypotheses, uncovering mechanisms, design principles, and unexpected material properties.
https://github.com/lamm-mit/SciAgentsDiscovery
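A core step in this kind of graph reasoning is finding a chain of concepts that links two distant ideas, which the agents then turn into a hypothesis. A toy sketch of that step, with made-up node names in the bio-inspired-materials spirit (SciAgents builds its real graph from the scientific literature):

```python
from collections import deque

# Toy sketch of graph reasoning: breadth-first search for the shortest
# chain of concepts connecting two nodes of an ontological knowledge
# graph. Node names are invented for illustration.

GRAPH = {
    "silk": ["beta-sheet", "spider"],
    "beta-sheet": ["silk", "toughness"],
    "toughness": ["beta-sheet", "energy-dissipation"],
    "energy-dissipation": ["toughness", "dandelion-pappus"],
    "dandelion-pappus": ["energy-dissipation"],
    "spider": ["silk"],
}

def concept_path(graph: dict, start: str, goal: str):
    """Return the shortest concept chain from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(concept_path(GRAPH, "silk", "dandelion-pappus"))
```

In the full framework, an LLM agent would then elaborate such a path into a research hypothesis and critique it.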
Project: finetune-Qwen2-VL
finetune-Qwen2-VL is a project for fine-tuning the Qwen2-VL multimodal model.
Qwen2-VL, developed by Alibaba's Qwen (Tongyi Qianwen) team, comes in 2B, 7B, and 72B parameter versions.
This project offers simple fine-tuning code, supporting both single GPU and multi-GPU training, to help users quickly start fine-tuning the Qwen2-VL model.
https://github.com/zhangfaen/finetune-Qwen2-VL
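The skeleton that such fine-tuning scripts follow (forward pass, loss, gradient, parameter update, repeated over the data) can be sketched on a toy 1-D linear model in plain Python. This is a conceptual illustration only; the actual project uses PyTorch and the Qwen2-VL model and processor classes, and nothing below is its real API:

```python
# Conceptual sketch of a training loop: forward pass, squared-error loss
# gradient, and SGD update, on a toy linear model y = w*x + b.
# Real fine-tuning scripts do the same dance with PyTorch modules.

def train(data, lr=0.1, epochs=200):
    w, b = 0.0, 0.0                 # toy "model weights"
    for _ in range(epochs):
        for x, y in data:           # one sample per step
            pred = w * x + b        # forward pass
            err = pred - y          # gradient of 0.5 * (pred - y)**2 w.r.t. pred
            w -= lr * err * x       # update weights (backward + step)
            b -= lr * err
    return w, b

# Fit y = 2x + 1 from a few exactly realizable points.
w, b = train([(0, 1), (1, 3), (2, 5)])
print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

Multi-GPU training adds data parallelism on top of this loop (each device runs the same steps on its shard of the batch and gradients are averaged), which is the distinction the project's single- vs. multi-GPU modes cover.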
Project: Reader-LM
Jina AI recently launched Reader-LM, a set of new small language models designed to convert raw HTML into clean Markdown.
These models include reader-lm-0.5b and reader-lm-1.5b, supporting multiple languages and handling up to 256K context.
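To make the input/output contract concrete, here is a tiny rule-based stand-in for the task Reader-LM learns end-to-end: raw HTML in, Markdown out. Reader-LM itself is a language model that emits the Markdown directly with no hand-written rules; this stdlib parser only handles a few tags for illustration:

```python
from html.parser import HTMLParser

# Rule-based stand-in for the HTML -> Markdown task. Reader-LM performs
# this conversion as a learned sequence-to-sequence task; this parser just
# illustrates the contract on <h1>, <p>, and <a> tags.

class TinyMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag == "a":
            self.out.append(f"]({self.href})")
        elif tag in ("h1", "p"):
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)

def to_markdown(html: str) -> str:
    parser = TinyMarkdown()
    parser.feed(html)
    return "".join(parser.out).strip()

print(to_markdown('<h1>Title</h1><p>See <a href="https://jina.ai">Jina</a>.</p>'))
```

A learned model generalizes where rules break down (nested layouts, boilerplate, scripts), which is exactly the gap the 256K-context reader-lm models target.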