Today's Open Source (2024-08-30): Alibaba Qwen2-VL, 2B & 7B Models, Long Video & Multi-Resolution Support
Discover the latest AI open-source models like Qwen2-VL by Alibaba and xLAM by Salesforce, designed for advanced video comprehension, function calling, and more. Explore now!
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: Qwen2-VL
Qwen2-VL is a new multimodal large language model series from Alibaba Cloud's Qwen team, featuring 2B and 7B parameter versions, with a 72B version soon to be open-sourced.
The models excel at image and video understanding, support multiple languages, and can be integrated with devices such as mobile phones and robots for automated operation.
Qwen2-VL can understand videos longer than 20 minutes, which enables high-quality video Q&A, dialogue, and content creation. Unlike previous models, Qwen2-VL handles images of arbitrary resolution, which brings its visual processing closer to how humans see.
https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4
https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-AWQ
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-AWQ
https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
https://github.com/QwenLM/Qwen2-VL
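To make the "any resolution" point more concrete, here is a rough sketch of how a dynamic-resolution encoder can turn an image size into a visual token count. It assumes a 14-pixel ViT patch and 2x2 patch merging, so one token covers a 28x28 pixel region. The function name `estimate_visual_tokens` and the token budget defaults are illustrative assumptions, and this is a simplification, not the Qwen2-VL preprocessing code.

```python
import math


def estimate_visual_tokens(height, width, patch=14, merge=2,
                           min_tokens=4, max_tokens=1280):
    """Estimate the resized shape and visual token count for an image.

    Assumption: each token covers a (patch * merge) square of pixels,
    and the image is rescaled to stay inside a token budget. The budget
    defaults here are illustrative, not official values.
    """
    unit = patch * merge  # 28 pixels per token side under these assumptions

    # Snap each side to the nearest multiple of the token unit.
    h = max(unit, round(height / unit) * unit)
    w = max(unit, round(width / unit) * unit)
    tokens = (h // unit) * (w // unit)

    if tokens > max_tokens:
        # Shrink while keeping the aspect ratio; floor so we stay under budget.
        scale = math.sqrt(height * width / (max_tokens * unit * unit))
        h = max(unit, math.floor(height / scale / unit) * unit)
        w = max(unit, math.floor(width / scale / unit) * unit)
    elif tokens < min_tokens:
        # Enlarge while keeping the aspect ratio; ceil so we reach the minimum.
        scale = math.sqrt(min_tokens * unit * unit / (height * width))
        h = math.ceil(height * scale / unit) * unit
        w = math.ceil(width * scale / unit) * unit

    return h, w, (h // unit) * (w // unit)


small = estimate_visual_tokens(224, 224)    # a small thumbnail
large = estimate_visual_tokens(1080, 1920)  # a full HD frame
```

A small image keeps its native size and costs few tokens, while a large frame is scaled down only as far as the budget requires, so detail is not thrown away by forcing everything to one fixed input size.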
Project: xLAM - 7B
Salesforce has released xLAM in 7B, 8x7B, and 8x22B sizes, with context lengths of up to 64K, designed for AI agents.
xLAM is a series of large language models developed by Salesforce that focuses on function calling. Salesforce previously released 1B and 7B versions with a 16K context length.
The models improve decision-making by turning user intent into executable actions, which makes them suitable for automating workflows in various fields. They are optimized for efficient deployment on personal devices, supporting offline use and improved privacy.
https://huggingface.co/Salesforce/xLAM-8x22b-r
https://huggingface.co/Salesforce/xLAM-8x7b-r
https://huggingface.co/Salesforce/xLAM-1b-fc-r
https://huggingface.co/Salesforce/xLAM-7b-fc-r
https://huggingface.co/Salesforce/xLAM-7b-r
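For readers new to function calling, here is a minimal sketch of the application side: the model emits a structured tool call, and the application looks up and runs the matching function. The JSON layout, the `get_weather` tool, and the hard-coded `model_output` string are all hypothetical stand-ins for illustration. The exact output format xLAM uses is described on its model cards.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Weather in {city}: 21 degrees {unit}"


# Registry of tools the application is willing to execute.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> list:
    """Parse a JSON list of tool calls and execute each registered tool."""
    calls = json.loads(model_output)
    results = []
    for call in calls:
        name = call["name"]
        if name not in TOOLS:
            # Never execute a function the application did not register.
            raise KeyError(f"unknown tool: {name}")
        results.append(TOOLS[name](**call["arguments"]))
    return results


# Stand-in for text generated by a function-calling model (assumed format).
model_output = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'
results = dispatch(model_output)
```

Keeping an explicit registry means the model can only trigger actions the application has chosen to expose, which matters once these calls start automating real workflows.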
Project: ChatLearn
ChatLearn is a flexible and efficient large-scale alignment training framework.
It offers a user-friendly programming interface and supports multiple distributed acceleration engines and parallel strategies, significantly boosting training performance.
ChatLearn is designed for researchers and practitioners needing large-scale alignment training. It supports various alignment training methods like RLHF, DPO, OnlineDPO, and GRPO, allowing users to customize model execution flows.
https://github.com/alibaba/ChatLearn
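As a quick illustration of one of the methods listed above, here is a sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not ChatLearn code. The function name `dpo_loss` and the default `beta=0.1` are assumptions for the example, and the inputs are summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of log-probabilities."""
    # How much more the policy likes each response than the reference does.
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Example values are made up: the policy prefers the chosen response
# slightly more than the reference model does.
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0)
```

The loss falls as the policy raises the chosen response relative to the rejected one, measured against the reference model, which is why DPO needs no separate reward model.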
Project: NanoFlow
NanoFlow is a high-performance LLM serving framework focused on throughput.
Using key techniques such as intra-device parallelism, asynchronous CPU scheduling, and SSD offloading, NanoFlow significantly outperforms vLLM, DeepSpeed-FastGen, and TensorRT-LLM in throughput.
Comprehensive evaluations show that NanoFlow can boost throughput by up to 1.91 times across various models and hardware configurations.
https://github.com/efeslab/Nanoflow
Project: IPA
Interactive PDF Analysis (IPA) is a graphical tool for deep analysis of PDF files. It allows researchers to explore the internal details of PDFs, extracting and analyzing metadata to identify the document's creator, creation date, modification history, and other critical information.
https://github.com/seekbytes/IPA
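To give a feel for the kind of metadata involved, here is a small standard-library sketch that pulls a few fields from a PDF's document information dictionary. It is not how IPA works internally. It assumes the dictionary is stored uncompressed with plain literal strings, which many real PDFs do not guarantee, and the helper names `extract_info` and `parse_pdf_date` are made up for this example.

```python
import re
from datetime import datetime

# Common keys of the PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Creator", "Producer", "CreationDate", "ModDate")


def extract_info(pdf_bytes: bytes) -> dict:
    """Find '/Key (value)' pairs in raw PDF bytes (uncompressed files only)."""
    info = {}
    for key in INFO_KEYS:
        # Literal string: anything up to the closing paren, allowing escapes.
        pattern = rb"/" + key.encode() + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            info[key] = match.group(1).decode("latin-1")
    return info


def parse_pdf_date(value: str):
    """Parse the leading 'D:YYYYMMDDHHmmSS' part of a PDF date string."""
    match = re.match(r"D:(\d{14})", value)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


# Tiny hand-written fragment standing in for a real file's contents.
sample = (b"1 0 obj\n<< /Creator (Writer) /Producer (LibreOffice 7.4) "
          b"/CreationDate (D:20240830101500+02'00') >>\nendobj\n")
info = extract_info(sample)
created = parse_pdf_date(info["CreationDate"])
```

For real files you would read the bytes with `open(path, "rb").read()`, and a dedicated tool or a full PDF parser is the safer choice once compressed object streams or XMP metadata are involved.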
Project: Docmatix
Docmatix is an open-source dataset for visual question answering that contains both image and text data.
It is suited to document-based question-answering tasks, supports multiple formats and languages, and is mainly used for researching and developing visual question-answering systems.