Today's Open Source (2024-08-30): Alibaba Qwen2-VL, 2B & 7B Models, Long Video & Multi-Resolution Support

Discover the latest AI open-source models like Qwen2-VL by Alibaba and xLAM by Salesforce, designed for advanced video comprehension, function calling, and more. Explore now!

Aug 30, 2024

Here are some interesting AI open-source models and frameworks I wanted to share today:

Project: Qwen2-VL

Qwen2-VL is a new multimodal large language model series from Alibaba Cloud's Qwen team, featuring 2B and 7B parameter versions, with a 72B version soon to be open-sourced.

The series excels at image and video understanding, supports multiple languages, and can be integrated with devices such as mobile phones and robots to automate operations.

Qwen2-VL can understand videos longer than 20 minutes, enabling high-quality video Q&A, dialogue, and content creation. Unlike previous models, it handles images of arbitrary resolution, offering a more human-like visual processing experience.
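To make the multi-resolution idea concrete, here is a rough sketch of how a variable-resolution vision model might turn an image's size into a number of visual tokens instead of forcing every image to one fixed size. The function name, the 28-pixel cell size, and the 1280-token budget are my own illustrative assumptions, not values stated in this post, and this is not the Qwen team's actual preprocessing code.

```python
import math


def estimate_visual_tokens(height: int, width: int, cell: int = 28,
                           max_tokens: int = 1280) -> tuple[int, int, int]:
    """Return (resized_height, resized_width, token_count).

    Assumptions (illustrative only): one visual token covers a
    cell x cell pixel square, and images over the token budget are
    scaled down while roughly keeping their aspect ratio.
    """
    # Snap each side to the nearest multiple of the cell size (at least one cell).
    h = max(cell, round(height / cell) * cell)
    w = max(cell, round(width / cell) * cell)

    # If the image exceeds the pixel budget, shrink both sides by the same ratio.
    max_pixels = max_tokens * cell * cell
    if h * w > max_pixels:
        scale = math.sqrt(max_pixels / (height * width))
        h = max(cell, math.floor(height * scale / cell) * cell)
        w = max(cell, math.floor(width * scale / cell) * cell)

    return h, w, (h // cell) * (w // cell)


# A small square image keeps its size and uses few tokens.
print(estimate_visual_tokens(224, 224))  # prints (224, 224, 64)
```

Under these assumptions, a small thumbnail costs only a few dozen tokens, while a large frame is scaled down until it fits the budget, so compute tracks the amount of visual detail rather than a fixed input size.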

https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-AWQ

https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-AWQ

https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct

https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

https://github.com/QwenLM/Qwen2-VL

Project: xLAM

Salesforce has released xLAM in 7B, 8x7B, and 8x22B sizes, with context lengths up to 64K, designed for AI agents.

xLAM is a large language model series developed by Salesforce that focuses on function calling. Salesforce previously released 1B and 7B versions with a 16K context length.

The models improve decision-making by turning user intent into executable actions, which makes them suitable for automating workflows in many fields. The smaller models are optimized for efficient deployment on personal devices, supporting offline use and improved privacy.
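As a rough illustration of what "turning user intent into executable actions" can look like on the application side, here is a minimal sketch of a tool dispatcher. The JSON shape, the tool names, and the `dispatch` helper are all hypothetical and chosen for illustration; they are not taken from the xLAM model cards, so check the official documentation for the actual output format.

```python
import json
from typing import Any, Callable


# Hypothetical tools; a real agent would wrap actual APIs here.
def get_weather(city: str) -> str:
    return f"Weather lookup requested for {city}"


def add_numbers(a: float, b: float) -> float:
    return a + b


# Registry mapping tool names the model may emit to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "add_numbers": add_numbers,
}


def dispatch(model_output: str) -> list[Any]:
    """Parse a JSON list of tool calls and run each one against the registry."""
    calls = json.loads(model_output)
    results = []
    for call in calls:
        name = call["name"]
        if name not in TOOLS:
            # Refuse anything outside the allow-list instead of guessing.
            raise ValueError(f"Unknown tool: {name}")
        results.append(TOOLS[name](**call.get("arguments", {})))
    return results


# Stand-in for text a function-calling model might emit (not real xLAM output).
example_output = '[{"name": "add_numbers", "arguments": {"a": 2, "b": 3}}]'
print(dispatch(example_output))  # prints [5]
```

Keeping an explicit registry means the model can only trigger functions the application has deliberately exposed, which matters once such calls automate real workflows.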

https://huggingface.co/Salesforce/xLAM-8x22b-r

https://huggingface.co/Salesforce/xLAM-8x7b-r

https://huggingface.co/Salesforce/xLAM-1b-fc-r

https://huggingface.co/Salesforce/xLAM-7b-fc-r

https://huggingface.co/Salesforce/xLAM-7b-r

Project: ChatLearn

ChatLearn is a flexible and efficient large-scale alignment training framework.

It offers a user-friendly programming interface and supports multiple distributed acceleration engines and parallel strategies, significantly boosting training performance.

ChatLearn is designed for researchers and practitioners who need large-scale alignment training. It supports alignment methods such as RLHF, DPO, OnlineDPO, and GRPO, and lets users customize model execution flows.

https://github.com/alibaba/ChatLearn

Project: NanoFlow

NanoFlow is a high-performance LLM serving framework focused on throughput.

Using key techniques such as intra-device parallelism, asynchronous CPU scheduling, and SSD offloading, NanoFlow significantly outperforms vLLM, DeepSpeed-FastGen, and TensorRT-LLM in throughput.

Comprehensive evaluations show that NanoFlow can boost throughput by up to 1.91 times across various models and hardware configurations.

https://github.com/efeslab/Nanoflow

Project: IPA

Interactive PDF Analysis (IPA) is a graphical tool for deep analysis of PDF files. It allows researchers to explore the internal details of PDFs, extracting and analyzing metadata to identify the document's creator, creation date, modification history, and other critical information.

https://github.com/seekbytes/IPA

Project: Docmatix

Docmatix is an open-source dataset for visual question answering, containing both image and text data.

The dataset targets document-based question answering, supports multiple formats and languages, and is mainly used for researching and developing visual question-answering systems.

https://huggingface.co/datasets/HuggingFaceM4/Docmatix
