Today's Open Source (2024-08-21): Microsoft Releases Phi-3.5 Models with 128K Context
Discover the latest AI open-source models: Microsoft's Phi-3.5 series, Salesforce's xGen-MM, RD-Agent, awesome-digital-human-live2d, Formatron, and Whisperfile.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: Phi-3.5
Microsoft recently launched the Phi-3.5 model series, including Phi-3.5 Mini Instruct, Phi-3.5 Vision Instruct, and Phi-3.5 MoE, all supporting 128K context.
Phi-3.5 Mini Instruct has 3.8B parameters and is designed for mobile devices.
Phi-3.5 Vision Instruct is a multimodal model with 4.2B parameters, ideal for image understanding and video summarization.
Phi-3.5 MoE has 16x3.8B parameters, of which 6.6B are active at a time. It was trained on 4.9T tokens and shows strong performance on code and math tasks.
https://huggingface.co/microsoft/Phi-3.5-mini-instruct
https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
https://huggingface.co/microsoft/Phi-3.5-vision-instruct
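To put the MoE figures in perspective, here is a minimal arithmetic sketch. It reads "16x3.8B" literally as 16 experts of 3.8B parameters each, which is an assumption: mixture-of-experts models usually share attention and embedding weights across experts, so the naive product should be treated as an upper bound rather than the real total.

```python
# Figures quoted above for Phi-3.5 MoE (in billions of parameters).
NUM_EXPERTS = 16
PARAMS_PER_EXPERT_B = 3.8
ACTIVE_PARAMS_B = 6.6

# Naive upper bound: assumes no weights are shared between experts,
# which is an illustrative simplification, not the model's real size.
naive_total_b = NUM_EXPERTS * PARAMS_PER_EXPERT_B

# Share of that upper bound that is active for any given token.
active_fraction = ACTIVE_PARAMS_B / naive_total_b

print(f"Naive upper bound: {naive_total_b:.1f}B parameters")
print(f"Active share of that bound: {active_fraction:.1%}")
```

Under that reading, only around a tenth of the parameters are used per token, which is why an MoE model can be much cheaper to run than a dense model of comparable total size.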
Project: xGen-MM
xGen-MM is a new series of foundational large multimodal models (LMMs) developed by Salesforce AI Research.
Building on the successful BLIP series, these models provide a stronger foundation.
They are trained on high-quality image captioning datasets and interleaved image-text data, perform well across a range of vision-language tasks, and achieve competitive benchmark results.
https://github.com/salesforce/LAVIS/tree/xgen-mm
https://arxiv.org/abs/2408.08872
https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5
Project: RD-Agent
RD-Agent is an open-source tool for automating high-value general R&D processes using data-driven methods.
It focuses on data and model development to boost productivity.
It automatically proposes data and model suggestions and extracts knowledge from data.
https://github.com/microsoft/RD-Agent
Project: awesome-digital-human-live2d
awesome-digital-human-live2d is a project for creating warm digital human experiences.
It supports quick deployment via Docker, integrates ASR, LLM, TTS, and modular Agent expansions, and allows Live2D model extension and control.
The project is accessible via the web on PC and mobile, using React and Next.js for the front end and FastAPI for the back end.
https://github.com/wan-h/awesome-digital-human-live2d
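The ASR, LLM, and TTS stages described above form a simple chain: speech in, text reply, speech out. The sketch below illustrates that modular idea only. The `Pipeline` class and the stub functions are hypothetical names made up for this example and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical ASR -> LLM -> TTS chain with swappable stages."""
    asr: Callable[[bytes], str]   # audio in, transcript out
    llm: Callable[[str], str]     # transcript in, reply text out
    tts: Callable[[str], bytes]   # reply text in, audio out

    def respond(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any real models.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = Pipeline(asr=fake_asr, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond(b"hello")
print(reply_audio)
```

Because each stage is just a callable, any one of them can be swapped for a different engine without touching the others, which is the main appeal of a modular design like this.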
Project: Formatron
Formatron lets users control the output format of language models with minimal effort.
It is lightweight and user-friendly, and it integrates seamlessly into existing codebases and frameworks.
https://github.com/Dan-wanna-M/formatron
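For readers new to the idea of controlling output format, here is a toy sketch of constrained decoding in general: at each step, only tokens that keep the output a valid prefix of the target format are allowed. This is not Formatron's API. The vocabulary, the scoring function, and the `ddd-d` target format are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["yes", "no", "7", "4", "2", "-", "maybe"]


def is_valid_prefix(text: str) -> bool:
    """True if text can still grow into three digits, a dash, then one digit."""
    return re.fullmatch(r"\d{0,3}|\d{3}-|\d{3}-\d", text) is not None


def is_complete(text: str) -> bool:
    return re.fullmatch(r"\d{3}-\d", text) is not None


def toy_score(prefix: str, token: str) -> float:
    """Stand-in for model logits: this 'model' would rather say 'yes'."""
    preference = {"yes": 5.0, "maybe": 4.0, "7": 3.0, "-": 2.5,
                  "4": 2.0, "2": 1.0, "no": 0.5}
    return preference[token]


def constrained_decode(score_fn, max_steps: int = 10) -> str:
    out = ""
    for _ in range(max_steps):
        if is_complete(out):
            break
        # Mask every token that would break the format.
        allowed = [t for t in VOCAB if is_valid_prefix(out + t)]
        if not allowed:
            break
        # Greedily pick the best-scoring token among the allowed ones.
        out += max(allowed, key=lambda t: score_fn(out, t))
    return out


result = constrained_decode(toy_score)
print(result)
```

Even though the toy scorer prefers "yes", the mask never lets it through, so the output always matches the required pattern.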
Project: Whisperfile
Whisperfile is a high-performance implementation of OpenAI's Whisper within the llamafile project, based on the whisper.cpp software by Georgi Gerganov and others.
Whisperfile packages the model into executable weight files that run on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD, on both AMD64 and ARM64 architectures.