First 100% Open-Source MoE Model: 7B Parameters, Only 1B Active at Inference
OLMoE is a fully open-source MoE model that delivers cutting-edge performance with 7B total parameters but only 1B active at inference, keeping training and inference costs low.
The training code, checkpoints, logs, and data are fully open-sourced.
Although language models (LMs) have made significant progress, a trade-off remains between performance and the cost of training and inference.
For many researchers and developers, high-performing LMs are hard to access due to their high cost.
One way to improve the cost-performance trade-off is sparse activation with a Mixture of Experts (MoE). An MoE layer contains several experts but activates only a small subset of them for each input (see Fig. 2).
This makes MoE more efficient than a dense model with a similar parameter count, since a dense model activates all of its parameters for every input.
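To make the idea concrete, here is a minimal sketch of a sparsely activated MoE layer with top-k routing, written in PyTorch. The hidden sizes, expert count, and top_k value are illustrative assumptions, not OLMoE's actual configuration.

```python
# Minimal sketch of a sparsely activated MoE layer (top-k routing).
# All dimensions and counts below are illustrative, not OLMoE's real settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        logits = self.router(x)                # (n_tokens, n_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: this sparse activation
        # keeps the compute cost far below the total parameter count.
        for i, expert in enumerate(self.experts):
            token_idx, slot = (indices == i).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

tokens = torch.randn(16, 512)                  # 16 tokens with d_model = 512
print(SparseMoELayer()(tokens).shape)          # torch.Size([16, 512])
```

With 8 experts and top-2 routing, each token touches only a quarter of the expert parameters per layer, which is the same principle that lets a 7B-parameter MoE run with roughly 1B active parameters.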
For this reason, cutting-edge models, like Gemini-1.5 and GPT-4, use MoE.
However, most MoE models are closed-source. While some have released model weights, information about training data, code, etc., is often limited or missing.
This lack of open resources and detailed research makes it hard to build cost-effective open-source MoE models that can compete with closed-source ones.
To address this, researchers from the Allen Institute for AI, Contextual AI, and others introduced OLMoE, a fully open-source MoE language model that achieves state-of-the-art (SOTA) performance among models of similar size.