Why Are Most Large Models Now Decoder-Only?
Discover why most large language models (LLMs) are decoder-only. Explore their efficiency, performance, and the future of AI architectures in this deep dive.
Since the release of ChatGPT, a wave of large language models (LLMs) has emerged, including Meta's Llama 3, Google's Gemini, and Alibaba's Qwen.
All of these models are built on the Transformer architecture, which raises a question: why are most of them decoder-only?
Let's start by reviewing some basic architectural terms.