Model Dissection: Exploring the Inside of a Model
Explore the inner workings of Transformer models, understand what model files store and how they function, and learn the roles of weights and biases. Dive into model visualization and capacity insights.
Welcome to the "Practical Application of AI Large Language Model Systems" Series
Last class, we manually implemented a Transformer model. The final trained model had around 120 million parameters and a file size of about 505MB. In this lesson, we'll explore an intriguing question: what exactly is stored inside this 505MB file?
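Before we crack the file open, a quick back-of-the-envelope check is useful: 120 million float32 parameters at 4 bytes each comes to roughly 480MB, which already accounts for most of the 505MB (the remainder is serialization metadata). If you still have last lesson's checkpoint, a minimal sketch like the one below lists every tensor inside it; the file path is a placeholder, so adjust it to wherever you saved your own model.

```python
import torch

# Placeholder path: point this at the checkpoint saved in the last lesson.
ckpt_path = "transformer_model.pt"

# A checkpoint saved with torch.save(model.state_dict(), ...) is simply a
# dictionary that maps parameter names to tensors.
state_dict = torch.load(ckpt_path, map_location="cpu")

total_params = 0
for name, tensor in state_dict.items():
    print(f"{name:60s} {tuple(tensor.shape)} {tensor.dtype}")
    total_params += tensor.numel()

# Each float32 parameter occupies 4 bytes on disk, plus a little overhead.
print(f"parameters: {total_params:,}")
print(f"approx. size: {total_params * 4 / 1024**2:.1f} MB (float32)")
```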
Do you remember running Qwen2-7B locally a while ago?
Its 7B weight files are sharded into 8 parts (some releases use 5 files), totaling about 20GB. By comparison, the files for a 130B-scale model add up to nearly 240GB. If you've ever wondered what's inside these large model files, you're not alone; I was just as curious when I first encountered large language models.
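The sharding itself is just bookkeeping. Hugging Face-style checkpoints in the safetensors format ship an index JSON alongside the numbered shard files, recording which tensor lives in which shard and the total byte size. Assuming you have such a model downloaded locally (the directory name below is illustrative), a short sketch can read that index:

```python
import json
from pathlib import Path

# Illustrative path: a locally downloaded sharded checkpoint, e.g. Qwen2-7B.
model_dir = Path("Qwen2-7B")

# Sharded safetensors checkpoints include an index file mapping every
# tensor name to the shard file (e.g. model-00001-of-00004.safetensors)
# that contains it. Older .bin releases use pytorch_model.bin.index.json.
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

print(f"total size: {index['metadata']['total_size'] / 1024**3:.1f} GB")

# Count how many tensors each shard holds.
shards = {}
for tensor_name, shard_file in index["weight_map"].items():
    shards[shard_file] = shards.get(shard_file, 0) + 1

for shard_file, n_tensors in sorted(shards.items()):
    print(f"{shard_file}: {n_tensors} tensors")
```

Summing the reported total size should land close to the ~20GB figure above; the shards exist only so that no single file is unwieldy to download or memory-map.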
Through steady study, I've pieced together some answers, and today I'd like to share them with you.