Model Dissection: Exploring the Inside of a Model
Explore the inner workings of Transformer models, understand what model files store and how they function, and learn the roles of weights and biases. Dive into model visualization and capacity insights.
Welcome to the "Practical Application of AI Large Language Model Systems" Series
Last class, we manually implemented a Transformer model. The final trained model had around 120 million parameters and a file size of about 505MB. In this lesson, we'll explore an intriguing question: what exactly is stored inside this 505MB file?
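Before we crack the file open, a quick back-of-the-envelope check is useful: 120 million float32 parameters at 4 bytes each comes to roughly 480MB, which already accounts for most of the 505MB (the remainder is serialization metadata). If you still have last lesson's checkpoint, a minimal sketch like the one below lists every tensor inside it; the file path is a placeholder, so adjust it to wherever you saved your own model.

```python
import torch

# Placeholder path: point this at the checkpoint saved in the last lesson.
ckpt_path = "transformer_model.pt"

# A checkpoint saved with torch.save(model.state_dict(), ...) is simply a
# dictionary that maps parameter names to tensors.
state_dict = torch.load(ckpt_path, map_location="cpu")

total_params = 0
for name, tensor in state_dict.items():
    print(f"{name:60s} {tuple(tensor.shape)} {tensor.dtype}")
    total_params += tensor.numel()

# Each float32 parameter occupies 4 bytes on disk, plus a little overhead.
print(f"parameters: {total_params:,}")
print(f"approx. size: {total_params * 4 / 1024**2:.1f} MB (float32)")
```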
Do you remember running Qwen2-7B locally a while ago?
Its 7B weight files are sharded into 8 parts (some releases use 5 files), totaling about 20GB. By comparison, the files for a 130B-scale model add up to nearly 240GB. If you've ever wondered what's inside these large model files, you're not alone; I was just as curious when I first encountered large language models.
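The sharding itself is just bookkeeping. Hugging Face-style checkpoints in the safetensors format ship an index JSON alongside the numbered shard files, recording which tensor lives in which shard and the total byte size. Assuming you have such a model downloaded locally (the directory name below is illustrative), a short sketch can read that index:

```python
import json
from pathlib import Path

# Illustrative path: a locally downloaded sharded checkpoint, e.g. Qwen2-7B.
model_dir = Path("Qwen2-7B")

# Sharded safetensors checkpoints include an index file mapping every
# tensor name to the shard file (e.g. model-00001-of-00004.safetensors)
# that contains it. Older .bin releases use pytorch_model.bin.index.json.
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

print(f"total size: {index['metadata']['total_size'] / 1024**3:.1f} GB")

# Count how many tensors each shard holds.
shards = {}
for tensor_name, shard_file in index["weight_map"].items():
    shards[shard_file] = shards.get(shard_file, 0) + 1

for shard_file, n_tensors in sorted(shards.items()):
    print(f"{shard_file}: {n_tensors} tensors")
```

Summing the reported total size should land close to the ~20GB figure above; the shards exist only so that no single file is unwieldy to download or memory-map.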
Through steady study, I've pieced together some answers, and today I'd like to share them with you.