Unsloth Fine-Tunes Llama3-8B: 44% Faster, 43% Memory Savings, Only 7.75GB
Unsloth boosts Llama3-8B training speed by 44.35% and cuts memory usage by 42.58%, requiring only 7.75GB of VRAM. Integrated with Firefly for efficient training.
This article introduces Unsloth, which significantly improves the training speed of large models and reduces memory usage. We have integrated it into the Firefly training framework to reduce costs and increase speed for training models like Llama3, Llama2, Mistral, Gemma, and Zephyr.
We tested Unsloth's training gains on Llama3-8B with QLoRA. Training required only 7.75GB of VRAM, fitting on a single 1080Ti and further lowering the hardware barrier for large model training.
With Unsloth enabled, Llama3-8B's training speed increased by 44.35%, training time dropped by 30.72%, and memory usage fell by 42.58%. Detailed test settings are in section three.
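For context, peak VRAM figures like these can be read from PyTorch's allocator statistics. Below is a minimal sketch of that measurement, with a toy layer standing in for the real training step; this is not the benchmark script used for the numbers above:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# Toy stand-in for a real training step: any forward/backward pass works here.
layer = torch.nn.Linear(4096, 4096).cuda()
loss = layer(torch.randn(8, 4096, device="cuda")).sum()
loss.backward()

# Peak bytes held by the CUDA caching allocator since the reset.
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak VRAM: {peak_gb:.2f} GB")
```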
Unsloth is an open-source training-acceleration project for large models. It rewrites the model's compute-heavy operations as OpenAI Triton kernels, greatly increasing training speed and reducing memory usage.
Unsloth's rewritten computations are exact, with no approximations, so there is zero loss of training precision.
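To give a flavor of what rewriting computations in Triton looks like, here is a toy fused SwiGLU kernel (the activation used in Llama's MLP block). This is a minimal sketch for intuition only, not Unsloth's actual kernel code, and it assumes float32 CUDA tensors:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def swiglu_kernel(gate_ptr, up_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    gate = tl.load(gate_ptr + offsets, mask=mask)
    up = tl.load(up_ptr + offsets, mask=mask)
    # silu(gate) * up computed in a single kernel launch, with no
    # intermediate tensors materialized in GPU memory.
    tl.store(out_ptr + offsets, gate * tl.sigmoid(gate) * up, mask=mask)

def fused_swiglu(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(gate)
    n = gate.numel()
    grid = (triton.cdiv(n, 1024),)
    swiglu_kernel[grid](gate, up, out, n, BLOCK_SIZE=1024)
    return out

# Matches the unfused version: torch.nn.functional.silu(gate) * up
```

Fusing several elementwise operations into one kernel is where much of the speed and memory saving comes from: fewer kernel launches, and no intermediate activations written back to VRAM.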
Unsloth supports most mainstream GPUs, including the V100, T4, Titan V, RTX 20/30/40 series, A100, H100, and L40. It provides training acceleration and efficient memory management for both LoRA and QLoRA, and also supports Flash Attention.
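As a minimal sketch of what a QLoRA setup looks like with Unsloth's FastLanguageModel API (the model name and LoRA hyperparameters below are illustrative, not the benchmark settings from section three):

```python
from unsloth import FastLanguageModel

# Load the base model with frozen 4-bit weights (the "Q" in QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit checkpoint
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach trainable low-rank LoRA adapters; only these small matrices are updated.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```

The returned model is a regular PEFT model, so it can be handed to a standard Hugging Face or TRL trainer for the actual fine-tuning loop.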