The Llama 3 Herd of Models: A New Era for AI That Anyone Can Use
With the release of the "Llama 3 Herd of Models" research paper, Meta has laid out a clear technical blueprint for building world-class foundation models. This is more than a technical report; it is a manifesto for the open-source AI movement, showing that openly released models can compete with, and in some evaluations even beat, the most advanced proprietary systems such as GPT-4.
The "Herd" is a group of different Llama 3 models, with the huge 405B parameter flagship model at the top. This model is a big step forward in scaling laws and dense Transformer architectures.
1. Scaling to the Max: Data and Compute
The main idea behind Llama 3 was simple but ambitious: make everything bigger. Meta shifted its thinking from "Chinchilla-optimal", a fixed compute-optimal ratio of training tokens to parameters, to what might be called "user-optimal": training models long past the point that scaling laws alone would recommend, because a smaller model trained on far more data is much cheaper to serve at inference time.
Training Data: Llama 3 was trained on more than 15 trillion tokens, a huge jump from the roughly 2 trillion used for Llama 2. Meta put a great deal of effort into the quality of this dataset, which includes a large share of code as well as non-English data to strengthen coding and multilingual capability.
Compute Power: The 405B model was trained on more than 16,000 NVIDIA H100 GPUs. To manage a run of this size, Meta built a custom training stack that maximised throughput and minimised downtime across the cluster.
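To make the departure from "Chinchilla-optimal" concrete, here is a rough back-of-the-envelope comparison. It assumes the commonly cited Chinchilla heuristic of roughly 20 training tokens per parameter; the ratio and the rounded model sizes are illustrative figures, not numbers taken from the paper.

```python
# Rough comparison of the ~15T-token Llama 3 training budget against the
# Chinchilla rule of thumb of ~20 tokens per parameter (a heuristic, not a
# figure from the Llama 3 paper).
CHINCHILLA_TOKENS_PER_PARAM = 20
ACTUAL_TRAINING_TOKENS = 15e12  # ~15 trillion tokens

models = {"Llama 3 8B": 8e9, "Llama 3 70B": 70e9, "Llama 3 405B": 405e9}

for name, params in models.items():
    chinchilla_tokens = params * CHINCHILLA_TOKENS_PER_PARAM
    ratio = ACTUAL_TRAINING_TOKENS / chinchilla_tokens
    print(f"{name}: compute-optimal ~{chinchilla_tokens / 1e12:.2f}T tokens, "
          f"trained on ~15T (about {ratio:.0f}x more)")
```

By this crude estimate the smaller models are trained far past the compute-optimal point, which is exactly the trade-off described above: spend more compute once during training so the model is cheaper to run forever after.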
2. Improvements to Architecture
Llama 3 sticks with the tried-and-true dense, decoder-only Transformer architecture, refined in a few important ways to make it faster and more efficient:
Grouped-Query Attention (GQA): Used at every model size (8B, 70B, and 405B) to speed up inference and shrink memory usage during generation (a minimal sketch follows this list).
Expanded Vocabulary: A new tokeniser with a vocabulary of 128,000 tokens encodes text more efficiently, which improves performance in languages other than English and in technical domains.
Context Window: The models are pre-trained with an 8K context window that is later extended to 128K tokens, so they can process whole documents or complicated codebases in a single prompt.
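To show what grouped-query attention actually changes, here is a minimal, self-contained sketch, not Meta's implementation: several query heads share each key/value head, so the K and V projections (and therefore the KV cache during generation) are several times smaller. The head counts and dimensions below are illustrative, not the real Llama 3 configuration.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Minimal GQA: n_q_heads query heads share n_kv_heads key/value heads
    (n_q_heads must be a multiple of n_kv_heads)."""
    B, T, D = x.shape
    head_dim = D // n_q_heads
    group = n_q_heads // n_kv_heads

    # K/V projections are smaller than Q, which is what shrinks the KV cache.
    q = (x @ wq).view(B, T, n_q_heads, head_dim).transpose(1, 2)   # (B, Hq,  T, d)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)  # (B, Hkv, T, d)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)

    # Each group of query heads attends to the same key/value head.
    k = k.repeat_interleave(group, dim=1)                          # (B, Hq, T, d)
    v = v.repeat_interleave(group, dim=1)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    out = F.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2).reshape(B, T, D)

# Illustrative sizes: 8 query heads sharing 2 key/value heads.
D, Hq, Hkv = 64, 8, 2
x = torch.randn(1, 16, D)
wq = torch.randn(D, D)
wk = torch.randn(D, (D // Hq) * Hkv)
wv = torch.randn(D, (D // Hq) * Hkv)
print(grouped_query_attention(x, wq, wk, wv, Hq, Hkv).shape)  # torch.Size([1, 16, 64])
```

With 8 query heads sharing 2 key/value heads, the KV cache is a quarter of the size it would be under standard multi-head attention, at a small cost in modelling flexibility.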
3. Post-Training: The Art of Alignment
The "Herd of Models" paper talks a lot about post-training, which is the step that turns a raw base model into a useful assistant. Meta used a complex pipeline that included:
SFT (Supervised Fine-Tuning): Training on demonstration examples carefully curated and vetted by people.
Direct Preference Optimisation (DPO): A way to align model outputs with human preferences directly from pairs of preferred and rejected responses, without the separate reward-model-plus-reinforcement-learning loop of traditional RLHF.
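To make the DPO idea concrete, here is a minimal sketch of the standard DPO loss from the original DPO paper (not Meta's training code): the policy is pushed to widen its preference for the chosen response over the rejected one relative to a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss on per-sequence log-probabilities, each of shape (batch,)."""
    # How strongly the policy prefers "chosen" over "rejected"...
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # ...compared with how strongly the frozen reference model already does.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    # Maximising log-sigmoid of the scaled gap pulls the policy towards the
    # preferred responses without an explicit reward model.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy batch of two preference pairs with made-up log-probabilities.
loss = dpo_loss(
    policy_chosen_logp=torch.tensor([-12.0, -9.5]),
    policy_rejected_logp=torch.tensor([-14.0, -9.0]),
    ref_chosen_logp=torch.tensor([-13.0, -10.0]),
    ref_rejected_logp=torch.tensor([-13.5, -9.8]),
)
print(loss)
```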
Notably, Meta used the Llama 3 405B model to generate high-quality synthetic data for training the smaller 8B and 70B versions, a form of "distillation" in which the flagship acts as a teacher.
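The general pattern is straightforward to sketch: prompt the large teacher model and save its completions as training examples for the smaller student. The snippet below is a hypothetical illustration of that pattern, not Meta's actual pipeline; the model path, prompts, and output format are placeholders, and serving a 405B teacher in practice requires a serious multi-GPU inference setup.

```python
# Hypothetical sketch of teacher-generated synthetic SFT data.
import json
from transformers import pipeline

TEACHER_MODEL = "path/to/large-instruct-teacher"  # placeholder model path

prompts = [
    "Explain grouped-query attention in two sentences.",
    "Write a Python function that reverses a linked list.",
]

generator = pipeline("text-generation", model=TEACHER_MODEL)

with open("synthetic_sft_data.jsonl", "w") as f:
    for prompt in prompts:
        out = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
        # "generated_text" includes the prompt; kept as-is for simplicity here.
        f.write(json.dumps({"prompt": prompt,
                            "response": out[0]["generated_text"]}) + "\n")
```

In a real pipeline the generated data would also be filtered and scored, for example with a reward model or an execution-based verifier, before being used for fine-tuning.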
4. Safety and Responsibility
Along with the models, Meta released a full safety ecosystem, including Code Shield, Llama Guard, and Prompt Guard. By publishing the weights of these safety models, Meta lets developers build "defence-in-depth" into their applications, making it much harder for the 405B model's power to be misused.
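In practice, defence-in-depth means screening what goes into and comes out of the main model with lighter-weight guard models. The sketch below illustrates only that layering; the classifier functions are trivial keyword stand-ins, not the real Prompt Guard or Llama Guard APIs.

```python
# Hypothetical defence-in-depth wrapper: the two classify_* functions are
# stand-ins where Prompt Guard / Llama Guard calls would go, and
# run_main_model stands in for the actual Llama 3 inference call.

def classify_prompt(prompt: str) -> bool:
    """Return True if the incoming prompt should be blocked."""
    return "ignore all previous instructions" in prompt.lower()

def classify_response(response: str) -> bool:
    """Return True if the model's output should be withheld."""
    return "how to build a weapon" in response.lower()

def run_main_model(prompt: str) -> str:
    """Stand-in for calling the main Llama 3 model via your serving stack."""
    return f"(model response to: {prompt})"

def safe_generate(prompt: str) -> str:
    # Layer 1: screen the prompt before it ever reaches the model.
    if classify_prompt(prompt):
        return "Request refused by the input filter."
    response = run_main_model(prompt)
    # Layer 2: screen the output before it reaches the user.
    if classify_response(response):
        return "Response withheld by the output filter."
    return response

print(safe_generate("Summarise the Llama 3 paper in one sentence."))
```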
What the "Herd" Means
The Llama 3 paper reads like a playbook for the AI community. By sharing the details of its failures, data-cleaning methods, and hardware optimisations, Meta has lowered the barrier to entry for researchers around the world.
The 405B model can serve as a "teacher", letting the open-source community build smaller, more specialised models that perform far better than their size would suggest.
"OpenAI has reached a turning point with the release of Llama 3 405B. The community now has access to a model that is as good as the best in the world, which will lead to a new wave of innovation.
Written by M Rousol, Senior Editor at AIUPDATE.