Meta has released and open-sourced Llama 3.1 in three sizes: 8B, 70B, and 405B.

This new Llama iteration brings state-of-the-art performance to the open-source ecosystem.

If you’ve had a chance to use any of the Llama 3.1 variants, let us know in the comments below how you like it and what you’re using it for!

Llama 3.1 Megathread

For this release, we evaluated performance on over 150 benchmark datasets that span a wide range of languages. In addition, we performed extensive human evaluations that compare Llama 3.1 with competing models in real-world scenarios. Our experimental evaluation suggests that our flagship model is competitive with leading foundation models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet, across a range of tasks. Additionally, our smaller models are competitive with closed and open models that have a similar number of parameters.

As our largest model yet, training Llama 3.1 405B on over 15 trillion tokens was a major challenge. To enable training runs at this scale and achieve the results we have in a reasonable amount of time, we significantly optimized our full training stack and pushed our model training to over 16 thousand H100 GPUs, making the 405B the first Llama model trained at this scale.


Official Meta News & Documentation

See also: The Llama 3 Herd of Models paper here:


HuggingFace Download Links

8B

Meta-Llama-3.1-8B

Meta-Llama-3.1-8B-Instruct

Llama-Guard-3-8B

Llama-Guard-3-8B-INT8


70B

Meta-Llama-3.1-70B

Meta-Llama-3.1-70B-Instruct


405B

Meta-Llama-3.1-405B-FP8

Meta-Llama-3.1-405B-Instruct-FP8

Meta-Llama-3.1-405B

Meta-Llama-3.1-405B-Instruct
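The repositories listed above can also be fetched from a script. Below is a minimal sketch, not an official recipe: it assumes the models live under the `meta-llama` organization on Hugging Face, that you have already accepted the license for the gated repository, and that you are logged in (for example via `huggingface-cli login`). The helper names `repo_id` and `download` are hypothetical and exist only for this illustration.

```python
from pathlib import Path

# Assumption: the repositories are hosted under this Hugging Face organization.
HF_ORG = "meta-llama"
SIZES = {"8B", "70B", "405B"}


def repo_id(size: str, instruct: bool = False, fp8: bool = False) -> str:
    """Build a repository id matching the names in the list above."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}, expected one of {sorted(SIZES)}")
    if fp8 and size != "405B":
        # The list above only shows FP8 variants for the 405B model.
        raise ValueError("FP8 variants are only listed for 405B")
    name = f"Meta-Llama-3.1-{size}"
    if instruct:
        name += "-Instruct"
    if fp8:
        name += "-FP8"
    return f"{HF_ORG}/{name}"


def download(size: str, instruct: bool = False, fp8: bool = False,
             target_dir: str = "models") -> str:
    """Download one repository into target_dir and return the local path."""
    # Imported lazily so repo_id() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    rid = repo_id(size, instruct=instruct, fp8=fp8)
    local_dir = Path(target_dir) / rid.split("/")[1]
    return snapshot_download(repo_id=rid, local_dir=local_dir)
```

For example, `download("8B", instruct=True)` would fetch Meta-Llama-3.1-8B-Instruct into `models/Meta-Llama-3.1-8B-Instruct`. Starting with the 8B model is a reasonable way to check that access works before pulling the much larger checkpoints.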


Getting the models

You can download the models directly from Meta or one of our download partners: Hugging Face or Kaggle.

Alternatively, you can work with ecosystem partners to access the models through the services they provide. This approach can be especially useful if you want to work with the Llama 3.1 405B model.

Note: Llama 3.1 405B requires significant storage and computational resources, occupying approximately 750GB of disk storage space and necessitating two nodes on MP16 for inferencing.
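To see roughly where the 750GB figure comes from, here is a back-of-the-envelope sketch. It rests on assumptions not stated in the note: a nominal 405 billion parameters, 2 bytes per parameter for 16-bit weights, 1 byte per parameter for FP8, and 80 GB of memory per H100. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
import math

GIB = 1024 ** 3          # bytes in one GiB
NUM_PARAMS = 405e9       # assumption: nominal parameter count
GPU_MEM_BYTES = 80e9     # assumption: 80 GB per H100


def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Size of the raw weights, ignoring any file-format overhead."""
    return num_params * bytes_per_param


def min_gpus_for_weights(total_bytes: float,
                         gpu_mem_bytes: float = GPU_MEM_BYTES) -> int:
    """Lower bound on GPU count: the weights alone must fit in GPU memory."""
    return math.ceil(total_bytes / gpu_mem_bytes)


bf16 = weight_bytes(NUM_PARAMS, 2)  # 16-bit weights
fp8 = weight_bytes(NUM_PARAMS, 1)   # 8-bit weights

print(f"16-bit: {bf16 / GIB:.0f} GiB, at least {min_gpus_for_weights(bf16)} GPUs")
print(f"FP8:    {fp8 / GIB:.0f} GiB, at least {min_gpus_for_weights(fp8)} GPUs")
# 16-bit: 754 GiB, at least 11 GPUs
# FP8:    377 GiB, at least 6 GPUs
```

Under these assumptions the 16-bit weights alone need more than the 8 GPUs of a typical single node, which is consistent with the note's two-node MP16 requirement, while the FP8 weights would fit on fewer than 8.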

Learn more at:


Running the models

Linux

Windows

Mac

Cloud


More guides and resources

How-to Fine-tune Llama 3.1 models

Quantizing Llama 3.1 models

Prompting Llama 3.1 models

Llama 3.1 recipes


YouTube media

Rowan Cheung - Mark Zuckerberg on Llama 3.1, Open Source, AI Agents, Safety, and more

Matthew Berman - BREAKING: LLaMA 405b is here! Open-source is now FRONTIER!

Wes Roth - Zuckerberg goes SCORCHED EARTH… Llama 3.1 BREAKS the “AGI Industry”*

1littlecoder - How to DOWNLOAD Llama 3.1 LLMs

Bloomberg - Inside Mark Zuckerberg’s AI Era | The Circuit

15 points

super exciting, but in a way i have kind of “lost interest” in frontier models, since the resources needed to run them are beyond what most people have access to. i mostly see the future in smaller models (like 3.1 8B for example), anyone else share this feeling?

also unrelated but, i was previously librecat on here (my last instance stopped working)

3 points

I tried downloading and running the 405B locally through LM-Studio. Got an error message saying invalid tokenizer. Then tried it with ollama. That didn’t work either. Going to try the 70B tomorrow.

Not sure it’s possible to run the larger ones on a Mac laptop.

4 points

There’s apparently some tuning that needs to be done in Llama.cpp (which LM Studio uses to run) so Llama 3.1 can work properly: https://github.com/ggerganov/llama.cpp/issues/8650

3 points

Thank you. Looks like I’m not alone and people are doing more detailed testing. I’ll just wait till the dust settles.

1 point

Does anyone know how the base (/foundation) model works? Up until now they always released one instruction tuned variant and one base model. Is it the same for the 405B model? And if yes, does that base model refuse to do things? Because I read some people claiming the new Llama 3.1 is more restricted than the versions before. But this shouldn’t apply to a base model. It’s just the instruct-tuned variants that are aligned to some “guardrails”. I’m confused. Do people use the wrong model? Or has something changed?

2 points

IMO guardrails have been irrelevant for “local” models forever, since a little prompt engineering or manipulation blows them away.

In theory the base model should be less “censored,” but really it’s just for raw completion/continuation and further finetuning.

1 point

But it’s super annoying when doing storywriting or using it as an agent. And then you have to do detection and extra handling of refusals, circumvent them and write extra prompts. And I think I read some paper that jailbreaking and removing “censorship” tends to make the models a bit stupider. I think in general it’s way more clever to take a model without guardrails and fine-tune it, than to put them in place and then remove them again, degrade the model in the process and also make your life harder. A base model should be entirely without any censorship. (It’s a base model though. It obviously won’t follow instructions or answer questions… It’s the basis for the community to take and fine-tune, aligned with our vision of baked-in ethics or the lack thereof.)

1 point

Yeah, well, I have been using base models and a few instruct tunes for a bit and haven’t even gotten refusals, as long as there is enough existing context.

1 point

Kind of petty from Zuck not to roll it out in Europe due to the Digital Services Act… But also kind of weird since it’s open source? What’s stopping anyone from downloading the model and creating a web UI for European users?


Free Open-Source Artificial Intelligence

!fosai@lemmy.world
