Technically correct ™
Before you get your hopes up: Anyone can download it, but very few will be able to actually run it.
What are the resource requirements for the 405B model? I did some digging but couldn’t find any documentation during my cursory search.
As a rule of thumb you need about 1 GB of VRAM per billion parameters at 8-bit precision (i.e. one byte per parameter), and roughly double that at fp16. This is a 405B-parameter model. Ouch.
Edit: you can try quantizing it. This reduces the amount of memory required per parameter to 4 bits, 2 bits or even 1 bit. As you reduce the size, the performance of the model can suffer. So in the extreme case you might be able to run this in under 64GB of graphics RAM.
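If you want to do the arithmetic yourself, here’s a rough sketch for the weights alone (it ignores KV cache, activations and framework overhead, which all add more on top):

```python
# Back-of-the-envelope VRAM for the weights alone; real usage is higher
# once you add the KV cache, activations and framework overhead.
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4, 2, 1):
    print(f"405B at {bits:>2}-bit: ~{weight_gb(405, bits):.0f} GB")
# 16-bit: ~810 GB, 8-bit: ~405 GB, 4-bit: ~203 GB, 2-bit: ~101 GB, 1-bit: ~51 GB
```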
Hmm, I probably have that much distributed across my network… maybe I should look into some way of distributing it across multiple GPUs.
Frak, just counted and I only have 270 GB installed. Approx 40 GB more if I install some of the deprecated cards in any spare PCIe slots I can find.
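Not the across-the-network setup you’re describing, but for splitting a model over whatever GPUs are in one box, something like this is the usual starting point. Just a sketch: the model ID is only an example, it assumes you’ve accepted Meta’s license on Hugging Face, and anything that doesn’t fit in VRAM spills to (slow) CPU RAM:

```python
# Sketch: shard a model across every local GPU and spill the rest to CPU RAM,
# using transformers + accelerate's automatic device map.
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"  # example; pick what fits your hardware

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~2 bytes per parameter
    device_map="auto",           # split layers across available GPUs, overflow to CPU
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```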
405B ain’t running local unless you got a proper enterprise-grade setup lol
I think 70B is possible but I haven’t found anyone confirming it yet
Would also like to know the specs of whoever did it
I regularly run llama3 70B unquantized on two P40s and CPU at like 7 tokens/s. It’s usable but not very fast.
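If anyone wants to compare numbers, here’s a quick way to read tokens/s off a local ollama server (assumes ollama on its default port and the model already pulled; the tag below is just an example):

```python
# Quick tokens/s check against a local ollama server (default port 11434).
# Assumes the model is already pulled, e.g. `ollama pull llama3.1:70b`.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:70b",  # example tag; use whatever you actually have
        "prompt": "Explain RAID levels in two sentences.",
        "stream": False,
    },
    timeout=600,
).json()

# eval_count = generated tokens, eval_duration = generation time in nanoseconds
print(f"{resp['eval_count'] / (resp['eval_duration'] / 1e9):.1f} tokens/s")
```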
This would probably run on an A6000, right?
Edit: nope, I think I’m off by an order of magnitude
When the RTX 9090 Ti comes out, anyone who can afford it will be able to run it.
Wake me up when it works offline.
“The Llama 3.1 models are available for download through Meta’s own website and on Hugging Face. They both require providing contact information and agreeing to a license and an acceptable use policy, which means that Meta can technically legally pull the rug out from under your use of Llama 3.1 or its outputs at any time.”
It’s available through ollama already. I’m running the 8B model on my little server with its 3070 as of right now.
It’s really impressive for an 8B model
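For anyone who wants to try it, it’s a couple of lines with the ollama Python client once the model is pulled (the tag below is the 8B default; pip install ollama first):

```python
# Minimal fully-local chat via the official ollama Python client.
# Prereqs: ollama is running and the model is pulled (`ollama pull llama3.1` = 8B).
import ollama

response = ollama.chat(
    model="llama3.1",  # 8B by default; nothing leaves localhost
    messages=[{"role": "user", "content": "What can an 8 GB card run these days?"}],
)
print(response["message"]["content"])
```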
Yup, 8GB card
It’s my old one from the gaming PC after switching to AMD.
It now serves as my little AI hub and Whisper server for Home Assistant
WAKE UP!
It works offline. When you use it with ollama, you don’t have to register or agree to anything.
Once you have downloaded it, it will keep on working; Meta can’t shut it down.
I’m running 3.1 8B as we speak via ollama, totally offline, and gave my info to nobody.
Did anyone get 70B to run locally?
If so, what hardware specs?
Yo, this is big. Both in that it’s momentous, and holy shit, that’s a lot of parameters. How many GB is this model?? I’d be able to run it if I had a few extra $10k bills lying around to buy the required hardware.
Kind of petty of Zuck not to roll it out in Europe due to the Digital Services Act… But also kind of weird since it’s open source? What’s stopping anyone from downloading the model and creating a web UI for European users?