brucethemoose
The problem is that splitting a model up over a network, even over LAN, is not very efficient. The entire set of weights has to be run through for every token (about half a word).
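To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function name and every figure in it (50 ms of compute per token, 30 ms per network hop, 4 hops) are made-up assumptions for illustration, not measurements of petals or any real setup. It only models a single generation stream where each token has to visit every host in sequence.

```python
def tokens_per_second(compute_ms_per_token, hops, hop_latency_ms):
    """Rough single-stream decode speed for a model split across hosts.

    Hypothetical model: every generated token pays the full compute cost
    of one pass through all the weights, plus one network round of latency
    for each hop between hosts. Batching and overlap are ignored.
    """
    per_token_ms = compute_ms_per_token + hops * hop_latency_ms
    return 1000.0 / per_token_ms


# Assumed numbers, purely illustrative.
local = tokens_per_second(compute_ms_per_token=50, hops=0, hop_latency_ms=0)
split = tokens_per_second(compute_ms_per_token=50, hops=4, hop_latency_ms=30)

print(f"single machine: {local:.1f} tokens/s")
print(f"split over 4 hops: {split:.1f} tokens/s")
```

Under these assumptions the network latency is paid again for every single token, so it dominates long before the compute does.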
And the other problem is that petals just can’t keep up with the crazy dev pace of the LLM community. Honestly, they should dump it and fork or contribute to llama.cpp or exllama. TBH no one wants to split up Llama 2 (or even Llama 3) 70B and be a generation or two behind, stuck with a base instruct model instead of a finetune.
Even the horde has very few hosts relative to users, even though hosting a small model on a 6GB GPU would get you lots of karma.
The diffusion community is very different, as the output is one image and even the largest open models are much smaller. LoRA usage is also standardized there, while it is not in LLM land.
If they silently ignore this (as they seem to be doing?) it just screams “have your cake and eat it,” with regard to whatever WotC imposed on them.
Technically they did not violate the contract. Maybe.
What? You want us to fix this, WotC? Well, you see, that would be quite expensive…
Facebook just didn’t release the code for llama imagegen.
The model you are looking for now is Flux.