Maybe even 32GB if they use newer ICs.

More explanation (and my source of the tip): https://www.pcgamer.com/hardware/graphics-cards/shipping-document-suggests-that-a-24-gb-version-of-intels-arc-b580-graphics-card-could-be-heading-to-market-though-not-for-gaming/

Would be awesome if true, and if it’s affordable. Screw Nvidia (and, inexplicably, AMD) for their VRAM gouging.

5 points

In practice, almost no one with A770s uses ipex-llm, simply because it’s not as VRAM-efficient as llama.cpp, isn’t as feature-rich, and the PyTorch setup is nightmarish.

Intel is indeed making many contributions to the open-source LLM space, but it feels… shotgunish? Not unified at all. AMD, on the other hand, is more focused but woefully understaffed, and Nvidia is laser-focused on the enterprise space.

1 point

I don’t have any personal experience with self-hosted LLMs, but I thought that ipex-llm was supposed to be a backend for llama.cpp?
https://yuwentestdocs.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html
Do you have time to elaborate on your experience?

I see your point, they seem to be investing in each and every area related to AI at the moment. Personally, I hope we get a third player in the dGPU segment in the form of Intel Arc, and that they successfully break the Nvidia CUDA hegemony with their oneAPI:
https://uxlfoundation.org/
https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/introduction

3 points

It’s complicated.

So there’s Intel’s own project/library, which is the fastest way to run LLMs on their IGPs and GPUs. But also the hardest to set up, and the least feature packed.

There’s more than one Intel-compatible llama.cpp ‘backend,’ including the Intel-contributed SYCL one, another PR for AMX support on CPUs, I think another one branded as ipex-llm, and the Vulkan backend that the main llama.cpp devs seem to be focusing on now. The problem is that each of these backends has its own bugs, incomplete features, installation quirks, and things it doesn’t support, while AMD’s ROCm kinda “just works” because it inherits almost everything from the CUDA backend.

It’s a hot mess.

Hardcore LLM enthusiasts largely can’t keep up, much less the average person just trying to self-host a model.

oneAPI is basically a nothingburger so far. You can run many popular CUDA libraries on AMD through ROCm, right now, but that is not the case with Intel, and no devs are interested in changing that because Intel isn’t selling any “3090 class” GPU hardware worth buying.

2 points

That does sound difficult to navigate.
With oneAPI being backed by so many big names, do you think they will be able to upset CUDA in the future, or has Nvidia just become too entrenched?
Would a B580 24GB and a B770 32GB be able to change that last sentence regarding GPU hardware worth buying?

