List of icons/services suggested:

  • Calibre
  • Jitsi
  • Kiwix
  • Monero (Node)
  • Nextcloud
  • Pihole
  • Ollama (Should at least be able to run tiny-llama 1.1B)
  • Open Media Vault
  • Syncthing
  • VLC Media Player Media Server
You are viewing a single thread.
View all comments View context
2 points

Heres a tip, most software has the models default context size set at 512, 2048, or 4092. Part of what makes llama 3.1 so special is that it was trained with 128k context so bump that up to 131072 in the settings so it isnt recalculating context every few minutes…

Some caveats, this massively increases memory usage (unless you quantize the cache with FA) and it also massively slows down CPU generation once the context gets long.

TBH you just need to not keep a long chat history unless you need it,.

permalink
report
parent
reply
1 point
*

Thank you thats useful to know. In your opinion what context size is the sweet spot for llama 3.1 8B and similar models?

permalink
report
parent
reply
1 point
*

Oh I got you mixed up with the other commenter, apologies.

I’m not sure when llama 8b starts to degrade at long context, but I wanna say its well before 128K, and where other “long context” models start to look much more attractive depending on the task. Right now I am testing Amazon’s mistral finetune, and it seems to be much better than Nemo or llama 3.1 out there.

permalink
report
parent
reply
1 point

4 core i7, 16gb RAM and no GPU yet

Honestly as small as you can manage.

Again, you will get much better speeds out of “extreme” MoE models like deepseek chat lite: https://huggingface.co/YorkieOH10/DeepSeek-V2-Lite-Chat-Q4_K_M-GGUF/tree/main

Another thing I’d recommend is running kobold.cpp instead of ollama if you want to get into the nitty gritty of llms. Its more customizable and (ultimately) faster on more hardware.

permalink
report
parent
reply
1 point
*

Thats good info for low spec laptops. Thanks for the software recommendation. Need to do some more research on the model you suggested. I think you confused me for the other guy though. Im currently working with a six core ryzen 2600 CPU and a RX 580 GPU. edit- no worries we are good it was still great info for the thinkpad users!

permalink
report
parent
reply

linuxmemes

!linuxmemes@lemmy.world

Create post

I use Arch btw


Sister communities:
Community rules
  1. Follow the site-wide rules and code of conduct
  2. Be civil
  3. Post Linux-related content
  4. No recent reposts

Please report posts and comments that break these rules!

Community stats

  • 7.5K

    Monthly active users

  • 913

    Posts

  • 16K

    Comments