
raldone01

raldone01@lemmy.world
2 posts • 39 comments

Each card has 24GB, so 48GB of VRAM total. I use Ollama; it fills whatever VRAM is available on both cards and runs the rest on the CPU cores.
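
If you want to control the split yourself, here's a rough sketch against Ollama's local HTTP API (the default port, the model tag, and the layer count are all placeholders): the `num_gpu` option caps how many layers get offloaded to the GPUs, and anything beyond that runs on the CPU. Leaving it out lets Ollama pick the split automatically, which is the behaviour described above.

```python
# Minimal sketch: ask Ollama to offload a fixed number of layers to the GPUs.
# Model tag, prompt, and layer count are assumptions -- adjust to your setup.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3:70b",            # whatever tag you have pulled
        "prompt": "Say hello in one sentence.",
        "stream": False,
        "options": {"num_gpu": 60},       # layers to offload across both P40s
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```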


My specs because you asked:

CPU: Intel(R) Xeon(R) E5-2699 v3 (72) @ 3.60 GHz
GPU 1: NVIDIA Tesla P40 [Discrete]
GPU 2: NVIDIA Tesla P40 [Discrete]
GPU 3: Matrox Electronics Systems Ltd. MGA G200EH
Memory: 66.75 GiB / 251.75 GiB (27%)
Swap: 75.50 MiB / 40.00 GiB (0%)

What are you asking exactly?

What do you want to run? I assume you have a 24GB GPU and 64GB host RAM?


I regularly run llama3 70b unquantized on two P40s and the CPU at around 7 tokens/s. It’s usable but not very fast.
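
If anyone wants to reproduce that number, the non-streaming Ollama response already carries the timing fields; a quick sketch (the model tag and prompt are placeholders):

```python
# Rough sketch: eval_count is the number of generated tokens and eval_duration
# is in nanoseconds, so tokens per second falls straight out of the response.
import requests

data = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3:70b", "prompt": "Explain RAID 5 in two sentences.", "stream": False},
    timeout=1200,
).json()

print(f"{data['eval_count'] / data['eval_duration'] * 1e9:.1f} tokens/s")
```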


True, multiple drives speed up reads significantly. As long as the videos are read sequentially, though, read speeds can be very fast (600MB/s) even on one drive. Results may vary.
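
Easy enough to check on your own setup; a quick-and-dirty sequential read benchmark (the file path is a placeholder, and you'd want to drop the page cache first or the number mostly measures RAM):

```python
# Rough sequential-read benchmark. Drop the OS page cache first
# (e.g. sync; echo 3 > /proc/sys/vm/drop_caches) or the result is inflated.
import time

PATH = "/mnt/media/some_large_video.mkv"   # placeholder: any big file on the drive
CHUNK = 16 * 1024 * 1024                   # 16 MiB reads keep the drive streaming

total = 0
start = time.monotonic()
with open(PATH, "rb", buffering=0) as f:
    while True:
        buf = f.read(CHUNK)
        if not buf:
            break
        total += len(buf)
elapsed = time.monotonic() - start
print(f"{total / elapsed / 1e6:.0f} MB/s over {total / 1e9:.1f} GB")
```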


I have a ~40TB HDD array and Jellyfin is super fast. Just put the database and cache files on an SSD.

For bulk storage of high-bitrate 4K videos, HDDs are way cheaper.
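
One way to do the SSD move, as a rough sketch: relocate the data and cache directories and leave symlinks behind. The paths below are the usual Linux package locations plus an assumed SSD mount point; stop Jellyfin first and adjust to your install.

```python
# Sketch only: move Jellyfin's data (library DB, metadata) and cache to an SSD
# and symlink the old locations to the new ones. Paths are assumptions.
import os
import shutil

MOVES = {
    "/var/lib/jellyfin": "/mnt/ssd/jellyfin/data",
    "/var/cache/jellyfin": "/mnt/ssd/jellyfin/cache",
}

for old, new in MOVES.items():
    os.makedirs(os.path.dirname(new), exist_ok=True)
    shutil.move(old, new)   # copies across filesystems, then removes the original
    os.symlink(new, old)    # Jellyfin keeps using the old path transparently
    # Re-chown the new location to the jellyfin user if ownership was lost in the copy.
```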


Which OS are you running?

Try partitioning it with free space at the end and see if it makes a difference.

Try trimming the drive and see if it speeds up again.

Do you use any disk encryption?
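
A crude way to quantify "speeds up again": time a synced sequential write before and after trimming and compare the numbers (the test path is a placeholder on the drive in question).

```python
# Rough write-speed check. Run once before and once after trimming,
# then compare. TEST_FILE is a placeholder path on the slow drive.
import os
import time

TEST_FILE = "/mnt/suspect_drive/speedtest.bin"
CHUNK = b"\0" * (8 * 1024 * 1024)   # 8 MiB blocks
TARGET = 2 * 1024**3                # write 2 GiB in total

start = time.monotonic()
with open(TEST_FILE, "wb") as f:
    written = 0
    while written < TARGET:
        f.write(CHUNK)
        written += len(CHUNK)
    f.flush()
    os.fsync(f.fileno())            # make sure the data actually hit the disk
elapsed = time.monotonic() - start
os.remove(TEST_FILE)
print(f"{TARGET / elapsed / 1e6:.0f} MB/s write")
```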


Llama3.1 33b would be so cool. It would be a nice middle ground for my machine.
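
Back-of-the-envelope for why ~33B would be a nice middle ground, as approximate weight memory at common precisions (the bits-per-weight figures are rough averages for typical quant formats, and KV cache plus runtime overhead come on top):

```python
# Rough arithmetic only: weight memory for a ~33B-parameter model at a few
# precisions. Bits-per-weight values are approximate; KV cache is extra.
PARAMS = 33e9
for name, bits in [("fp16", 16), ("q8_0", 8.5), ("q4_K_M", 4.8)]:
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{name:>7}: ~{gib:.0f} GiB of weights")
```

That comes out to roughly 62 GiB at fp16, 33 GiB at 8-bit, and 18 GiB at 4-bit, so anything quantized would fit on the two P40s with room left for context.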


At least on Linux, rm is very fast.


I use TubeArchivist. It has a Jellyfin addon, but it could really use some improvements in how it exposes the videos.
