#llama_cpp — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #llama_cpp, aggregated by home.social.

  1. #MistralSmall24B-Instruct is a really nice model to run locally for Coding Advice, Summarizing or Creative Writing.

    With a recent #llama_cpp on a #GeForce #RTX4090 at Q8, the 24 GB of VRAM is tightly maxed out and I am seeing text generation at 7-9 tokens/s.

    huggingface.co/mistralai/Mistr
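
    The "tightly maxed out" claim above can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming roughly 24 billion parameters for the model and about 8.5 bits per weight for llama.cpp's Q8_0 format (8-bit quantized values plus per-block scales) — both figures are assumptions, not stated in the post:

    ```python
    # Rough VRAM estimate for the weights of a quantized model.
    # Assumptions (not from the post): ~24e9 parameters, ~8.5 bits/weight
    # for llama.cpp's Q8_0 (8-bit values + per-block fp16 scales).

    def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
        return params_billion * bits_per_weight / 8

    est = weight_vram_gb(24, 8.5)
    print(f"~{est:.1f} GB for weights alone")  # ~25.5 GB
    ```

    The estimate lands at or slightly above the RTX 4090's 24 GB even before the KV cache and compute buffers, which is consistent with the poster describing VRAM as tightly maxed out (some layers or the cache may spill to system RAM, which would also help explain the modest 7-9 tokens/s).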