home.social

#koboldcpp — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #koboldcpp, aggregated by home.social.

  1. Henky!! @npub1600yr4qg5vcfp7svf6ysj0008tn7aphnu0gjs6lw5hjn74n0laasjx889v@momostr.pink
    We were finally able to add music generation to #KoboldCpp! And since it reuses all the large backend code from ggml, adding it won't bloat us. Don't like music and just want your LLM backend to stay unbloated? No problem, don't load a music model and it won't impact you.

    But if you like music gen, what we are integrating can be very fast and produce really fun results. Been having a blast with ace-step 1.5 since it released, and soon you can enjoy it inside KoboldCpp too.

    And of course, for those who don't yet know, it's an #ai program that lets you run #llm 's locally alongside #imagegen and more! But with a portable binary that is small for what it can do.
  2. Recording some #LLM performance tests run with #koboldcpp

    Llama 3 8B model, IQ4_XS quantization
    Flags: NoAVX2=False Threads=7 HighPriority=False Cublas_Args=None Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=True KvCache=2
    Timestamp: 2025-05-26 07:36:07.196685+00:00
    Backend: koboldcpp_vulkan.so
    Layers: 49
    Model: L3-8B-Stheno-v3.2-NEO-V1-D_AU-IQ4_XS-imat13
    MaxCtx: 8192
    GenAmount: 100
    -----
    ProcessingTime: 55.082s
    ProcessingSpeed: 146.91T/s
    GenerationTime: 8.004s
    GenerationSpeed: 12.49T/s
    TotalTime: 63.086s

    Next, Mistral Nemo Small 12B, also IQ4_XS quantization
    Flags: NoAVX2=False Threads=7 HighPriority=False Cublas_Args=None Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=True KvCache=2
    Timestamp: 2025-05-26 07:43:55.416620+00:00
    Backend: koboldcpp_vulkan.so
    Layers: 49
    Model: MN-GRAND-Gutenburg-Lyra4-Lyra-12B-DARKNESS-D_AU-IQ4_XS
    MaxCtx: 8192
    GenAmount: 100
    -----
    ProcessingTime: 80.623s
    ProcessingSpeed: 100.37T/s
    GenerationTime: 11.601s
    GenerationSpeed: 8.62T/s
    TotalTime: 92.224s

    Hardware specs
    GMKtec K8 Plus
    Ryzen 7 7840HS w/ Radeon 780M, 8 GB VRAM
    64GB DDR5-5600 Dual channel
    OS: Proxmox VE 8.2
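
The speed figures in the two benchmark logs above can be reproduced from the logged times. The sketch below assumes (this is not stated in the post) that the benchmark processes `MaxCtx - GenAmount` prompt tokens and then generates `GenAmount` tokens; under that assumption the computed speeds match the logged values. The function name `benchmark_speeds` is hypothetical.

```python
def benchmark_speeds(max_ctx, gen_amount, processing_time_s, generation_time_s):
    """Recompute throughput from the logged benchmark fields.

    Assumption: prompt tokens processed = MaxCtx - GenAmount.
    """
    prompt_tokens = max_ctx - gen_amount
    return {
        "prompt_tokens": prompt_tokens,
        "processing_speed": prompt_tokens / processing_time_s,  # tokens/s
        "generation_speed": gen_amount / generation_time_s,     # tokens/s
        "total_time": processing_time_s + generation_time_s,    # seconds
    }


# Values copied from the two logs above.
llama3_8b = benchmark_speeds(8192, 100, 55.082, 8.004)
nemo_12b = benchmark_speeds(8192, 100, 80.623, 11.601)

for name, result in [("Llama 3 8B", llama3_8b), ("Mistral Nemo 12B", nemo_12b)]:
    print(
        f"{name}: processing {result['processing_speed']:.2f} T/s, "
        f"generation {result['generation_speed']:.2f} T/s, "
        f"total {result['total_time']:.3f} s"
    )
# Llama 3 8B: processing 146.91 T/s, generation 12.49 T/s, total 63.086 s
# Mistral Nemo 12B: processing 100.37 T/s, generation 8.62 T/s, total 92.224 s
```

Most of the total time in both runs is prompt processing (roughly 87% of the wall-clock time), so the generation speed is the more relevant number for interactive chat with short prompts.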

  3. I want to do a good write up in my README for PixelPolygot as one of my last touches but I need the damn #rocm fork of #KoboldCpp to update so I can do some more testing with Qwen2.5-VL locally. Like it works with vulkan on the main branch but way slower than rocm. :notlikeblob:
  4. Tried a roleplay with an LLM which more or less turned into a story. I actually quite like it. Could have more action, though

    Not sure if chat mode was the right choice.
    Also, still no idea how to do multi character chats.

    nc.uvokchee.de/s/7deWiWPAmB63K

    #koboldcpp #koboldai #llm

  5. Yeah, this is what I mean by "the model gets confused".
    Apparently KoboldAI/cpp can't distinguish the stop words anymore.

    The initial prompt was

    User: This is a roleplay between two animal characters, which I will define later.
    User: Please generate and describe an animal character, which you will play. Choose a mammal. Please prefix your messages by a newline and your characters name, like so: '
    Bob: waves "Hi, how are you?"'.

    #koboldai #koboldcpp

  6. And apparently, you can't have two characters as "you"...

    #koboldai #koboldcpp

  7. Hm, I'm unsure whether I should instruct the model to reply with their character's name prefixed.
    I have no idea how much "automagic" the chat mode has inbuilt.
    I actually wanna get away from only seeing "KoboldAI" and "user".

    Also, changing "my name" in the chat mode settings seems to confuse the model?

    #koboldai #koboldcpp

  8. #koboldcpp has a nice interface.
    I kind of wish they had a better explanation of the difference between "chat" and "KoboldGPT chat".

    Also, I'm simply overwhelmed by the amount of models. Each of them needs to be treated / prompted differently, it seems.

    And then I still have problems with the model generating my character's actions / speech occasionally, and I don't know whether that's due to my insufficient prompting, or due to this "end token" (e.g. <|im_end|>) not being configured correctly, which seems to differ for some models again.

    #llm #ai
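
The problem described in the posts above, where the model keeps writing the user's character, is commonly handled with stop sequences: generation is cut off as soon as the model emits the user's name prefix or the model's end token. The sketch below is a client-side illustration only. The helper names are hypothetical, and the payload field names (`prompt`, `max_length`, `stop_sequence`) are an assumption based on the KoboldAI-style generate API that KoboldCpp exposes; check them against the API documentation of your build. The `<|im_end|>` token applies only to models trained on that format.

```python
def build_generate_payload(prompt, user_name, max_length=200):
    """Build a request body that asks the backend to stop at the user's turn.

    Assumption: the backend accepts a list of strings under "stop_sequence".
    """
    return {
        "prompt": prompt,
        "max_length": max_length,
        # Stop when the model starts speaking for the user, or emits its
        # end-of-turn token (model-specific; <|im_end|> is one example).
        "stop_sequence": [f"\n{user_name}:", "<|im_end|>"],
    }


def trim_at_stop(text, stop_sequences):
    """Cut generated text at the earliest stop sequence, if any appears.

    Useful as a fallback when the backend returns text past a stop marker.
    """
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


payload = build_generate_payload("Bob:", user_name="User")
raw_output = ' waves "Hi!"\nUser: I wave back.<|im_end|>'
clean_output = trim_at_stop(raw_output, payload["stop_sequence"])
print(repr(clean_output))  # ' waves "Hi!"'
```

If the user's display name is changed in the chat settings, the stop sequence has to change with it; a mismatch between the two is one plausible reason for the model continuing past its own turn.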

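  9. Running #koboldcpp in foreground mode with the AI Horde job dispatcher open, it is very noticeable that there are more users in the morning than in the afternoon. Also, the ratio of RP to ERP is roughly 1:4.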

    #LLM

  10. How to run Mixtral on your own computer

    Every time a good new AI model comes out, Habr fills up with questions like "How do we try it?" and with wrong answers claiming you need to pay for some service or own hardware worth a hundred million. So once again I am writing a guide on how to run the newest mixtral-8x7 on ordinary mid-range computers.

    habr.com/ru/articles/781702/

    #LLM #Mixtral #KoboldCPP #GGUF #18+