home.social

#mtp — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #mtp, aggregated by home.social.

  1. New week, more slides: Run LLMs Locally

    Now including wllama to run GGUF models inside your browser!

    wllama uses llama.cpp, WebAssembly and WebGPU, bringing a completely new experience of LLMs into the web.
    It has no 4 GB limitation and is faster than Transformers.js.

    I also added translations using the HY-MT model from Tencent.

    codeberg.org/thbley/talks/raw/

    #ai #llm #llamacpp #wllama #stablediffusion #qwen3 #glm #localai #gemma4 #webgpu #opencode #mtp #webassembly

  6. RT @dealignai: TRANSLATION: Qwen3.6-27b and 35b MXFP4 MXFP8 CRACK is now available with MTP. Enjoy uncensored speed!

    More at Arint.info

    #AI #CRACK #MTP #MXFP4 #MXFP8 #Qwen3 #arint_info

    https://x.com/dealignai/status/2058653981090705676#m

  11. New week, new slides: Run LLMs Locally

    Now including multi-token prediction using Qwen3.6 35B-A3B with Nextn quantization. Speech recognition using Qwen-3-ASR now also works directly with llama.cpp and is included in the slides.

    codeberg.org/thbley/talks/raw/

    #ai #llm #llamacpp #stablediffusion #qwen3 #glm #localai #gemma4 #webgpu #opencode #mtp

  16. Qwen3.6 MTP is 0.3 GB larger but gives a ~2x speedup: from 60 t/s to 130 t/s for Qwen3.6 27B, without distortion

    llama.cpp has added support for Qwen3.6 MTP. The additional Multi-Token Prediction layers make it possible to generate several tokens in a single pass, which speeds up generation by 1.5-2x. Quality remains lossless. For models without built-in MTP, there are alternatives such as EAGLE-3 and DFlash.

    habr.com/ru/articles/1036120/

    #искусственный_интеллект #mtp #llamacpp #qwen #qwen36

  20. RT @mr_r0b0t: Did you know that Qwen3.6 shipped with native MTP? Yes, the same MTP that Google released Gemma4 support for yesterday! Multi Token Prediction (MTP) = speculative decoding. Here is a Qwen3.6 model, quantized to Q4KM, that supports MTP via ikllama.cpp.

    More at Arint.info

    #AI #Gemma4 #LLM #MTP #Qwen3 #arint_info

    https://x.com/mr_r0b0t/status/2052022017470120067#m

  25. anyone who was a teenager or 20-something during the 90's....

    Save the date - June 4th 2026

    Your new anthem will arrive

    #music #Summerin99 #mtp #rockAnthem #1999 #90s

  30. #MeetTheRepublicans waited until THE FINAL FIVE MINUTES to talk about THE LARGEST POLITICAL PROTEST IN AMERICAN HISTORY. 🤨 #MtP

  35. @pukite.com

    She knows what she's doing. "Tens of thousands" in NYC isn't the same as suggesting "tens of thousands nationwide." 😐

    I had to shut off #MtP after Lankford called #Democrats "totally unreasonable" for "opposing allowing #ICE agents to police Polling places. No one thinks illegal aliens should be allowed to vote."

    I screamed at my TV and shut it off. LORD I DESPISE THESE PEOPLE! 🤬

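
Several of the #mtp posts above describe multi-token prediction as a form of speculative decoding: cheap extra layers propose a few tokens, the full model checks them, and the output stays the same as normal decoding. The following is a minimal sketch of that draft-and-verify loop under greedy decoding. Everything in it is an assumption made for illustration: `target_next` and `draft_next` are hypothetical toy stand-ins for the full model and the MTP head, not llama.cpp or Qwen APIs, and the single batched verification pass of a real engine is only simulated by a counter.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Toy 'full model' (hypothetical): a deterministic next-token rule."""
    return (3 * context[-1] + len(context)) % 11


def draft_next(context: List[Token]) -> Token:
    """Toy 'MTP head' (hypothetical): agrees with the target except
    when the context length is a multiple of 5, where it guesses wrong."""
    guess = target_next(context)
    return (guess + 1) % 11 if len(context) % 5 == 0 else guess


def greedy_decode(prompt: List[Token], n_new: int,
                  model: NextFn) -> Tuple[List[Token], int]:
    """Plain decoding: one target pass per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out, n_new


def speculative_decode(prompt: List[Token], n_new: int, target: NextFn,
                       draft: NextFn, k: int = 3) -> Tuple[List[Token], int]:
    """Draft k tokens, then verify them against the target.

    A real engine scores all drafted positions in one batched forward
    pass; here that pass is simulated and counted in `passes`.
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. Draft: propose k tokens with the cheap predictor.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Verify: keep drafted tokens while they match the target.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: use target's token
                break
            accepted.append(tok)
        else:
            # All drafts accepted: the verify pass also yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], passes


baseline, baseline_passes = greedy_decode([1], 24, target_next)
speculative, speculative_passes = speculative_decode(
    [1], 24, target_next, draft_next, k=3)
print("same output:", speculative == baseline)
print("target passes:", baseline_passes, "vs", speculative_passes)
```

Because every token that is kept is exactly the token the target would have produced for that context, the result matches plain greedy decoding, which is the sense in which the posts call the speedup lossless. The gain comes from needing fewer target passes than tokens; how large it is depends on how often the drafts are accepted.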