#mtp — Public Fediverse posts on home.social

Thomas @[email protected] · 2026-05-26 · 14:25 UTC

New week, more slides: Run LLMs Locally

Now including wllama to run GGUF models inside your browser!

wllama uses llama.cpp, WebAssembly and WebGPU to bring a completely new LLM experience to the web.
It has no 4 GB limitation and is faster than Transformers.js.

I also added translations using the HY-MT model from Tencent.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2026_ThomasBley.pdf

#ai #llm #llamacpp #wllama #stablediffusion #qwen3 #glm #localai #gemma4 #webgpu #opencode #mtp #webassembly

Arint - SEO+KI @[email protected] · 2026-05-25 · 10:02 UTC

RT @TeksEdge: 🚀 New MTP support for Strix Halo released!