#sparse-attention — Public Fediverse posts on home.social

deepseek @[email protected] · 2026-06-24 · 02:07 UTC

MiniMax M3 Explained: The Sparse Attention Breakthrough. This article was originally published on GetYourDozAi. Key Takeaways: MiniMax M3, the first open-weight model to combine frontier coding, ...

#minimax #ai #machinelearning #sparseattention

H@R0👨🏻‍💻 @[email protected] · 2026-06-05 · 00:50 UTC

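Because an #LLM's context is mathematically unlimited, and the recently studied #SparseAttention removes the limitation of non-linear attention, the context of LLM applications can truly reach unlimited length starting this year. At the moment it seems to be two-layer attention. I estimate that by 2027 or 2028 attention may have three or more layers, with at least one layer dedicated to RAG, and it may even become possible to call other LLMs directly inside attention.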

#llm #sparseattention

H@R0👨🏻‍💻 @[email protected] · 2026-06-01 · 10:13 UTC

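I haven't started looking into it yet, but I estimate that the SOTA models released in 2027 will all have two-layer attention. From then on, applications built with RAG will live in the second layer and will be able to handle very large corpora without a vector database and without hurting performance. For example, building an intelligent customer service system today generally means setting up a knowledge base with RAG and then continually tuning recall and ranking. Agents in 2027 should be able to have the LLM itself load the entire knowledge base into attention. This still requires prompt processing. I estimate that new caching algorithms will appear in the second half of 2026, and that after 2027 it should be possible to package this as an add-on attached to an LLM, like LoRA.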

https://huggingface.co/blog/AtlasCloud-AI/minimax-goes-sparse

#MiniMaxM3 #SparseAttention

#minimaxm3 #sparseattention

Arint - SEO+KI @[email protected] · 2026-05-27 · 04:01 UTC

RT @kimmonismus: MiniMax just announced its sparse attention architecture for M3. The benchmarks show a 9.7x prefill speedup and a 15.6x decoding speedup at 1M tokens compared to M2. MiniMax deliberately went back to full attention for M2 because efficient attention was not yet production-ready. The pretraining lead wrote an entire blog post about this in March. Now they are showing a new two-stage approach: a lightweight index branch for block selection, followed by sparse attention over only the relevant KV blocks. Very interesting. And honestly, I am always glad when open-source projects celebrate new successes. MiniMax (official) (@MiniMaxAI) #MSA #OpenSource #M3 🫣😎 https://nitter.net/MiniMaxAI/status/2059286515155599595#m

More at Arint.info

#AI #DeepLearning #M3 #MiniMax #OpenSource #SparseAttention #arint_info

https://x.com/kimmonismus/status/2059302121489486335#m

#msa #opensource #m3 #ai #deeplearning #minimax
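
The two-stage idea described in the post above (a cheap index pass that picks KV blocks, then attention over only those blocks) can be illustrated with a small pure-Python sketch. This is not MiniMax's implementation: scoring each block by the mean of its keys is an assumption made here for illustration, and the names `select_blocks`, `sparse_attention`, `block_size` and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k):
    """Stage 1 (assumed index branch): score each KV block by the dot
    product of the query with the block's mean key, keep the top_k blocks."""
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scores.append((dot(query, mean_key), start // block_size))
    best = sorted(scores, reverse=True)[:top_k]
    return sorted(idx for _, idx in best)


def sparse_attention(query, keys, values, block_size, top_k):
    """Stage 2: scaled dot-product attention restricted to the selected
    blocks. Returns the output vector and the chosen block indices."""
    blocks = select_blocks(query, keys, block_size, top_k)
    positions = [
        p
        for b in blocks
        for p in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[p]) * scale for p in positions])
    dim = len(values[0])
    out = [
        sum(w * values[p][d] for w, p in zip(weights, positions))
        for d in range(dim)
    ]
    return out, blocks
```

With `top_k` equal to the total number of blocks this reduces to ordinary full attention for a single query. With a smaller `top_k`, the second stage only touches `top_k * block_size` key/value pairs, which is where a sparse scheme of this general shape would save work at long context lengths.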

deepseek @[email protected] · 2026-02-25 · 00:02 UTC

Understand DeepSeek V3.2: Pushing the Frontier of Open LLMs. Recently, I joined the MLSys 2026 NVIDIA competition track! So I’m trying to understand DeepSeek V3.2, sparse attention, and learn GPU...

#gpu #sparse-attention #llm #machine-learning #deepseek

#gpu #sparseattention #llm #machinelearning #deepseek

AIagent.at 🤖 AI News @[email protected] · 2026-02-12 · 04:08 UTC

#ZAI: #GLM5, a new large language model, is designed for #complexsystemsengineering and long-horizon agentic tasks. It boasts 744 billion parameters and integrates #DeepSeek #SparseAttention for improved efficiency. GLM-5 outperforms previous models on various benchmarks, including #reasoning, #coding, and #agentictasks, and is open-sourced for wider accessibility. https://z.ai/blog/glm-5?AIagents.at #AIagent #AI #ML #NLP #LLM #GenAI

#zai #glm5 #complexsystemsengineering #deepseek #sparseattention #reasoning

TechLİfe @techlife_blog · 2025-12-03 · 08:25 UTC

DeepSeek V3.2 AI Model Matches OpenAI's GPT-5 with Lower Training Costs

https://techlife.blog/posts/deepseek-v32-ai-model-matches-openai-gpt-5/

#DeepSeek #AImodel #GPT5 #SparseAttention

#deepseek #aimodel #gpt5 #sparseattention

AI Daily Post @[email protected] · 2025-12-03 · 00:40 UTC

DeepSeek V3.2 pushes open‑source LLMs forward with strong synthesis, ready‑to‑use formatting cues and geographic logic. Its sparse attention unlocks long‑context and tool‑use reasoning, making it a versatile choice for developers. Dive into the details on Analytics Vidhya. #DeepSeekV32 #OpenSourceLLM #SparseAttention #LongContext

🔗 https://aidailypost.com/news/deepseek-v32-shows-strong-synthesis-readytouse-formatting-opensource

#deepseekv32 #opensourcellm #sparseattention #longcontext