#mixture-of-experts — Public Fediverse posts on home.social

Winbuzzer @[email protected] · 2026-04-27 · 13:59 UTC

https://winbuzzer.com/2026/04/27/deepseek-v4-open-weights-launch-xcxwbn/

DeepSeek V4 Ships 1M Context, Open-Weights

#AI #DeepSeekV4 #DeepSeek #OpenSourceAI #AIModels #MixtureOfExperts #ChinaAI #GenerativeAI #EnterpriseAI

Winbuzzer @[email protected] · 2026-04-04 · 10:25 UTC

https://winbuzzer.com/2026/04/03/arcee-ai-399b-open-source-reasoning-model-apache-2-xcxwbn/

Arcee AI Launches 399B Top-Performing Open-Source Model at 96% Lower Cost

#AI #ArceeAI #OpenSourceAI #LLMs #AIModels #MixtureOfExperts

TechLİfe @[email protected] · 2026-04-03 · 11:57 UTC

Gemma 4: Google's Most Capable Open Models Are Here — and They Run on Your Laptop

https://techlife.blog/posts/gemma-4-google-open-models

#Gemma4 #GoogleAI #OpenSourceAI #LLM #OnDeviceAI #MixtureOfExperts

AI Daily Post @[email protected] · 2026-03-13 · 07:42 UTC

Nemotron 3 Super pushes the frontier with 40M supervised and alignment samples, leveraging a Mamba‑Transformer backbone and Mixture‑of‑Experts scaling. The model shows stronger agent reasoning, RL‑based fine‑tuning, and tighter AI alignment. Dive into the details to see how this LLM reshapes open‑source AI. #Nemotron3 #MixtureOfExperts #AIAlignment #SupervisedFineTuning

🔗 https://aidailypost.com/news/nemotron-3-super-incorporates-40-million-supervised-alignment-samples

AI Daily Post @[email protected] · 2026-02-26 · 04:11 UTC

Alibaba just released the Qwen‑3.5‑Medium model as open‑source, delivering Sonnet 4.5‑level performance on a single GPU. It uses a Mixture‑of‑Experts architecture and a new “Thinking Mode” to boost AI inference efficiency while staying lightweight. Dive into the details and see how this could reshape open‑source LLM development. #Qwen3_5 #OpenSourceLLM #MixtureOfExperts #ModelEfficiency

🔗 https://aidailypost.com/news/alibaba-open-sources-qwen35-medium-models-sonnet-45-performance

AI Daily Post @[email protected] · 2026-02-20 · 06:12 UTC

NVIDIA’s new co‑design with Sarvam AI slashes time‑to‑first‑token to under a second for LLM inference. By marrying Mixture‑of‑Experts models with GPU acceleration, they boost throughput while trimming latency. This hardware‑software synergy could reshape how we deploy large language models at scale. Read more to see the numbers and tech behind the breakthrough. #NVIDIA #SarvamAI #MixtureOfExperts #TTFT

🔗 https://aidailypost.com/news/nvidia-co-design-boosts-sarvam-ai-inference-cuts-ttft-below-one-second

AI Daily Post @[email protected] · 2026-02-18 · 19:14 UTC

Alibaba's new Qwen 3.5 397B-A17 outperforms even larger rivals by using multi-token prediction and a sparse mixture-of-experts architecture. It cuts inference cost while keeping top-tier performance, hinting at a new era for multimodal AI. Curious how 397 billion parameters can be cheaper? Read the full story. #Qwen3_5 #AlibabaAI #MixtureOfExperts #MultiTokenPrediction

🔗 https://aidailypost.com/news/alibabas-qwen-35-397b-a17-beats-larger-model-via-multitoken

Winbuzzer @[email protected] · 2026-02-14 · 15:44 UTC

https://winbuzzer.com/2026/02/13/minimax-m25-open-source-ai-model-claude-opus-cost-xcxwbn/

MiniMax M2.5: Open-Source AI "Matches" Claude Opus at 1/20th Cost

#AI #MiniMax #MiniMaxM25 #OpenSourceAI #ChinaAI #MixtureOfExperts #MachineLearning #AIModels #ReinforcementLearning

AI Daily Post @[email protected] · 2026-02-12 · 20:57 UTC

MiniMax's new M2.5 model slashes costs to 1/20 of Claude Opus while handling 30% of HQ tasks. Built on a Mixture‑of‑Experts sparse architecture, it delivers strong code‑generation and LLM performance—all open‑source. Discover how this AI agent could boost productivity in your projects. #MiniMaxM2_5 #MixtureOfExperts #OpenSourceAI #AIProductivity

🔗 https://aidailypost.com/news/minimaxs-m25-costs-120-claude-opus-covers-30-hq-tasks

Winbuzzer @[email protected] · 2026-02-04 · 12:04 UTC

https://winbuzzer.com/2026/02/04/alibaba-qwen3-coder-next-open-source-sparse-moe-coding-model-xcxwbn/

Alibaba’s Qwen3-Coder-Next Activates Just 3B of 80B Parameters For Improved Efficiency

#AI #AICoding #Qwen3 #Qwen3CoderNext #Alibaba #MixtureOfExperts #LargeLanguageModels #OpenSourceAI #Coding

TECHi @[email protected] · 2025-12-04 · 17:06 UTC

Nvidia unveils an AI server running mixture-of-experts models up to 1,000x faster with 72 high-speed GPUs. As AI shifts to real-time, large-scale deployment, competitors like AMD and Chinese AI firms challenge its lead. Nvidia aims to stay ahead in hardware, scalability, and operational efficiency.

#Nvidia #AIHardware #MixtureOfExperts #GPUs #HighPerformanceComputing

Read full article: https://www.techi.com/nvidia-boost-moonshot-ai-deepseek-performance/

deepseek @[email protected] · 2025-12-01 · 12:35 UTC

DeepSeek-Math-V2: Open-Source AI Earns IMO Gold, Tops Putnam Exam

Chinese startup DeepSeek has released DeepSeek-Math-V2, an open-source AI model that solved five of six 2024 IMO problems, earning ...

#ChinaRevolutionUpdate #GenAIPro #AI #mathematical #reasoning #DeepSeek-Math-V2 #International #Mathematical #Olympiad #mixture-of-experts #system

TechLİfe @[email protected] · 2025-11-17 · 18:31 UTC

Kimi K2: Open-Source Mixture-of-Experts AI Model Released

https://techlife.blog/posts/kimi-k2-open-source-moe-ai/

#LLM #OpenSource #MixtureofExperts #Kimi

KINEWS24 @[email protected] · 2025-09-12 · 11:37 UTC

🔥 Alibaba Qwen3-Next: 10x more efficient, 90% lower training costs!

▶️ Discover Hybrid-MoE now
▶️ Activate 262K context!
▶️ Launch SGLang Turbo now

#ai #ki #artificialintelligence #qwen3next #alibaba #largelanguagemodels #mixtureofexperts #linearattention

🔥 CLICK & COMMENT now! 💭

https://kinews24.de/qwen3-next-alibaba-ki-revolution-2025/

Agnieszka Serafinowicz @[email protected] · 2025-07-31 · 09:00 UTC

China's Z.ai throws down the gauntlet to the giants. Its new GLM-4.5 AI model is meant to be open, cheap, and record-breakingly efficient

And better than DeepSeek's model. The startup Z.ai has unveiled a new product, GLM-4.5. It is an AI model released as open source that is meant to be even cheaper, more efficient, and "smarter" than its much-publicized Chinese predecessor.

The new model, GLM-4.5, stands out for its so-called "agentic" architecture, meaning it can automatically split complex tasks into smaller steps in order to carry them out more precisely. What is more, it is half the size of DeepSeek's model and reportedly needs only eight specialized Nvidia H20 chips to run, chips designed for the Chinese market with US export restrictions in mind.

GLM-4.5 is currently the most advanced (publicly known) Chinese design based on the MoE (Mixture of Experts) architecture, available in two variants: a flagship with 355 billion parameters and a lighter Air version with 106 billion parameters. The "agentic" architecture lets it plan and execute complex, multi-step tasks autonomously. According to the vendor's own tests, the Z.ai model achieved the third-best result in the world on industry benchmarks while ranking first among all open-source models. Crucially, this high performance was achieved with relatively low hardware requirements.

Z.ai positions its product as a "truly open alternative" to the closed, proprietary systems that dominate the market. The model is available under an open license, which gives companies more control and transparency. Its biggest advantage, however, is cost. According to the official price list, processing one million tokens (word fragments) costs just 11 cents for input and 28 cents for output. By comparison, output tokens for the competing DeepSeek R1 model cost $2.19, which shows a huge (almost unbelievable) leap in cost optimization.
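To make the price comparison concrete, here is a minimal sketch of the arithmetic using only the per-million-token prices quoted in the post. The function names are hypothetical, and the post gives no input price for DeepSeek R1, so only output prices are compared.

```python
# Prices in US dollars per one million tokens, as quoted in the post.
GLM_45_INPUT_PRICE = 0.11
GLM_45_OUTPUT_PRICE = 0.28
DEEPSEEK_R1_OUTPUT_PRICE = 2.19


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of processing `tokens` tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


def output_price_ratio() -> float:
    """How many times more DeepSeek R1 output tokens cost than GLM-4.5 output tokens."""
    return DEEPSEEK_R1_OUTPUT_PRICE / GLM_45_OUTPUT_PRICE


if __name__ == "__main__":
    # Illustrative workload (an assumption, not from the post): 1M output tokens.
    print(f"GLM-4.5 output cost:     ${cost_usd(1_000_000, GLM_45_OUTPUT_PRICE):.2f}")
    print(f"DeepSeek R1 output cost: ${cost_usd(1_000_000, DEEPSEEK_R1_OUTPUT_PRICE):.2f}")
    print(f"Output price ratio:      {output_price_ratio():.1f}x")
```

On the quoted figures, DeepSeek R1 output tokens come out at roughly 7.8 times the GLM-4.5 output price.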

The rapid growth of Chinese AI companies carries increasing geopolitical weight. When DeepSeek showed off its performance at the start of the year, it triggered a brief slump in the share prices of US tech giants. The successes of Z.ai, a company founded in 2019 that has already raised more than $1.5 billion from investors such as Alibaba and Tencent, have not gone unnoticed either. OpenAI named the company as one of the few competitors in the world capable of building rival models, and it also appeared in Stanford University's prestigious "AI Index Report 2025". At the same time, this rapid progress has caused concern in Washington: Z.ai has been placed on the list of entities subject to US trade restrictions.

Training DeepSeek's model cost not $6 million but $1.3 billion, according to a SemiAnalysis report

#AI #chiny #DeepSeek #geopolityka #GLM45 #MixtureOfExperts #modelJęzykowy #modeleJęzykowe #news #openSource #sztucznaInteligencja #technologia #usa #ZAi #Zhipu

michabbb @[email protected] · 2025-07-23 · 11:45 UTC

#Qwen3Coder: Most Agentic Code Model Released 🤖

🎯 480B-parameter #MixtureOfExperts #LLM with 35B active parameters achieving #SOTA performance in agentic #coding
📏 Native 256K context support, extendable to 1M tokens with #YaRN for repo-scale operations

https://qwenlm.github.io/blog/qwen3-coder/

🧵👇#AI

Habr @[email protected] · 2025-07-01 · 09:12 UTC

MiniMax-M1: Breaking Down the Architecture That Breaks Scaling Laws (and Our VRAM)

The LLM world is dominated by quadratic complexity, which limits context length. MiniMax-M1 challenges that: a million tokens at low cost. We break down the hybrid architecture with Lightning Attention, the new CISPO algorithm, and the engineering breakthroughs that make this model unique.

https://habr.com/ru/articles/923588/

#minimaxm1 #LLM_архитектура #Lightning_Attention #mixtureofexperts #масштабирование_LLM

Winbuzzer @[email protected] · 2025-04-29 · 07:25 UTC

Alibaba Launches Open-Source Qwen3 AI Family with Hybrid Thinking Modes

#AI #GenAI #AIModels #Alibaba #Qwen3 #LLMs #OpenSourceAI #MixtureOfExperts #HybridThinking #TechNews #ChinaAI #China

https://winbuzzer.com/2025/04/29/alibaba-launches-open-source-qwen3-ai-family-with-hybrid-thinking-modes-xcxwbn/

Winbuzzer @[email protected] · 2025-02-25 · 20:31 UTC

Alibaba has introduced QwQ-Max-Preview, a new AI reasoning model designed to challenge OpenAI and DeepSeek #AI #Alibaba #QwQMaxPreview #QwenChat #GenAI #MixtureOfExperts #China

https://winbuzzer.com/2025/02/25/alibaba-unveils-qwq-max-preview-to-compete-with-openai-and-deepseek-xcxwbn/

WetHat💦 @WetHat · 2025-02-13 · 08:09 UTC

DeepSeek R1: All you need to know 🐳

The article covers various aspects of the model, from its architecture to training methodologies and practical applications. The explanations are mostly clear and detailed, making complex concepts like Mixture of Experts (#MoE) and reinforcement learning easy to understand.

https://fireworks.ai/blog/deepseek-r1-deepdive

#DeepSeekR1 #AI #MachineLearning #ReasoningModel #ReinforcementLearning #DeepLearning #MixtureOfExperts

WetHat💦 @WetHat · 2025-02-06 · 20:03 UTC

Brief analysis of DeepSeek R1 and its implications for Generative AI:
➡️ DeepSeek R1 exhibits powerful reasoning behaviors, achieved through scalable Group Relative Policy Optimization (GRPO).
➡️ Emergent self-reflection and Chain-of-Thought (CoT) patterns improve reasoning performance.
➡️ Distillation of larger models into smaller, efficient ones demonstrates significant performance improvements.

https://arxiv.org/abs/2502.02523v2?form=MG0AV3

#DeepSeekR1 #GenerativeAI #MachineLearning #AI #MixtureOfExperts
