#multi-token-prediction — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #multi-token-prediction, aggregated by home.social.
-
🔥 Gemma 4 cuts latency by up to 3x with Multi-Token drafters: speculative decoding with no loss of quality
https://gomoot.com/gemma-4-accelera-linferenza-grazie-ai-drafter-multi-token/
-
Accelerating Gemma 4: faster inference with multi-token prediction drafters
https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/
#HackerNews #Gemma4 #Accelerated #Inference #MultiTokenPrediction #AI
-
Researchers have discovered a clever trick: by embedding a mask token directly into the weight matrix, they can bypass the costly embedding lookup and generate token streams up to three times faster. The method is compatible with parallel computation and speculative decoding, promising big gains for open‑source LLMs. Read on to see how ConfAdapt powers this speed‑up. #LLMinference #SpeculativeDecoding #MultiTokenPrediction #ModelAcceleration
🔗 https://aidailypost.com/news/researchers-embed-mask-token-llm-weights-achieve-3-faster-inference
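The draft-and-verify loop that speculative decoding builds on can be sketched in a few lines. This is a minimal illustration, not the paper's method: `draft_fn` and `target_fn` are hypothetical stand-ins for the cheap multi-token drafter and the full model, and a real implementation verifies all draft positions in one batched forward pass rather than one call per token.

```python
def speculative_step(prefix, draft_fn, target_fn, k=3):
    """Propose k draft tokens, keep the longest prefix the full model
    agrees with, and append one corrected token on the first mismatch."""
    accepted = []
    for tok in draft_fn(prefix, k):
        expected = target_fn(prefix + accepted)  # exact next token
        if tok != expected:
            accepted.append(expected)            # fall back, stop drafting
            break
        accepted.append(tok)                     # draft token verified
    return accepted

# Toy demo: the "true" continuation is a-b-c-d-e; the drafter guesses
# a-b-x, so two tokens are accepted and the third is corrected.
truth = ["a", "b", "c", "d", "e"]
target_fn = lambda prefix: truth[len(prefix)]
draft_fn = lambda prefix, k: ["a", "b", "x"][len(prefix):len(prefix) + k]
print(speculative_step([], draft_fn, target_fn))  # → ['a', 'b', 'c']
```

Because every emitted token is checked against the full model, the output is identical to plain decoding; the speed-up comes only from how many draft tokens survive verification.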
-
Alibaba's new Qwen 3.5 397B-A17 outperforms even larger rivals by using multi-token prediction and a sparse mixture-of-experts architecture. It cuts inference cost while keeping top-tier performance, hinting at a new era for multimodal AI. Curious how 397 billion parameters can be cheaper? Read the full story. #Qwen3_5 #AlibabaAI #MixtureOfExperts #MultiTokenPrediction
🔗 https://aidailypost.com/news/alibabas-qwen-35-397b-a17-beats-larger-model-via-multitoken
-
Apple speeds up AI models by up to 5x
Apple has published research describing a new technique that lets language models (LLMs) generate responses up to five times faster with no loss of quality.
Traditionally, LLMs produce text token by token (autoregression), which slows the process down. Apple found that models, despite being trained to predict a single token, carry knowledge about several upcoming tokens. This insight led to the Multi-Token Prediction (MTP) framework, in which the model predicts several tokens at once.
The researchers insert special mask tokens into the prompt (e.g., "The cat is "), which the model fills in a single step ("very fluffy"). If a prediction disagrees with classic autoregressive decoding, the system falls back to the standard method, so high accuracy is preserved.
Tests with the open-source model Tulu3-8B showed:
- 2–3x faster generation on typical tasks (Q&A, chat)
- up to 5x faster in predictable domains such as coding and math
- no loss of quality, thanks to gated LoRA adaptation
The full paper is available on arXiv.
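The fill-and-verify scheme this post describes can be sketched as a short loop. This is a toy illustration under stated assumptions, not Apple's code: `fill_masks` and `next_token` are hypothetical stand-ins for the model's parallel mask-filling pass and its standard autoregressive mode.

```python
MASK = "<mask>"

def mtp_step(prefix, fill_masks, next_token, n_masks=2):
    """Have the model fill n mask slots in one pass, then keep only
    the draft tokens the one-at-a-time decoder would also produce."""
    draft = fill_masks(prefix + [MASK] * n_masks)  # parallel guess
    out = []
    for tok in draft:
        ar = next_token(prefix + out)  # standard autoregressive token
        out.append(ar)                 # output always matches AR mode
        if tok != ar:                  # draft rejected: stop drafting
            break
    return out

# Toy demo: autoregressive decoding continues "the cat is" with
# "very fluffy"; the mask filler guesses the same two words.
truth = ["the", "cat", "is", "very", "fluffy"]
next_token = lambda prefix: truth[len(prefix)]
fill_masks = lambda toks: ["very", "fluffy"]
print(mtp_step(["the", "cat", "is"], fill_masks, next_token))  # → ['very', 'fluffy']
```

Note that the loop always emits the autoregressive token, so the output is bit-identical to standard decoding; a wrong mask guess only costs the speed-up, never the quality.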
#aiApple #Apple #AppleIntelligence #badaniaApple #gatedLoRAAdaptation #generowanieTekstu #LLM #modeleJęzykowe #MTP #MultiTokenPrediction #optymalizacjaAI #przyspieszenieAI #sztucznaInteligencja #szybkieAI #Tulu38B
-
Explore the sophisticated mechanisms driving multi-token prediction. This section rigorously explains its edge via information-theoretic mutual information https://hackernoon.com/decoding-the-magic-multi-token-predictions-information-theoretic-edge-and-beyond #multitokenprediction
-
Discover how multi-token prediction improves LLM algorithmic reasoning, potentially by learning to allocate computational resources more efficiently https://hackernoon.com/multi-token-prediction-mastering-algorithmic-reasoning-with-enhanced-resource-use #multitokenprediction
-
This figure illustrates the profound impact of training scale on multi-token prediction models' performance on GSM8K, highlighting critical data efficiency https://hackernoon.com/strategic-llm-training-multi-token-predictions-data-efficiency-in-mathematical-reasoning #multitokenprediction
-
Explore Table S5 revealing multi-token prediction's remarkable training efficiency across LLM sizes (0.3B-13B) https://hackernoon.com/unleashing-llm-training-efficiency-multi-token-predictions-near-zero-overhead #multitokenprediction
-
Concluding our work, we find multi-token prediction to be a superior method for training LLMs, delivering enhanced performance on generative and reasoning tasks https://hackernoon.com/unlocking-generative-power-multi-token-prediction-for-next-gen-llms #multitokenprediction
-
Explore the landscape of language modeling losses, multi-token prediction, and self-speculative decoding https://hackernoon.com/defining-the-frontier-multi-token-predictions-place-in-llm-evolution #multitokenprediction
-
Dive into the design space beyond our core multi-token prediction architecture, comparing approaches like replicated unembeddings and linear heads https://hackernoon.com/exploring-alternative-architectures-for-multi-token-llm-prediction #multitokenprediction
-
Dive into the core reasons behind multi-token prediction's superior LLM performance, exploring how it mitigates distributional discrepancy https://hackernoon.com/unraveling-multi-token-prediction-bridging-training-inference-gaps-with-lookahead #multitokenprediction
-
Explore how multi-token prediction fundamentally alters LLM capabilities, dramatically improving induction and algorithmic reasoning https://hackernoon.com/unveiling-llm-intelligence-multi-token-prediction-drives-qualitative-reasoning-shifts #multitokenprediction
-
Witness multi-token prediction's transformative power across seven large-scale experiments: unlocking exponential gains with model size, 3x faster inference https://hackernoon.com/unrivaled-llm-efficacy-multi-token-prediction-revolutionizes-performance-across-domains #multitokenprediction