#contextrot — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #contextrot, aggregated by home.social.
-
TIL: you can respawn a context point and parallelize #vibecoding development to avoid #contextrot, or explore alternative pathways with /rewind
I think I just leveled up my vibecoding, can't wait to try it!
-
Anthropic Challenges OpenAI with 1M Token Claude Sonnet 4 Upgrade, But Is Bigger Always Better?
#AI #LLM #Anthropic #Claude4Sonnet #Claude4 #OpenAI #GPT5 #ContextWindow #ContextRot
-
‘Context Rot’: New Study Reveals Why Bigger Context Windows Don't Magically Improve LLM Performance
#AI #LLM #ContextRot #MachineLearning #AIResearch #ContextWindow #Gemini25Pro #GoogleGemini
-
"Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.
In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.
Recent developments in LLMs show a trend toward longer context windows, with the input token count of the latest models reaching the millions. Because these models achieve near-perfect scores on widely adopted benchmarks like Needle in a Haystack (NIAH) [1], it’s often assumed that their performance is uniform across long-context tasks.
However, NIAH is fundamentally a simple retrieval task, in which a known sentence (the “needle”) is placed in a long document of unrelated text (the “haystack”), and the model is prompted to retrieve it. While scalable, this benchmark typically assesses direct lexical matching, which may not be representative of flexible, semantically oriented tasks.
We extend the standard NIAH task, to investigate model behavior in previously underexplored settings. We examine the effects of needles with semantic, rather than direct lexical matches, as well as the effects of introducing variations to the haystack content.
(...)
We demonstrate that even under these minimal conditions, model performance degrades as input length increases, often in surprising and non-uniform ways. Real-world applications typically involve much greater complexity, implying that the influence of input length may be even more pronounced in practice."
-
Context Rot: How increasing input tokens impacts LLM performance
https://research.trychroma.com/context-rot
#HackerNews #ContextRot #LLMperformance #NLP #research #AIimpact #inputtokens
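-
For readers unfamiliar with the Needle in a Haystack setup quoted above, here is a minimal sketch of how such a prompt could be assembled and scored by direct lexical match. It is not the report's code: the function names, the filler sentences, the needle, and the question are all made up for illustration, and no model is called.

```python
# Illustrative sketch only: every name and string here is hypothetical,
# not taken from the Chroma report or any library.

FILLER = [
    "The weather was mild that week.",
    "Trains ran on the usual schedule.",
    "The market opened early on Tuesday.",
]
NEEDLE = "The secret code for the vault is 7421."
QUESTION = "What is the secret code for the vault?"


def build_niah_prompt(filler, needle, question, n_sentences, depth):
    """Place one needle sentence inside n_sentences of unrelated filler.

    depth is the relative position of the needle: 0.0 puts it at the
    start of the haystack, 1.0 at the end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    # Repeat the filler sentences until the haystack has the requested length.
    haystack = [filler[i % len(filler)] for i in range(n_sentences)]
    position = round(depth * n_sentences)
    haystack.insert(position, needle)
    context = " ".join(haystack)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return prompt, position


def lexical_hit(answer, expected):
    """Score by direct lexical match: is the expected string in the answer?"""
    return expected.lower() in answer.lower()


def sweep(lengths, depths):
    """Build one prompt per (haystack length, needle depth) combination."""
    cases = []
    for n in lengths:
        for d in depths:
            prompt, _ = build_niah_prompt(FILLER, NEEDLE, QUESTION, n, d)
            cases.append((n, d, prompt))
    return cases
```

Sending each prompt from `sweep` to a model and applying `lexical_hit` to its reply would give a length-by-depth grid of scores. The semantic-match variants mentioned in the quote would need a needle and question that share meaning but not wording, which this sketch does not attempt.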