#agenticsystems — Public Fediverse posts on home.social

Benjamin Han @[email protected] · 2026-05-13 · 18:27 UTC

Agent token cost grows quadratically with the number of turns without caching, and roughly linearly with caching. A new post fits those curves to SWE-bench traces from three models. The cross-model comparison turns up something interesting: Gemini 3 Flash takes 2× as many turns as GPT-5.2 or Opus 4.6, so despite its lower per-turn verbosity (~300 tokens vs ~1,000) it still burns more total tokens.
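A toy model shows where the two curves come from. This is a sketch under stated assumptions and not the post's fitted model. It assumes that every turn appends a fixed `tokens_per_turn` to the context and that every call re-reads the full context. It also assumes that tokens already seen on the previous call are billed at a fraction `cache_rate` of the full input price. It counts input tokens only. The function name, parameters and turn counts are hypothetical.

```python
def billed_input_tokens(turns: int, tokens_per_turn: int, prefix: int = 0,
                        cache_rate: float = 1.0) -> float:
    """Full-price-equivalent input tokens for a hypothetical agent loop.

    Turn t re-sends the prefix plus every earlier turn's tokens. Tokens that
    the previous call already saw are billed at `cache_rate`
    (1.0 = no caching, 0.0 = free cache reads).
    """
    total = 0.0
    context = prefix
    for t in range(turns):
        new = prefix if t == 0 else tokens_per_turn  # tokens the previous call did not see
        cached = context - new                      # context re-read from the previous call
        total += new + cache_rate * cached
        context += tokens_per_turn                  # this turn's output/observation joins the context
    return total

# Hypothetical uncached comparison: twice the turns at ~300 tokens/turn vs ~1,000 tokens/turn.
print(billed_input_tokens(turns=100, tokens_per_turn=300))   # 1485000.0
print(billed_input_tokens(turns=50, tokens_per_turn=1000))   # 1225000.0
```

When `cache_rate=1.0`, the sum is `T*prefix + k*T*(T-1)/2`, which is quadratic in the turn count `T`. When `cache_rate=0.0`, it collapses to `prefix + k*(T-1)`, which is linear in `T`. In the uncached regime of this toy, the quadratic term scales as `k*T²`. So doubling the turns (4× on that term) outweighs cutting per-turn tokens by roughly 3×.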

https://benjaminhan.net/posts/20260513-the-math-behind-the-cost-of-ai-agents/?utm_source=mastodon&utm_medium=social

#AI #AgenticSystems #LLMs

Benjamin Han @[email protected] · 2026-05-12 · 00:24 UTC

The Inference Shift: Ben Thompson splits "inference" into two workloads. Answer inference (a human is waiting) stays on premium GPUs; agentic inference (no human waiting) migrates to a commodity memory hierarchy. The shape is familiar: the 1970s migration of batch jobs off mainframes may replay on today's GPU clusters.
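One way to picture the split is a scheduler that routes each request on whether a person is blocked on the response. This is an illustrative sketch and does not come from Thompson's piece. The `human_waiting` flag, the pool names and the batch size are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InferenceRequest:
    request_id: str
    human_waiting: bool  # hypothetical flag: is a person blocked on this response?


def route(requests: List[InferenceRequest], batch_size: int = 4) -> Dict[str, list]:
    """Split traffic into the two workloads (pool names are hypothetical).

    Answer inference is dispatched individually to a latency-optimized pool.
    Agentic inference is grouped into fixed-size batches for a cheaper,
    throughput-optimized pool.
    """
    interactive = [r for r in requests if r.human_waiting]
    agentic = [r for r in requests if not r.human_waiting]
    batches = [agentic[i:i + batch_size] for i in range(0, len(agentic), batch_size)]
    return {"premium_gpu": interactive, "commodity_batches": batches}


reqs = [InferenceRequest(f"r{i}", human_waiting=(i % 3 == 0)) for i in range(10)]
pools = route(reqs)
print(len(pools["premium_gpu"]), [len(b) for b in pools["commodity_batches"]])  # 4 [4, 2]
```

Batching the agentic side is the design choice that matters here. With no human waiting, those requests can trade latency for throughput on cheaper hardware.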

https://benjaminhan.net/posts/20260511-the-inference-shift/?utm_source=mastodon&utm_medium=social

#AI #AgenticSystems #Inference #tech

N-gated Hacker News @[email protected] · 2026-05-02 · 08:22 UTC

Whoa, hold onto your propeller hats, folks! 🤓 We've got a #GitHub project that's apparently a self-modifying, open-sourced "agentic system" (whatever that means) living in consumer hardware. Because clearly what we all need is more inscrutable tech jargon disguised as innovation! 🤖🔧
https://github.com/ninjahawk/hollow-agentOS #Innovation #SelfModifyingTech #AgenticSystems #ConsumerHardware #TechJargon #HackerNews #ngated

Benjamin Han @[email protected] · 2026-05-01 · 01:16 UTC

Supervising Ralph Wiggum: pairing a design agent with a separate metacognitive critic beats a plain retry loop AND a self-monitoring agent on battery-pack design.

Metacognitive prompts alone don't help; moving them to a different agent does. This converges with ReMA's math-reasoning result: a separately parameterized head for meta-level work outperforms one model doing both.
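A few lines are enough to sketch how control flows in each setup. This toy is hypothetical throughout and does not reproduce the paper's battery-pack task or results. The "design" is just a series cell count. The critic checks pack voltage against an assumed 46 to 50 V window at an assumed 3.6 V per cell. The critic's feedback conditions the designer's next proposal.

```python
from typing import Callable, Optional, Tuple

CELL_VOLTS = 3.6          # hypothetical nominal cell voltage
TARGET_V = (46.0, 50.0)   # hypothetical acceptable pack-voltage window


def critique(cells: int) -> Tuple[bool, str]:
    """Separate critic agent (stubbed): judges a design and says why."""
    volts = cells * CELL_VOLTS
    if volts < TARGET_V[0]:
        return False, "too low"
    if volts > TARGET_V[1]:
        return False, "too high"
    return True, "ok"


def make_designer(start: int = 10) -> Callable[[Optional[str]], int]:
    """Design agent (stubbed): proposes a cell count, adjusting only on feedback."""
    state = {"cells": start}

    def propose(feedback: Optional[str]) -> int:
        if feedback == "too low":
            state["cells"] += 1
        elif feedback == "too high":
            state["cells"] -= 1
        return state["cells"]

    return propose


def retry_loop(propose, check, max_iters: int = 10):
    """Plain retry: rerun the designer until a candidate passes; no feedback flows back."""
    for i in range(max_iters):
        candidate = propose(None)
        if check(candidate):
            return candidate, i + 1
    return None, max_iters


def supervised_loop(propose, critic, max_iters: int = 10):
    """Designer plus a separate critic whose feedback conditions the next proposal."""
    feedback = None
    for i in range(max_iters):
        candidate = propose(feedback)
        ok, feedback = critic(candidate)
        if ok:
            return candidate, i + 1
    return None, max_iters


print(retry_loop(make_designer(), lambda c: critique(c)[0]))  # (None, 10)
print(supervised_loop(make_designer(), critique))             # (13, 4)
```

The designer here is deterministic, so the retry loop fails by construction. With a stochastic designer, a retry loop can still succeed by chance. The sketch shows only where the critic's judgment enters the loop, not how much it helps.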

https://benjaminhan.net/posts/20260430-supervising-ralph-wiggum/?utm_source=mastodon&utm_medium=social

#LLMs #AI #AgenticSystems #CMU

Benjamin Han @[email protected] · 2026-05-01 · 01:15 UTC

Position paper: today's self-improving agents lean on extrinsic metacognition, meaning fixed, human-designed loops that decide what to monitor and when to switch strategies. Genuine self-improvement needs the agent itself to make those decisions.

The intrinsic/extrinsic axis is the right lens for recent agent work: STaR, DSPy, MASS, and MetaSPO are all extrinsic by this definition. The optimistic bet: current LLMs already carry some of the ingredients.
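For contrast, here is extrinsic metacognition in miniature: a hand-written controller. A human has fixed what is monitored, which is a scalar score. The human has also fixed when to switch: after `patience` steps without improvement, move to the next strategy in a fixed list. This is a hypothetical illustration of the definition and is not code from the paper or from any of the cited systems. An intrinsic version would let the agent make those choices itself, and this sketch deliberately does not attempt that.

```python
from typing import Callable, List, Tuple


def extrinsic_controller(strategies: List[str],
                         run_step: Callable[[str, int], float],
                         patience: int = 2,
                         max_steps: int = 20) -> Tuple[str, float]:
    """Hand-designed metacognitive loop (extrinsic in the paper's sense).

    What to monitor (the score from run_step) and when to switch (after
    `patience` non-improving steps, in a fixed order) are set by a human.
    The agent decides none of this.
    """
    idx, best, best_strategy, stale = 0, float("-inf"), strategies[0], 0
    for step in range(max_steps):
        strategy = strategies[idx]
        score = run_step(strategy, step)
        if score > best:
            best, best_strategy, stale = score, strategy, 0
        else:
            stale += 1
        # Fixed switching rule chosen by the designer, not by the agent.
        if stale >= patience and idx + 1 < len(strategies):
            idx, stale = idx + 1, 0
    return best_strategy, best


scores = {"direct": 0.4, "decompose": 0.7}  # hypothetical per-strategy scores
print(extrinsic_controller(["direct", "decompose"], lambda s, t: scores[s],
                           patience=2, max_steps=6))  # ('decompose', 0.7)
```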

https://benjaminhan.net/posts/20260430-intrinsic-metacognitive-learning/?utm_source=mastodon&utm_medium=social

#LLMs #AI #AgenticSystems #Cambridge #ICML

Benjamin Han @[email protected] · 2026-05-01 · 01:15 UTC

ReMA trains a two-agent RL setup: a meta-thinker plans the reasoning and an executor carries it out. Trained jointly with multi-agent RL, the pair beats R1-style single-agent baselines on math.

The split-agent pattern keeps showing up. Supervising Ralph Wiggum (engineering design, prompted) runs the same architecture a year later and lands a result in the same direction. Open question: does the benefit of decoupling survive a FLOPs-matched comparison?
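One way to set up the FLOPs-matched comparison uses the common approximation that a transformer forward pass costs about 2 FLOPs per parameter per token. The model sizes and token counts below are hypothetical and are not ReMA's. The sketch only computes how many tokens a single-agent baseline could spend on the same inference budget.

```python
def forward_flops(params: int, tokens: int) -> int:
    """Rough inference cost: ~2 FLOPs per parameter per token processed.

    Ignores the attention term that grows with context length, so treat the
    result as a budget estimate, not an exact count.
    """
    return 2 * params * tokens


def matched_single_agent_tokens(meta_params: int, meta_tokens: int,
                                exec_params: int, exec_tokens: int,
                                single_params: int) -> int:
    """Tokens a single-agent baseline could process on the two-agent FLOPs budget."""
    budget = forward_flops(meta_params, meta_tokens) + forward_flops(exec_params, exec_tokens)
    return budget // forward_flops(single_params, 1)


# Hypothetical: a 1.5B meta-thinker writing 400 tokens plus a 7B executor writing 1,200.
print(matched_single_agent_tokens(1_500_000_000, 400, 7_000_000_000, 1_200, 7_000_000_000))  # 1285
```

Suppose a matched baseline spends those extra tokens on longer reasoning or more samples and closes the gap. Then the gain would be attributable to compute rather than to decoupling.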

https://benjaminhan.net/posts/20260430-rema-meta-think/?utm_source=mastodon&utm_medium=social

#LLMs #AI #AgenticSystems #Reasoning

Benjamin Han @[email protected] · 2026-05-01 · 01:15 UTC

MASS optimizes multi-agent LLM systems by interleaving prompt and topology search: block-level prompts, topology rejection sampling, then workflow-level prompts.

Topology gets quietly demoted. Ablation on Gemini 1.5 Pro: ~6% gain from block prompts, 3% from topology, 2% from workflow prompts. Prompt tuning dominates, contradicting the topology-first thesis of ADAS and AFlow.
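The three-stage order can be sketched as a search skeleton. This is a simplification and not MASS's implementation. It assumes a black-box `evaluate(config)` scorer, small candidate lists per stage, and a cost cap as the rejection rule for sampled topologies. MASS's actual sampling distribution and prompt optimizers are not reproduced, and every name below is hypothetical.

```python
import random
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def mass_style_search(block_prompts: Dict[str, List[str]],
                      topologies: List[Dict[str, int]],
                      workflow_prompts: List[str],
                      evaluate: Callable[[Config], float],
                      cost: Callable[[Dict[str, int]], int],
                      budget: int,
                      n_samples: int = 20,
                      seed: int = 0) -> Config:
    rng = random.Random(seed)
    config: Config = {
        "prompts": {b: cands[0] for b, cands in block_prompts.items()},
        "topology": {b: 1 for b in block_prompts},  # one instance per block; assumed within budget
        "workflow_prompt": workflow_prompts[0],
    }

    # Stage 1: block-level prompts, optimized one block at a time.
    for block, cands in block_prompts.items():
        config["prompts"][block] = max(cands, key=lambda p: evaluate(
            {**config, "prompts": {**config["prompts"], block: p}}))

    # Stage 2: topology rejection sampling under a cost budget.
    best_topo, best_score = config["topology"], evaluate(config)
    for _ in range(n_samples):
        topo = rng.choice(topologies)
        if cost(topo) > budget:
            continue  # reject over-budget topologies without evaluating them
        score = evaluate({**config, "topology": topo})
        if score > best_score:
            best_topo, best_score = topo, score
    config["topology"] = best_topo

    # Stage 3: workflow-level prompt, tuned with block prompts and topology fixed.
    config["workflow_prompt"] = max(
        workflow_prompts, key=lambda w: evaluate({**config, "workflow_prompt": w}))
    return config
```

Over-budget topologies are rejected before `evaluate` is called, so the expensive scorer never runs on candidates that could not be deployed.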

https://benjaminhan.net/posts/20260430-multi-agent-system-search/?utm_source=mastodon&utm_medium=social

#LLMs #AI #AgenticSystems #PromptEngineering #Google #ICLR