home.social

#agenticsystems — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #agenticsystems, aggregated by home.social.

  1. Agent token cost grows quadratically in turns without caching and roughly linearly with caching. A new post fits those curves to SWE-bench traces on three models. The cross-model comparison is the interesting part: Gemini 3 Flash takes 2× as many turns as GPT-5.2 or Opus 4.6, so despite its leaner per-turn verbosity (~300 tokens vs ~1,000) it still burns more total tokens. (A toy sketch of the two cost regimes follows the last post below.)

    benjaminhan.net/posts/20260513

    #AI #AgenticSystems #LLMs

  2. The Inference Shift: Ben Thompson splits "inference" into two workloads. Answer inference (a human is waiting) stays on premium GPUs; agentic inference (no human waiting) migrates to the commodity memory hierarchy. Familiar shape: the 1970s migration of batch jobs off mainframes may rerun on today's GPU clusters.

    benjaminhan.net/posts/20260511

    #AI #AgenticSystems #Inference #tech

  3. Whoa, hold onto your propeller hats, folks! 🤓 We've got a #GitHub project that's apparently a self-modifying, open-sourced "agentic system" (whatever that means) living in consumer hardware. Because clearly what we all need is more inscrutable tech jargon disguised as innovation! 🤖🔧

    github.com/ninjahawk/hollow-ag

    #Innovation #SelfModifyingTech #AgenticSystems #ConsumerHardware #TechJargon #HackerNews #ngated

  4. Supervising Ralph Wiggum: pairing a design agent with a separate metacognitive critic beats a plain retry loop AND a self-monitoring agent on battery-pack design.

    Metacognitive prompts alone don't help; moving them to a different agent does. Converges with ReMA's math-reasoning result: a separately parameterized head for meta-level work outperforms one model doing both.

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #CMU

  5. Position paper: today's self-improving agents lean on extrinsic metacognition — fixed, human-designed loops that decide what to monitor and when to switch strategies. Genuine self-improvement needs the agent itself to make those decisions.

    The intrinsic/extrinsic axis is the right lens for recent agent work: STaR, DSPy, MASS, and MetaSPO are all extrinsic by this definition. The optimistic bet is that current LLMs already carry some of the ingredients for intrinsic metacognition.

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #Cambridge #ICML

  6. ReMA trains a two-agent setup with multi-agent RL: a meta-thinker plans the reasoning strategy and an executor carries it out. Trained jointly, the pair beats R1-style single-agent baselines on math.

    The split-agent pattern keeps showing up. Supervising Ralph Wiggum (engineering design, prompted rather than trained) runs the same architecture a year later and lands a result in the same direction. Open question: does the decoupling survive a FLOPs-matched comparison? (A minimal sketch of the split appears after the last post below.)

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #Reasoning

  7. MASS optimizes multi-agent LLM systems by interleaving prompt and topology search: block-level prompt tuning, topology search by rejection sampling, then workflow-level prompt tuning. (A sketch of the interleaved loop follows the last post below.)

    Topology gets quietly demoted. Ablation on Gemini 1.5 Pro: ~6% gain from block prompts, 3% from topology, 2% from workflow prompts. Prompt tuning dominates — contradicts the topology-first thesis of ADAS and AFlow.

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #PromptEngineering #Google #ICLR
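
A toy sketch of post 1's two cost regimes (an editor's illustration, not code from the linked post). It assumes each turn adds a fixed number of new tokens and that, without caching, the whole history is re-billed every turn; the per-turn figures are placeholders echoing the post's rough ~300 vs ~1,000 numbers.

    # Cumulative billed tokens over a multi-turn agent episode (illustrative only).
    def total_tokens(turns: int, per_turn: int, cached: bool) -> int:
        total = 0
        history = 0
        for _ in range(turns):
            history += per_turn
            # Without caching the full history is reprocessed each turn, so the
            # total is a sum of an arithmetic series: O(turns^2). With caching
            # only the newly added tokens are billed at full price: O(turns).
            total += per_turn if cached else history
        return total

    # Terse model taking twice the turns vs. verbose model taking half as many:
    for turns, per_turn in [(40, 300), (20, 1_000)]:
        print(turns, "turns x", per_turn, "tok/turn",
              "| no cache:", total_tokens(turns, per_turn, cached=False),
              "| cached:", total_tokens(turns, per_turn, cached=True))

With these placeholder numbers the 40-turn, 300-token run costs more uncached (246,000 vs 210,000 billed tokens) than the 20-turn, 1,000-token run, which is the post's point about turn count dominating per-turn verbosity in the quadratic regime.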
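
A minimal sketch of the split-agent pattern behind posts 4 and 6, assuming a generic chat-completion callable; the prompts, the ACCEPT convention, and the loop are invented for illustration. The prompted variant here is closer to Supervising Ralph Wiggum; ReMA trains the two roles jointly with RL rather than prompting them.

    # Worker/critic split: the task-level agent proposes, a *separate* agent does
    # the metacognitive check. `worker` and `critic` stand in for any chat call
    # of the form (system_prompt, user_prompt) -> completion.
    from typing import Callable

    LLM = Callable[[str, str], str]

    def solve_with_critic(task: str, worker: LLM, critic: LLM, max_rounds: int = 3) -> str:
        draft, feedback = "", ""
        for _ in range(max_rounds):
            draft = worker(
                "You are the task-level agent. Produce or revise a solution.",
                f"Task: {task}\nCritique of previous attempt (may be empty): {feedback}",
            )
            feedback = critic(
                "You are the metacognitive agent. Judge the strategy, not just the "
                "answer: is the approach sound, should it switch course? "
                "Reply ACCEPT if no revision is needed.",
                f"Task: {task}\nCandidate solution: {draft}",
            )
            if "ACCEPT" in feedback.upper():
                break
        return draft

For contrast with post 4's baselines: a plain retry loop drops the critic call entirely, and a self-monitoring agent folds both system prompts into a single model call.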
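
A sketch of post 7's three-stage MASS loop (block-level prompt tuning, topology rejection sampling, workflow-level prompt tuning). `optimize_prompt`, `evaluate`, and the data layout are hypothetical stand-ins for whatever prompt optimizer and validation-set scorer you use; this mirrors only the interleaving order the post describes, not the paper's actual implementation.

    import random
    from typing import Callable, Dict, List, Tuple

    ScoreFn = Callable[[str], float]                          # prompt -> validation score
    PromptOptimizer = Callable[[str, ScoreFn], str]           # (initial prompt, scorer) -> better prompt
    Evaluator = Callable[[List[str], Dict[str, str]], float]  # (topology, prompts) -> score

    def mass_style_search(blocks: Dict[str, str],          # block name -> initial prompt
                          topologies: List[List[str]],     # candidate wirings of block names
                          optimize_prompt: PromptOptimizer,
                          evaluate: Evaluator,
                          n_samples: int = 8) -> Tuple[List[str], Dict[str, str]]:
        # Stage 1: block-level prompt optimization, each block scored in isolation.
        prompts = {name: optimize_prompt(init, lambda p, n=name: evaluate([n], {n: p}))
                   for name, init in blocks.items()}

        # Stage 2: topology search by rejection sampling over candidate wirings,
        # reusing the block-level prompts and keeping the best-scoring topology.
        sampled = random.sample(topologies, min(n_samples, len(topologies)))
        best_topo = max(sampled, key=lambda t: evaluate(t, prompts))

        # Stage 3: workflow-level prompt optimization, re-tuning each block's prompt
        # in the context of the chosen topology and the other blocks' prompts.
        prompts = {name: optimize_prompt(prompts[name],
                                         lambda p, n=name: evaluate(best_topo, {**prompts, n: p}))
                   for name in best_topo}
        return best_topo, prompts

Per the post's ablation, most of the gain sits in stages 1 and 3, with the stage-2 topology search contributing comparatively little.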