home.social

  1. Supervising Ralph Wiggum: pairing a design agent with a separate metacognitive critic beats a plain retry loop AND a self-monitoring agent on battery-pack design.

    Metacognitive prompts alone don't help; moving them to a different agent does. Converges with ReMA's math-reasoning result: a separately parameterized head for meta-level work outperforms one model doing both.

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #CMU

  6. Position paper: today's self-improving agents lean on extrinsic metacognition — fixed human-designed loops about what to monitor, when to switch strategies. Genuine self-improvement needs the agent itself to decide those.

    The intrinsic/extrinsic axis is the right lens for recent agent work. STaR, DSPy, MASS, MetaSPO all extrinsic by this definition. Optimistic bet: current LLMs already carry partial ingredients.

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #Cambridge #ICML

  11. MetaSPO meta-learns a task-agnostic system prompt via a bilevel loop: outer tunes system prompt across tasks, inner tunes per-task user prompts. Generalizes to 14 unseen tasks across 5 domains.

    The decomposition is the contribution. Once prompts split into task-agnostic (system) vs task-specific (user), meta-learning follows. System-prompt optimization transfers; user-prompt doesn't.

    benjaminhan.net/posts/20260430

    #LLMs #AI #PromptEngineering #KAIST #NeurIPS

  16. ReMA trains a two-agent RL setup: a meta-thinker plans reasoning, an executor carries it out. Trained jointly with multi-agent RL, beats R1-style single-agent baselines on math.

    The split-agent pattern keeps showing up. Supervising Ralph Wiggum (engineering design, prompted) runs the same architecture a year later and lands the same direction of result. Open question: does decoupling survive a FLOPs-matched comparison?

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #Reasoning

  24. MASS optimizes multi-agent LLM systems by interleaving prompt and topology search: block-level prompts, topology rejection sampling, then workflow-level prompts.

    Topology gets quietly demoted. Ablation on Gemini 1.5 Pro: ~6% gain from block prompts, 3% from topology, 2% from workflow prompts. Prompt tuning dominates — contradicts the topology-first thesis of ADAS and AFlow.

    benjaminhan.net/posts/20260430

    #LLMs #AI #AgenticSystems #PromptEngineering #Google #ICLR
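
The bilevel loop in the MetaSPO post (outer loop tunes one system prompt across tasks, inner loop tunes per-task user prompts) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: it swaps in plain selection from fixed candidate lists where the real method generates and refines prompts, and `bilevel_prompt_search`, `toy_score`, and all the example prompts are hypothetical names and data made up for the sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: (system_prompt, user_prompt, task) -> score, higher is better.
Scorer = Callable[[str, str, str], float]


def bilevel_prompt_search(
    tasks: List[str],
    system_candidates: List[str],
    user_candidates: Dict[str, List[str]],
    score: Scorer,
    rounds: int = 2,
) -> Tuple[str, Dict[str, str]]:
    """Alternate an inner per-task search with an outer cross-task search."""
    system = system_candidates[0]
    users = {t: user_candidates[t][0] for t in tasks}
    for _ in range(rounds):
        # Inner loop: system prompt fixed, pick the best user prompt for each task.
        for t in tasks:
            users[t] = max(user_candidates[t], key=lambda u: score(system, u, t))
        # Outer loop: user prompts fixed, pick the system prompt with the
        # best total score summed over all tasks.
        system = max(
            system_candidates,
            key=lambda s: sum(score(s, users[t], t) for t in tasks),
        )
    return system, users


def toy_score(system: str, user: str, task: str) -> float:
    # Assumption: a stand-in for running a model and grading its output.
    # It only counts keyword hits so the example is deterministic.
    return float(("step by step" in system) + (task in user))


tasks = ["math", "law"]
system_candidates = ["You are helpful.", "Reason step by step."]
user_candidates = {
    "math": ["Answer:", "Solve this math problem:"],
    "law": ["Answer:", "Analyze this law question:"],
}

best_system, best_users = bilevel_prompt_search(
    tasks, system_candidates, user_candidates, toy_score
)
print(best_system)
print(best_users)
```

The point of the split is visible in the return value: `best_system` is a single prompt shared by every task, which is the part the post says transfers, while `best_users` stays a per-task mapping.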