Paul Wermer, CC BY-NC-SA 4.0 @[email protected] · 2026-05-01 · 18:02 UTC

Excess CO2 is #JunkFood for plants,

#junkfood

Books of My Heart @[email protected] · 2026-05-01 · 10:01 UTC

🎧 Everyone in This Bank is a Thief by Benjamin Stevenson
#BenjaminStevenson #BartonWelch #HarperAudio #LoveAudiobooks @4saintjude #BookReview #4.5Hearts #AudioBookReview #Australia #PrivateInvestigator
https://booksofmyheart.net/2026/05/01/%f0%9f%8e%a7-everyone-in-this-bank-is-a-thief-by-benjamin-stevenson/?fsp_sid=12513

#benjaminstevenson #bartonwelch #harperaudio #loveaudiobooks #bookreview #audiobookreview

Benjamin Han @[email protected] · 2026-05-01 · 01:16 UTC

Supervising Ralph Wiggum: pairing a design agent with a separate metacognitive critic beats a plain retry loop AND a self-monitoring agent on battery-pack design.

Metacognitive prompts alone don't help; moving them to a different agent does. Converges with ReMA's math-reasoning result: a separately parameterized head for meta-level work outperforms one model doing both.

https://benjaminhan.net/posts/20260430-supervising-ralph-wiggum/?utm_source=mastodon&utm_medium=social

#llms #ai #agenticsystems #cmu
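
A minimal sketch of the two loop shapes the post compares, under loudly stated assumptions: the battery-pack numbers (3.6 V / 5 Ah cells, a 48 V / 20 Ah target), the `Designer` stub, and the doubling-step rule inside `critic` are hypothetical stand-ins for the paper's LLM agents, not its method. The only difference between `retry_loop` and `critic_loop` is who decides how to react to a failure: the plain loop feeds back the latest raw error, while the critic loop hands the whole failure history to a separate component.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy battery-pack problem; all numbers are hypothetical, not taken from the paper.
CELL_V, CELL_AH = 3.6, 5.0        # per-cell nominal voltage (V) and capacity (Ah)
TARGET_V, TARGET_AH = 48.0, 20.0  # pack requirements

Design = Tuple[int, int]    # (cells in series, cells in parallel)
Feedback = Tuple[str, int]  # (issue: "voltage" or "capacity", step size)


def check(design: Design) -> Tuple[bool, str]:
    """Hypothetical validator: reports only the first failing requirement."""
    s, p = design
    if s * CELL_V < TARGET_V:
        return False, "voltage"
    if p * CELL_AH < TARGET_AH:
        return False, "capacity"
    return True, "ok"


@dataclass
class Designer:
    """Stand-in for the design agent: adjusts the count it is told to adjust."""
    s: int = 1
    p: int = 1

    def propose(self, feedback: Optional[Feedback] = None) -> Design:
        if feedback is not None:
            issue, step = feedback
            if issue == "voltage":
                self.s += step
            else:
                self.p += step
        return (self.s, self.p)


def critic(history: List[str]) -> Feedback:
    """Stand-in for the separate critic agent: it sees the whole failure history,
    and when the same issue keeps recurring it switches strategy by doubling the step."""
    last = history[-1]
    streak = 0
    for issue in reversed(history):
        if issue != last:
            break
        streak += 1
    return last, 2 ** (streak - 1)


def retry_loop(max_iters: int = 50) -> Tuple[Optional[Design], int]:
    """Plain retry loop: only the latest raw error is fed back, with a fixed step of 1."""
    designer, feedback = Designer(), None
    for i in range(1, max_iters + 1):
        design = designer.propose(feedback)
        ok, issue = check(design)
        if ok:
            return design, i
        feedback = (issue, 1)
    return None, max_iters


def critic_loop(max_iters: int = 50) -> Tuple[Optional[Design], int]:
    """Designer plus a separate critic that decides how the next attempt should change."""
    designer, feedback, history = Designer(), None, []
    for i in range(1, max_iters + 1):
        design = designer.propose(feedback)
        ok, issue = check(design)
        if ok:
            return design, i
        history.append(issue)
        feedback = critic(history)
    return None, max_iters


print(retry_loop())   # ((14, 4), 17)
print(critic_loop())  # ((16, 4), 7)
```

With these toy rules the plain loop needs 17 attempts to reach a 14s4p pack, while the critic loop passes in 7 but overshoots to 16 cells in series: a strategy switch chosen by the critic is not free, and nothing here reproduces the paper's results.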

Benjamin Han @[email protected] · 2026-05-01 · 01:15 UTC

Position paper: today's self-improving agents lean on extrinsic metacognition — fixed human-designed loops about what to monitor, when to switch strategies. Genuine self-improvement needs the agent itself to decide those.

The intrinsic/extrinsic axis is the right lens for recent agent work. STaR, DSPy, MASS, and MetaSPO are all extrinsic by this definition. Optimistic bet: current LLMs already carry partial ingredients.

https://benjaminhan.net/posts/20260430-intrinsic-metacognitive-learning/?utm_source=mastodon&utm_medium=social

#llms #ai #agenticsystems #cambridge #icml
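
To make the extrinsic side of the axis concrete, here is a hypothetical controller in which a human has fixed the metacognitive decisions the post lists: what to monitor (a scalar score), when to switch (after `patience` non-improving steps), and what to switch to (the next entry in a fixed list). The strategy names and toy update functions are invented for illustration; this is a generic sketch, not a reconstruction of any system the post names. An intrinsic learner, in the paper's sense, would have to produce these decisions itself instead of receiving them as arguments.

```python
from typing import Callable, List, Sequence, Tuple

Strategy = Callable[[float], float]  # toy: maps the current score to a new score


def extrinsic_controller(strategies: Sequence[Tuple[str, Strategy]], start: float,
                         steps: int, patience: int = 2) -> List[str]:
    """Hypothetical extrinsic metacognitive loop. The human designer fixed:
    - WHAT to monitor: a scalar score (higher is better),
    - WHEN to switch: after `patience` steps without a new best score,
    - WHAT to switch to: the next strategy in a fixed list.
    Returns the name of the strategy used at each step."""
    score, best, stall, idx = start, start, 0, 0
    trace = []
    for _ in range(steps):
        name, step_fn = strategies[idx]
        trace.append(name)
        score = step_fn(score)
        if score > best:
            best, stall = score, 0
        else:
            stall += 1
        # Hard-coded switching rule: the agent never decides this for itself.
        if stall >= patience and idx + 1 < len(strategies):
            idx, stall = idx + 1, 0
    return trace


STRATEGIES = [
    ("greedy", lambda s: min(s + 1.0, 3.0)),  # improves quickly, then plateaus at 3.0
    ("explore", lambda s: s + 0.5),           # slower, but keeps improving
]
print(extrinsic_controller(STRATEGIES, start=0.0, steps=8))
# ['greedy', 'greedy', 'greedy', 'greedy', 'greedy', 'explore', 'explore', 'explore']
```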

Benjamin Han @[email protected] · 2026-05-01 · 01:15 UTC

MetaSPO meta-learns a task-agnostic system prompt via a bilevel loop: the outer loop tunes the system prompt across tasks, the inner loop tunes per-task user prompts. Generalizes to 14 unseen tasks across 5 domains.

The decomposition is the contribution. Once prompts split into task-agnostic (system) vs task-specific (user), meta-learning follows. System-prompt optimization transfers; user-prompt optimization doesn't.

https://benjaminhan.net/posts/20260430-metaspo-system-prompt-meta-learning/?utm_source=mastodon&utm_medium=social

#llms #ai #promptengineering #kaist #neurips
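
The bilevel structure can be sketched with a deliberately tiny stand-in: prompts are sets of instruction tags, a task's score is the fraction of its required tags covered by the system and user prompts together, and exhaustive search over hand-written candidate lists replaces whatever LLM-driven optimizer the paper actually uses. Every tag, task, and function name below is hypothetical. The sketch only shows the split: the outer loop scores each system prompt after every task's user prompt has been tuned, so the instruction shared by all tasks ends up at the system level, and an unseen task then needs only the inner loop.

```python
from typing import Dict, FrozenSet, List, Tuple

Prompt = FrozenSet[str]  # toy: a prompt is a set of instruction "tags"
fs = frozenset


def score(system: Prompt, user: Prompt, required: Prompt) -> float:
    """Hypothetical task metric: fraction of the task's required tags covered."""
    return len((system | user) & required) / len(required)


def inner_loop(system: Prompt, required: Prompt,
               user_candidates: List[Prompt]) -> Tuple[Prompt, float]:
    """Inner level: with the system prompt frozen, pick the best user prompt for ONE task."""
    best = max(user_candidates, key=lambda u: score(system, u, required))
    return best, score(system, best, required)


def outer_loop(system_candidates: List[Prompt], tasks: Dict[str, Prompt],
               user_candidates: List[Prompt]) -> Prompt:
    """Outer level: pick the system prompt whose inner-optimized score,
    averaged across ALL training tasks, is highest."""
    def meta_score(system: Prompt) -> float:
        return sum(inner_loop(system, req, user_candidates)[1]
                   for req in tasks.values()) / len(tasks)
    return max(system_candidates, key=meta_score)


# Hypothetical search space and training tasks.
USER_CANDIDATES = [fs(), fs({"units"}), fs({"json"}), fs({"citations"}), fs({"tests"})]
SYSTEM_CANDIDATES = [fs(), fs({"step_by_step"}), fs({"json"})]
TRAIN_TASKS = {
    "physics": fs({"step_by_step", "units"}),
    "extraction": fs({"step_by_step", "json"}),
    "qa": fs({"step_by_step", "citations"}),
}

LEARNED = outer_loop(SYSTEM_CANDIDATES, TRAIN_TASKS, USER_CANDIDATES)
print(sorted(LEARNED))  # ['step_by_step']

# Transfer: keep the learned system prompt frozen, run only the inner loop on a new task.
print(inner_loop(LEARNED, fs({"step_by_step", "tests"}), USER_CANDIDATES))
```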

Benjamin Han @[email protected] · 2026-05-01 · 01:15 UTC

ReMA trains a two-agent RL setup: a meta-thinker plans the reasoning, an executor carries it out. Trained jointly with multi-agent RL, the pair beats R1-style single-agent baselines on math.

The split-agent pattern keeps showing up. Supervising Ralph Wiggum (engineering design, prompted) runs the same architecture a year later and gets a result in the same direction. Open question: does decoupling survive a FLOPs-matched comparison?

https://benjaminhan.net/posts/20260430-rema-meta-think/?utm_source=mastodon&utm_medium=social

#llms #ai #agenticsystems #reasoning
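
A minimal sketch of the division of labor, assuming a toy arithmetic domain: `meta_thinker` returns a plan (a list of step names) rather than an answer, and `executor` carries the plan out with primitive operations it knows. Both agents are hard-coded stand-ins for trained models, the operation names are invented, and the multi-agent RL update is omitted entirely; `episode` only shows a shared outcome reward of the kind a joint update could credit to both agents, which may not match ReMA's actual reward design.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical primitive operations the executor knows how to carry out.
OPS: Dict[str, Callable[[List[float]], List[float]]] = {
    "square_each": lambda xs: [x * x for x in xs],
    "sum": lambda xs: [sum(xs)],
    "max": lambda xs: [max(xs)],
}


def meta_thinker(question: str) -> List[str]:
    """Stand-in for the meta-thinking agent: emits a plan, not an answer.
    (A trained model in ReMA; a hard-coded lookup here.)"""
    plans = {
        "sum of squares": ["square_each", "sum"],
        "largest square": ["square_each", "max"],
    }
    return plans[question]


def executor(plan: List[str], xs: List[float]) -> float:
    """Stand-in for the executor agent: applies each planned step in order.
    Every plan here ends in a reducing step, so one value remains."""
    for step in plan:
        xs = OPS[step](xs)
    return xs[0]


def episode(question: str, xs: List[float], answer: float) -> Tuple[List[str], float, float]:
    """One rollout: plan, execute, and compute a shared outcome reward."""
    plan = meta_thinker(question)
    result = executor(plan, xs)
    reward = 1.0 if result == answer else 0.0
    return plan, result, reward


print(episode("sum of squares", [1, 2, 3], 14))  # (['square_each', 'sum'], 14, 1.0)
```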
