Search
1000 results for “Benja”
-
Excess CO2 is #JunkFood for plants,
-
🎧 Everyone in This Bank is a Thief by Benjamin Stevenson
#BenjaminStevenson #BartonWelch #HarperAudio #LoveAudiobooks @4saintjude #BookReview #4.5Hearts #AudioBookReview #Australia #PrivateInvestigator
https://booksofmyheart.net/2026/05/01/%f0%9f%8e%a7-everyone-in-this-bank-is-a-thief-by-benjamin-stevenson/?fsp_sid=12513
-
Supervising Ralph Wiggum: pairing a design agent with a separate metacognitive critic beats a plain retry loop AND a self-monitoring agent on battery-pack design.
Metacognitive prompts alone don't help; moving them to a different agent does. Converges with ReMA's math-reasoning result: a separately parameterized head for meta-level work outperforms one model doing both.
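A minimal sketch of the split-agent control loop, assuming a designer that only proposes and a separate critic that only reviews the attempt history. Every name here (`supervised_loop`, `make_toy_designer`, `toy_evaluate`, `toy_critic`) is hypothetical, and the toy one-dimensional "design" stands in for a real battery-pack design; this is not the paper's implementation.

```python
from typing import Callable, List, Optional, Tuple

Design = float  # hypothetical stand-in for a real design artifact
History = List[Tuple[Design, float]]


def supervised_loop(
    propose: Callable[[Optional[str]], Design],
    evaluate: Callable[[Design], float],
    critique: Callable[[History], Optional[str]],
    target: float,
    max_iters: int = 10,
) -> Tuple[Optional[Design], int]:
    """Designer proposes; a separate critic reads the history and advises."""
    history: History = []
    advice: Optional[str] = None
    for step in range(1, max_iters + 1):
        design = propose(advice)
        score = evaluate(design)
        if score >= target:
            return design, step
        history.append((design, score))
        # The critic never proposes designs; it only comments on attempts.
        advice = critique(history)
    return None, max_iters


def make_toy_designer(start: float = 0.0) -> Callable[[Optional[str]], Design]:
    """Toy designer: moves one unit in the advised direction, else repeats."""
    state = {"x": start}

    def propose(advice: Optional[str]) -> Design:
        if advice == "increase":
            state["x"] += 1.0
        elif advice == "decrease":
            state["x"] -= 1.0
        return state["x"]

    return propose


def toy_evaluate(x: Design) -> float:
    return -abs(x - 3.0)  # best possible score is 0, reached at x == 3


def toy_critic(history: History) -> Optional[str]:
    last_design, _ = history[-1]
    return "increase" if last_design < 3.0 else "decrease"


with_critic = supervised_loop(make_toy_designer(), toy_evaluate, toy_critic, target=0.0)
plain_retry = supervised_loop(make_toy_designer(), toy_evaluate, lambda h: None, target=0.0)
```

In this toy, swapping the critic for one that gives no advice reduces the loop to a plain retry, which never leaves its starting point. That shows the structure only and says nothing about the reported results.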
-
Position paper: today's self-improving agents lean on extrinsic metacognition — fixed human-designed loops about what to monitor, when to switch strategies. Genuine self-improvement needs the agent itself to decide those.
The intrinsic/extrinsic axis is the right lens for recent agent work. STaR, DSPy, MASS, MetaSPO all extrinsic by this definition. Optimistic bet: current LLMs already carry partial ingredients.
-
MetaSPO meta-learns a task-agnostic system prompt via a bilevel loop: outer tunes system prompt across tasks, inner tunes per-task user prompts. Generalizes to 14 unseen tasks across 5 domains.
The decomposition is the contribution. Once prompts split into task-agnostic (system) vs task-specific (user), meta-learning follows. System-prompt optimization transfers; user-prompt doesn't.
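A rough sketch of the bilevel structure, assuming fixed candidate pools and a caller-supplied scoring function. The names `inner_step`, `outer_step`, `bilevel_search`, and `toy_score` are hypothetical. How MetaSPO actually generates and scores candidate prompts is not stated in the post, so the sketch swaps in a plain argmax over fixed pools.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer: (system_prompt, user_prompt, task) -> quality score
Score = Callable[[str, str, str], float]


def inner_step(system: str, task: str, candidates: List[str], score: Score) -> str:
    """Inner loop: pick the best user prompt for one task, system prompt fixed."""
    return max(candidates, key=lambda user: score(system, user, task))


def outer_step(candidates: List[str], user_prompts: Dict[str, str], score: Score) -> str:
    """Outer loop: pick the system prompt with the best mean score across tasks."""
    def mean_score(system: str) -> float:
        total = sum(score(system, user, task) for task, user in user_prompts.items())
        return total / len(user_prompts)

    return max(candidates, key=mean_score)


def bilevel_search(
    tasks: List[str],
    system_candidates: List[str],
    user_candidates: Dict[str, List[str]],
    score: Score,
    rounds: int = 2,
) -> Tuple[str, Dict[str, str]]:
    system = system_candidates[0]
    user_prompts = {task: user_candidates[task][0] for task in tasks}
    for _ in range(rounds):
        for task in tasks:  # task-specific level
            user_prompts[task] = inner_step(system, task, user_candidates[task], score)
        system = outer_step(system_candidates, user_prompts, score)  # task-agnostic level
    return system, user_prompts


def toy_score(system: str, user: str, task: str) -> float:
    """Toy scorer (assumption): rewards a reasoning cue and a task mention."""
    return float("step by step" in system) + float(task in user)


best_system, best_users = bilevel_search(
    tasks=["math", "law"],
    system_candidates=["Be brief.", "Think step by step."],
    user_candidates={
        "math": ["Answer.", "Solve the math problem."],
        "law": ["Answer.", "Cite the law."],
    },
    score=toy_score,
)
```

Only `best_system` would carry over to an unseen task in this setup; the per-task entries in `best_users` are tied to the tasks they were tuned on.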
-
ReMA trains a two-agent RL setup: a meta-thinker plans reasoning, an executor carries it out. Trained jointly with multi-agent RL, beats R1-style single-agent baselines on math.
The split-agent pattern keeps showing up. Supervising Ralph Wiggum (engineering design, prompted) runs the same architecture a year later and lands the same direction of result. Open question: does decoupling survive a FLOPs-matched comparison?
https://benjaminhan.net/posts/20260430-rema-meta-think/?utm_source=mastodon&utm_medium=social
-
MASS optimizes multi-agent LLM systems by interleaving prompt and topology search: block-level prompts, topology rejection sampling, then workflow-level prompts.
Topology gets quietly demoted. Ablation on Gemini 1.5 Pro: ~6% gain from block prompts, 3% from topology, 2% from workflow prompts. Prompt tuning dominates — contradicts the topology-first thesis of ADAS and AFlow.
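The arithmetic behind "prompt tuning dominates", as a quick check. It takes the three ablation figures from the post at face value and assumes the stage gains are additive, which the post does not state; the names `gains` and `shares` are illustrative.

```python
# Stage gains as quoted in the post (the block-prompt figure is approximate).
gains = {"block prompts": 6.0, "topology": 3.0, "workflow prompts": 2.0}


def shares(stage_gains: dict) -> dict:
    """Fraction of the summed gain attributable to each stage (assumes additivity)."""
    total = sum(stage_gains.values())
    return {stage: gain / total for stage, gain in stage_gains.items()}


stage_shares = shares(gains)
prompt_share = stage_shares["block prompts"] + stage_shares["workflow prompts"]
topology_share = stage_shares["topology"]

print(f"prompt stages: {prompt_share:.0%}, topology: {topology_share:.0%}")
```

Under that additivity assumption, the two prompt stages account for 8 of the 11 points and topology for 3 of 11.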