home.social

Search

1000 results for “Benja”

  1. Agent token cost grows quadratically in the number of turns without caching and roughly linearly with caching. A new post fits those curves to SWE-bench traces from three models. One cross-model finding stands out: Gemini 3 Flash takes 2× as many turns as GPT-5.2 or Opus 4.6, so despite its leaner per-turn verbosity (~300 tokens vs ~1,000) it still burns more total tokens.

    benjaminhan.net/posts/20260513

    #AI #AgenticSystems #LLMs
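
A rough sketch of why the curves in the post above have those shapes, assuming the simplest possible model: every turn appends a fixed number of tokens to the history, and the whole history is re-sent as input on the next turn. The function names and the turn counts (20 vs 40) are illustrative assumptions, not figures from the linked post; only the ~300 and ~1,000 tokens per turn and the 2× turn ratio come from it.

```python
def uncached_input_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Total input tokens when every turn re-reads the full history at full price."""
    total = 0
    context = system_tokens
    for _ in range(turns):
        total += context            # the whole context is processed again
        context += tokens_per_turn  # this turn's tokens join the history
    return total


def cached_input_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> tuple[int, int]:
    """Split the same traffic into (fresh, cache_read) tokens under prefix caching."""
    fresh = 0
    cache_read = 0
    seen = 0                        # tokens already sitting in the cache
    context = system_tokens
    for _ in range(turns):
        cache_read += seen          # previously seen prefix, typically billed at a discount
        fresh += context - seen     # only the newly appended tokens are full price
        seen = context
        context += tokens_per_turn
    return fresh, cache_read


# Hypothetical turn counts: 20 turns at ~1,000 tokens vs 40 turns at ~300 tokens.
verbose_few_turns = uncached_input_tokens(20, 1000)
lean_many_turns = uncached_input_tokens(40, 300)
print(verbose_few_turns, lean_many_turns)
print(cached_input_tokens(20, 1000))
```

Under these assumptions the fresh-token count grows linearly with turns, while the quadratic part moves into cache reads. How close the billed cost gets to linear depends on the provider's cache-read discount, which this sketch does not model.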

  6. Benjamin Broersma (@forumstandaardisatie), member of the Dutch Internet Standards Platform, will speak at the 5th #NISDUC Conference in Brussels on 19–20 May 2026.

    As part of a breakout session on #cybersecurity tools on 19 May, he will present Internet.nl and the organisations involved, explain how the tool provides insight into the compliance of websites, email, and internet connections with modern #InternetStandards, and help you get started.

    🧵 1/3

  11. Thinking Machines Lab announced a research preview of "interaction models", which are trained from scratch for real-time multimodal collaboration: 200 ms micro-turns, with audio, video, text, and tools running concurrently. Their bet: today's chat UX fits "answering inference", not collaboration, so capable AI defaults to autonomous use and looks like labor substitution. Could we change the debate by changing the UI/UX?

    benjaminhan.net/posts/20260512

    #AI #HumanInTheLoop #Multimodal #HCI #FutureOfWork

  16. SCoRe is a two-stage on-policy RL recipe that teaches a language model to revise its own answers using only self-generated data. On Gemini 1.5 Flash and 1.0 Pro it gains 15.6 points on MATH and 9.1 on HumanEval over the base model. At matched inference budgets, sequential self-correction beats parallel sampling up to 32 samples.

    benjaminhan.net/posts/20260512

    #Paper #LLMs #RL #Metacognition #Reasoning #ICLR #AI

  21. Let's Verify Step by Step compares process and outcome supervision on MATH. The process-reward model reaches 78.2% best-of-1860 vs 72.4% for outcome. But that gap narrows fast at small N, where most deployments actually live.

    benjaminhan.net/posts/20260512

    #Paper #LLMs #Reasoning #Mathematics #ICLR #OpenAI #AI
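
For readers unfamiliar with "best-of-N", here is a toy sketch of the selection step: sample N candidate solutions, score each one, and keep the top scorer. The per-step probabilities are hard-coded stand-ins for what a process-reward model would output, and scoring a solution as the product of its step scores is an assumption made for illustration. The names `solution_score` and `best_of_n` are hypothetical.

```python
import math


def solution_score(step_probs: list[float]) -> float:
    """Score a solution as the product of per-step correctness estimates (assumed rule)."""
    return math.prod(step_probs)


def best_of_n(candidates: list[dict]) -> dict:
    """Return the candidate whose steps score highest overall."""
    return max(candidates, key=lambda c: solution_score(c["step_probs"]))


# Two made-up candidates: the first has one weak step, the second is steady throughout.
candidates = [
    {"answer": "12", "step_probs": [0.9, 0.9, 0.2]},
    {"answer": "15", "step_probs": [0.8, 0.8, 0.8]},
]
chosen = best_of_n(candidates)
print(chosen["answer"])
```

An outcome-supervised scorer would instead assign a single score to the finished solution. The selection loop stays the same and only the scoring function changes.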

  22. Benjamin Netanyahu and the expansive use of international law in connection with Costa Rica

    The opinion piece by the academic Nicolás Boeglin (published on 4 and 5 May in this newspaper) on a possible visit by Prime Minister Benjamin Netanyahu to the transfer of power in Costa Rica [...]

    #BenjaminNetanyahu #CorteInternacionalDeJusticia #DerechoInternacional #Israel #Opinión

    elmundo.cr/opinion/benjamin-ne
