home.social

#iclr — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #iclr, aggregated by home.social.

  1. SCoRe is a two-stage on-policy RL recipe that teaches a language model to revise its own answers using only self-generated data. On Gemini 1.5 Flash and 1.0 Pro it gains 15.6 points on MATH and 9.1 on HumanEval over the base model. At matched inference budgets, sequential self-correction beats parallel sampling at up to 32 samples.

    benjaminhan.net/posts/20260512

    #Paper #LLMs #RL #Metacognition #Reasoning #ICLR #AI
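
    The budget-matched comparison the post describes can be sketched as follows. This is an illustration, not SCoRe's training recipe: `generate` and `revise` are hypothetical stand-ins for model calls, and "matched budget" means both strategies spend the same number of generations k.

    ```python
    from collections import Counter

    def parallel_at_k(generate, problem, k):
        """Parallel sampling: k independent attempts, then majority vote."""
        answers = [generate(problem) for _ in range(k)]
        return Counter(answers).most_common(1)[0][0]

    def sequential_at_k(generate, revise, problem, k):
        """Sequential self-correction: one attempt followed by k-1
        revisions of the previous answer -- the same budget of k calls."""
        answer = generate(problem)
        for _ in range(k - 1):
            answer = revise(problem, answer)
        return answer
    ```

    The paper's claim is that a model trained with SCoRe makes `sequential_at_k` win this comparison; an untrained model's revisions often collapse toward its first answer.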

  6. Let's Verify Step by Step compares process and outcome supervision on MATH. The process-reward model reaches 78.2% with best-of-1860 sampling vs 72.4% for the outcome-reward model. But that gap narrows fast at small N, where most deployments actually live.

    benjaminhan.net/posts/20260512

    #Paper #LLMs #Reasoning #Mathematics #ICLR #OpenAI #AI
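
    Best-of-N selection with a process reward can be sketched like this. Assumptions are labeled in the code: scoring a solution as the product of per-step correctness probabilities is one aggregation used for PRMs, and `step_prob` is a hypothetical stand-in for the trained reward model.

    ```python
    import math

    def prm_score(steps, step_prob):
        """Score a solution as the probability that every step is
        correct: the product of per-step scores (one common PRM
        aggregation). `step_prob` stands in for the reward model."""
        return math.prod(step_prob(s) for s in steps)

    def best_of_n(candidates, step_prob):
        """Best-of-N: return the candidate (a list of reasoning steps)
        with the highest process-reward score."""
        return max(candidates, key=lambda c: prm_score(c, step_prob))
    ```

    At N = 1860 this reranking is where the 78.2% figure comes from; at small N there are few candidates to separate, which is why the process/outcome gap narrows.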

  7. Conformal Language Modeling (CLM) adapts conformal prediction to generative LMs: sample candidates, stop when a calibrated rule fires, return a set guaranteed, with a user-chosen error rate, to contain an acceptable answer. The more interesting half is the component-level filter — per-phrase coverage, not just set-level. That's the primitive for hallucination flagging: highlight the vetted phrases, leave the rest for review.

    benjaminhan.net/posts/20260505

    #ConformalPrediction #LLMs #Hallucination #ICLR #AI
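
    The sample-until-stop loop and the component filter can be sketched as below. This is a simplification: the thresholds `lam_stop` and `lam_comp` must be calibrated on held-out data (that calibration is what yields the coverage guarantee, and it is elided here), and `sample`, `score`, and `conf` are hypothetical stand-ins for model calls.

    ```python
    def conformal_sample(sample, score, lam_stop, k_max):
        """CLM-style set construction (sketch): keep drawing candidates
        into the output set until one clears the calibrated stopping
        threshold, or the sampling budget k_max runs out."""
        out = []
        for _ in range(k_max):
            y = sample()
            out.append(y)
            if score(y) >= lam_stop:
                break
        return out

    def vetted_phrases(phrases, conf, lam_comp):
        """Component-level filter: keep only phrases whose confidence
        clears a second calibrated threshold -- the per-phrase coverage
        the post highlights."""
        return [p for p in phrases if conf(p) >= lam_comp]
    ```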

  8. DSPy turns LM pipelines into typed-module graphs and compiles them end-to-end against a single metric, bootstrapping its own few-shot demonstrations.

    The programming-model layer is the real contribution, not any specific teleprompter. Once pipelines are typed graphs, pipeline-level search (MASS, MIPRO) becomes possible in a way it wasn't with string-template prompts.

    benjaminhan.net/posts/20260430

    #LLMs #AI #PromptEngineering #NLP #Stanford #ICLR
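
    The programming-model idea — pipelines as module graphs with declared fields, optimized per-stage against one end-to-end metric — can be sketched without DSPy's actual API. Everything below is illustrative, not DSPy code: stages are plain functions from a field dict to new fields, and the "compiler" is a toy prompt search (real optimizers like MIPRO also bootstrap few-shot demos).

    ```python
    def run_pipeline(stages, x):
        """Execute a linear module graph: each stage maps the current
        dict of fields to new fields, merged into the running state."""
        state = dict(x)
        for stage in stages:
            state.update(stage(state))
        return state

    def compile_stage(make_stage, prompt_candidates, metric):
        """Toy prompt search for one stage: instantiate the stage with
        each candidate prompt and keep the one scoring best on an
        end-to-end metric. Possible only because stages are typed
        units, not string templates."""
        return max((make_stage(p) for p in prompt_candidates), key=metric)
    ```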

  9. SelfReflect measures whether an LLM's text summary of its uncertainty matches its actual answer distribution. Across 20 modern models: it doesn't, unless the model sees samples of its own answers first.

    The negative result does more work than the metric itself. It fits a growing line of evidence that LLM self-reports shouldn't be trusted as introspection. The practical workaround isn't cheap: N forward passes to sample, then a summarization pass.

    benjaminhan.net/posts/20260430

    #LLMs #AI #Evaluation #Apple #ICLR
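
    The workaround's cost structure can be sketched as below. `llm(prompt)` is a hypothetical completion function, not any real API; the point is the n + 1 call budget: n forward passes to sample answers, then one more pass that summarizes uncertainty conditioned on those samples.

    ```python
    from collections import Counter

    def summarize_uncertainty(llm, question, n):
        """Draw n answer samples, tabulate them, then ask the model to
        summarize its uncertainty *given its own samples* -- the only
        condition under which the post says summaries match the actual
        answer distribution. Total cost: n + 1 model calls."""
        samples = [llm(question) for _ in range(n)]
        evidence = ", ".join(f"{a} x{c}"
                             for a, c in Counter(samples).most_common())
        return llm(f"{question}\nYour sampled answers were: {evidence}\n"
                   "Summarize your uncertainty in one sentence.")
    ```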

  10. A major #AIconference, the International Conference on Learning Representations (#ICLR), discovered that 21% of #peerreviews were fully #AIgenerated. #Researchers raised concerns about AI-generated #reviews, citing issues like #hallucinatedcitations and #vaguefeedback. Organisers will now use automated tools to assess submissions and reviews for AI use. nature.com/articles/d41586-025 #tech #media #news