“Benja” — Fediverse search results on home.social

Benjamin Han @[email protected] · 2026-05-06 · 00:32 UTC

How do we make LLM output more trustworthy? A short survey note on three lines of recent work covering five papers: conformal-prediction coverage guarantees, behavioral calibration of the model's prose, and sample-disagreement detection. All three pay the same multi-sample inference tax; the choice is about what you want back.

https://benjaminhan.net/posts/20260505-llm-uncertainty-survey/?utm_source=mastodon&utm_medium=social

#Hallucination #LLMs #Calibration #ConformalPrediction #AI

El Mundo @[email protected] · 2026-05-05 · 23:38 UTC

Costa Rica / Israel: Benjamin Netanyahu… attending the upcoming transfer-of-power ceremony in Costa Rica?

This past April 27, the headline of an article in La Nación read as follows: “Will the president or the prime minister of Israel come to the inauguration of L [...]

#BenjaminNetanyahu #Gaza #Israel #LauraFernández #Opinión #RodrigoChaves #TraspasoDePoderes

https://elmundo.cr/opinion/costa-rica-israel-benjamin-netanyahu-asistiendo-al-proximo-acto-de-traspaso-de-poderes-en-costa-rica/

Benjamin Han @[email protected] · 2026-05-05 · 21:55 UTC

Semantic Entropy (Nature 2024) detects LLM confabulations by clustering sampled answers by meaning and computing entropy over the cluster distribution. "Paris" and "It's Paris" cluster together, so paraphrase noise doesn't inflate the signal. Cost: it only catches hallucinations that vary across samples. If the model is consistently wrong, all samples cluster and the detector says "confident".

https://benjaminhan.net/posts/20260505-semantic-entropy/?utm_source=mastodon&utm_medium=social

#LLMs #Calibration #Hallucination #Nature #AI
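
The clustering-then-entropy idea in the post above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `toy_same_meaning` is a hypothetical stand-in for a real semantic-equivalence check, and cluster probabilities are estimated from plain sample counts, which is a simplifying assumption.

```python
import math
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       same_meaning: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose representative matches."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(answers: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy (in nats) over the empirical distribution of meaning clusters."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    return sum(-(len(c) / total) * math.log(len(c) / total) for c in clusters)


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical equivalence check: same last word, ignoring case and punctuation."""
    def key(s: str) -> str:
        return s.lower().strip(" .!?").split()[-1]
    return key(a) == key(b)


agreeing = ["Paris", "It's Paris", "paris."]             # paraphrases, one cluster
scattered = ["Paris", "Lyon", "It's Marseille", "Paris"]  # three clusters
consistently_wrong = ["Lyon", "It's Lyon", "lyon."]       # wrong, but one cluster

for name, samples in [("agreeing", agreeing),
                      ("scattered", scattered),
                      ("consistently_wrong", consistently_wrong)]:
    print(name, round(semantic_entropy(samples, toy_same_meaning), 3))
```

The `consistently_wrong` samples score the same as the `agreeing` ones, which is the blind spot the post describes: the signal measures disagreement across samples, not correctness.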

Benjamin Han @[email protected] · 2026-05-05 · 21:54 UTC

Linguistic Calibration trains Llama 2 to emit confidence phrases that let a downstream reader make calibrated forecasts on related questions. The key move is defining calibration through reader utility instead of self-reported probability. Hedged text that doesn't help the reader makes no forecasting progress, so generic hedging can't game the objective.

https://benjaminhan.net/posts/20260505-linguistic-calibration/?utm_source=mastodon&utm_medium=social

#LLMs #Calibration #Hallucination #ICML #AI
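
A small worked example may help show why generic hedging earns nothing under a reader-utility objective. The sketch below scores a reader's forecast with a log score, used here only as one example of a proper scoring rule; the post does not say which reward the paper uses, and all forecast numbers are made up for illustration.

```python
import math
from typing import Dict


def log_score(forecast: Dict[str, float], outcome: str) -> float:
    """Log of the probability the reader assigned to what actually happened."""
    return math.log(forecast[outcome])


# Hypothetical reader forecasts over three candidate answers to a related
# question, after reading three different styles of generated text.
forecasts = {
    "informative": {"A": 0.8, "B": 0.1, "C": 0.1},          # useful confidence phrases
    "generic_hedge": {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3},  # "it could be anything"
    "confidently_wrong": {"A": 0.05, "B": 0.9, "C": 0.05},  # misleading confidence
}

outcome = "A"  # assumed ground truth for this illustration
scores = {name: log_score(f, outcome) for name, f in forecasts.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Text that leaves the reader at a uniform forecast scores exactly what a reader with no text at all would score, so hedging everything gains nothing, while misleading confidence is penalized more heavily.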

Benjamin Han @[email protected] · 2026-05-05 · 21:54 UTC

Conformal Factuality casts LM correctness as uncertainty quantification. Decompose the answer into sub-claims, score each, and drop the low-confidence ones until the retained set is fully factual with probability ~1-α. The sub-claim decomposition is doing most of the work, and the conformal machinery rides on top. Atomic-claim splitters have known failure modes, and the guarantee inherits them.

https://benjaminhan.net/posts/20260505-conformal-factuality/?utm_source=mastodon&utm_medium=social

#ConformalPrediction #Calibration #Hallucination #LLMs #ICML #AI
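
The filter-by-threshold step described in the post above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: claims arrive already decomposed and scored, calibration claims carry a correctness label, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def example_score(claims: List[Tuple[float, bool]]) -> float:
    """Smallest threshold that removes every incorrect claim of one calibration answer."""
    wrong = [score for score, is_correct in claims if not is_correct]
    return max(wrong) if wrong else float("-inf")


def calibrate_threshold(cal_answers: List[List[Tuple[float, bool]]], alpha: float) -> float:
    """Conformal quantile of the per-answer scores (split-conformal style)."""
    scores = sorted(example_score(claims) for claims in cal_answers)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the score to use
    return float("inf") if k > n else scores[k - 1]


def filter_claims(claims: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep only sub-claims whose confidence is strictly above the threshold."""
    return [text for text, score in claims if score > threshold]


# Toy calibration set: each answer is a list of (confidence, is_correct) sub-claims.
cal_answers = [
    [(0.90, True), (0.30, False)],
    [(0.80, True), (0.70, True)],
    [(0.95, True), (0.60, False), (0.20, False)],
    [(0.85, True), (0.40, False)],
]
threshold = calibrate_threshold(cal_answers, alpha=0.2)

# New answer, already split into (sub-claim, confidence) pairs.
new_claims = [("claim A", 0.92), ("claim B", 0.75), ("claim C", 0.35)]
kept = filter_claims(new_claims, threshold)
print(threshold, kept)
```

Everything here takes the sub-claim list as given, which is where the post locates the real risk: if the splitter merges or drops claims, the threshold is calibrated on the wrong units.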

Benjamin Han @[email protected] · 2026-05-05 · 21:51 UTC

Conformal Language Modeling (CLM) adapts conformal prediction to generative LMs: sample candidates, stop when a calibrated rule fires, return a set guaranteed to contain an acceptable answer. The more interesting half is the component-level filter — per-phrase coverage, not just set-level. That's the primitive for hallucination flagging: highlight the vetted phrases, leave the rest for review.

https://benjaminhan.net/posts/20260505-conformal-language-modeling/?utm_source=mastodon&utm_medium=social

#ConformalPrediction #LLMs #Hallucination #ICLR #AI
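
The sample-then-stop loop from the post above might look roughly like this. It is a sketch under assumptions: the three thresholds are taken as already calibrated offline (the calibration step is not shown), the set-level score is the maximum quality in the set (one possible choice), and `sample`, `quality`, and `similarity` are hypothetical callables. The component-level filter is not covered.

```python
from typing import Callable, List


def build_candidate_set(sample: Callable[[], str],
                        quality: Callable[[str], float],
                        similarity: Callable[[str, str], float],
                        lam_quality: float,
                        lam_similarity: float,
                        lam_stop: float,
                        max_samples: int) -> List[str]:
    """Sample candidates until the set looks good enough or the budget runs out."""
    chosen: List[str] = []
    for _ in range(max_samples):
        candidate = sample()
        if quality(candidate) < lam_quality:
            continue  # reject low-quality candidates
        if any(similarity(candidate, kept) > lam_similarity for kept in chosen):
            continue  # reject near-duplicates of what is already in the set
        chosen.append(candidate)
        if max(quality(c) for c in chosen) >= lam_stop:
            break  # stopping rule fires
    return chosen


# Toy stand-ins for a model sampler and scorers (all values are illustrative).
toy_quality = {"a short draft": 0.2, "answer one": 0.5, "answer one!": 0.55,
               "answer two": 0.9, "answer three": 0.7}
stream = iter(toy_quality)  # yields candidates in insertion order


def toy_similarity(a: str, b: str) -> float:
    """1.0 if the texts match after trimming punctuation and case, else 0.0."""
    return 1.0 if a.strip("!. ").lower() == b.strip("!. ").lower() else 0.0


result = build_candidate_set(sample=lambda: next(stream),
                             quality=toy_quality.__getitem__,
                             similarity=toy_similarity,
                             lam_quality=0.3, lam_similarity=0.5,
                             lam_stop=0.8, max_samples=5)
print(result)
```

In this toy run the loop stops before the fifth candidate is ever sampled, which is the point of a calibrated stopping rule: the sampling budget is spent only until the set is judged likely to contain an acceptable answer.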

Benjamin Han @[email protected] · 2026-05-05 · 21:49 UTC

A primer on conformal prediction: the recipe for distribution-free coverage guarantees that doesn't require your model to be calibrated. Rank-based non-conformity scores plus a calibration quantile give you valid prediction sets. Easy inputs get one-class sets; hard ones get many alternatives. Set size is where the uncertainty shows up.

https://benjaminhan.net/posts/20260505-conformal-prediction-primer/?utm_source=mastodon&utm_medium=social

#ConformalPrediction #Calibration #UncertaintyQuantification #AI
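
The recipe in the post above can be sketched as split conformal prediction for a classifier. This is a minimal illustration, assuming the common non-conformity score of 1 minus the probability assigned to a label; the calibration scores and class probabilities are made-up numbers, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Set


def conformal_threshold(cal_scores: List[float], alpha: float) -> float:
    """Calibration quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank
    return float("inf") if k > n else sorted(cal_scores)[k - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> Set[str]:
    """All labels whose non-conformity score (1 - probability) is within qhat."""
    return {label for label, p in probs.items() if 1 - p <= qhat}


# Non-conformity scores of the true label on held-out calibration examples.
cal_scores = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.45, 0.60, 0.80]
qhat = conformal_threshold(cal_scores, alpha=0.2)

easy_input = {"cat": 0.95, "dog": 0.03, "fox": 0.02}
hard_input = {"cat": 0.45, "dog": 0.42, "fox": 0.13}

print(qhat)
print(sorted(prediction_set(easy_input, qhat)))
print(sorted(prediction_set(hard_input, qhat)))
```

The model's probabilities are never assumed to be calibrated; only the rank of the calibration scores matters, and the larger set for the hard input is how the uncertainty becomes visible.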

MokhtarStork @[email protected] · 2026-05-05 · 12:46 UTC

Benjamin Libet: If your brain initiates action before your consciousness of deciding, then what exactly is the 'you' that thinks it is choosing? https://www.youtube.com/watch?v=61nAQFREfYM #Freewill #neuroscience
