home.social

Search

1000 results for “Benja”

  1. How do we make LLM output more trustworthy? A short survey note on three lines of recent work covering five papers: conformal-prediction coverage guarantees, behavioral calibration of the model's prose, and sample-disagreement detection. All three pay the same multi-sample inference tax; the choice is about what you want back.

    benjaminhan.net/posts/20260505

    #Hallucination #LLMs #Calibration #ConformalPrediction #AI

  4. Costa Rica / Israel: Benjamín Netanyahu... attending the upcoming transfer-of-power ceremony in Costa Rica?

    On April 27, the headline of an article in La Nación read as follows: “Will the president or the prime minister of Israel come to the inauguration of L [...]

    #BenjaminNetanyahu #Gaza #Israel #LauraFernández #Opinión #RodrigoChaves #TraspasoDePoderes

    elmundo.cr/opinion/costa-rica-

  5. Semantic Entropy (Nature 2024) detects LLM confabulations by clustering sampled answers by meaning and computing entropy over the cluster distribution. "Paris" and "It's Paris" cluster together, so paraphrase noise doesn't inflate the signal. Cost: it only catches hallucinations that vary across samples. If the model is consistently wrong, all samples cluster and the detector says "confident".

    benjaminhan.net/posts/20260505

    #LLMs #Calibration #Hallucination #Nature #AI
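
A minimal sketch of the cluster-then-entropy idea described in the post above. The `normalize` helper is a hypothetical toy stand-in for meaning-based clustering (simple string normalization), not the clustering procedure the paper uses; only the entropy step over cluster frequencies is meant literally.

```python
import math
from collections import Counter

def normalize(answer: str) -> str:
    """Toy, hypothetical stand-in for semantic clustering: strips case,
    punctuation and a few filler prefixes so paraphrases share a key."""
    text = answer.lower().strip(" .!?")
    for prefix in ("it's ", "it is ", "the answer is "):
        if text.startswith(prefix):
            text = text[len(prefix):]
    return text

def semantic_entropy(samples):
    """Shannon entropy (in nats) over the empirical cluster distribution."""
    clusters = Counter(normalize(s) for s in samples)
    total = sum(clusters.values())
    return -sum((n / total) * math.log(n / total) for n in clusters.values())

# Paraphrases collapse into one cluster, so entropy is zero.
agree = semantic_entropy(["Paris", "It's Paris", "paris."])
# Disagreeing samples spread over clusters, so entropy rises.
disagree = semantic_entropy(["Paris", "Lyon", "Rome", "It's Paris"])
# The blind spot: a consistently wrong model also yields zero entropy.
consistently_wrong = semantic_entropy(["Lyon", "It's Lyon", "lyon"])
```

Here `agree` and `consistently_wrong` both come out as zero, which is the failure mode the post points to: the signal measures disagreement across samples, not correctness.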

  10. Linguistic Calibration trains Llama 2 to emit confidence phrases that let a downstream reader make calibrated forecasts on related questions. The key move is defining calibration through reader utility instead of self-reported probability. Hedged text that doesn't help the reader makes no forecasting progress, so generic hedging can't game the objective.

    benjaminhan.net/posts/20260505

    #LLMs #Calibration #Hallucination #ICML #AI

  15. Conformal Factuality casts LM correctness as uncertainty quantification. Decompose the answer into sub-claims, score each, and drop the low-confidence ones until the retained claims are all factual with probability at least 1-α. The sub-claim decomposition is doing most of the work, and the conformal machinery rides on top. Atomic-claim splitters have known failure modes, and the guarantee inherits them.

    benjaminhan.net/posts/20260505

    #ConformalPrediction #Calibration #Hallucination #LLMs #ICML #AI
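
A rough sketch of the score-and-drop step, assuming sub-claims have already been extracted, scored and (for calibration data) labeled as correct or not. The function names and data layout are hypothetical, and this is a simplified split-conformal version rather than the paper's exact procedure. Each calibration answer contributes the highest confidence assigned to any wrong claim; a conformal quantile of those values becomes the cut-off.

```python
import math

def example_score(claims):
    """claims: list of (confidence, is_correct) pairs for one answer.
    Returns the highest confidence given to a wrong claim, i.e. the
    smallest cut-off that would have removed every wrong claim."""
    wrong = [conf for conf, ok in claims if not ok]
    return max(wrong, default=float("-inf"))

def calibrate_threshold(calibration, alpha):
    """Conformal quantile of the per-answer scores at level 1 - alpha."""
    scores = sorted(example_score(claims) for claims in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too little calibration data: keep nothing
    return scores[k - 1]

def filter_claims(claims, threshold):
    """claims: list of (text, confidence); keep those above the cut-off."""
    return [text for text, conf in claims if conf > threshold]

calibration = [
    [(0.9, True), (0.2, False)],
    [(0.8, True), (0.4, False), (0.1, False)],
    [(0.95, True), (0.9, False)],
]
threshold = calibrate_threshold(calibration, alpha=0.5)
kept = filter_claims([("claim a", 0.7), ("claim b", 0.4), ("claim c", 0.1)], threshold)
```

The coverage statement only holds if the splitter and the correctness labels are sound, which is the dependency the post flags.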

  20. Conformal Language Modeling (CLM) adapts conformal prediction to generative LMs: sample candidates, stop when a calibrated rule fires, return a set guaranteed to contain an acceptable answer. The more interesting half is the component-level filter — per-phrase coverage, not just set-level. That's the primitive for hallucination flagging: highlight the vetted phrases, leave the rest for review.

    benjaminhan.net/posts/20260505

    #ConformalPrediction #LLMs #Hallucination #ICLR #AI
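
A simplified sketch of the sample-then-stop loop from the post above. Everything named here is hypothetical: `sample` and `quality` stand in for an LM sampler and a quality scorer, and the two thresholds are taken as given, whereas the method calibrates its rule on held-out data. The component-level filter is not shown.

```python
from typing import Callable, Dict, List

def conformal_sample(
    sample: Callable[[], str],
    quality: Callable[[str], float],
    lam_quality: float,
    lam_stop: float,
    max_samples: int = 10,
) -> List[str]:
    """Grow a candidate set until the (assumed pre-calibrated) stopping
    rule fires or the sampling budget runs out."""
    kept: List[str] = []
    for _ in range(max_samples):
        cand = sample()
        # Reject low-quality candidates and exact duplicates.
        if quality(cand) < lam_quality or cand in kept:
            continue
        kept.append(cand)
        # Stop once the best candidate in the set is good enough.
        if max(quality(c) for c in kept) >= lam_stop:
            break
    return kept

# Deterministic stand-ins so the loop can be traced by hand.
scores: Dict[str, float] = {"draft a": 0.2, "draft b": 0.4, "draft c": 0.8, "draft d": 1.0}
stream = iter(["draft a", "draft b", "draft b", "draft c", "draft d"])
result = conformal_sample(lambda: next(stream), scores.__getitem__,
                          lam_quality=0.3, lam_stop=0.8)
```

In this trace the first draft is rejected, the repeated draft is skipped, and sampling stops at the third distinct draft, so the returned set has two candidates.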

  25. A primer on conformal prediction: the recipe for distribution-free coverage guarantees that doesn't require your model to be calibrated. Rank-based non-conformity scores plus a calibration quantile give you valid prediction sets. Easy inputs get one-class sets; hard ones get many alternatives. Set size is where the uncertainty shows up.

    benjaminhan.net/posts/20260505

    #ConformalPrediction #Calibration #UncertaintyQuantification #AI
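
A minimal split-conformal sketch for classification, to make the recipe above concrete. It assumes the non-conformity score is one minus the probability the model assigns to a label (one common choice, not necessarily the one the primer uses); the function names and the toy numbers are illustrative only.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # not enough calibration points for this alpha
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, qhat):
    """Include every label whose score 1 - p is within the quantile."""
    return {label for label, p in probs.items() if 1 - p <= qhat}

# Calibration scores: 1 - p(true label) on held-out examples (toy values).
qhat = conformal_quantile([0.2, 0.7, 0.9], alpha=0.5)

easy = prediction_set({"cat": 0.9, "dog": 0.08, "fox": 0.02}, qhat)
hard = prediction_set({"cat": 0.4, "dog": 0.35, "fox": 0.25}, qhat)
```

The easy input yields a single-label set and the hard one yields two labels, so the set size carries the uncertainty. With three calibration points the example is only illustrative; real use needs a much larger calibration set.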

  30. Benjamin Libet: If your brain initiates an action before you become conscious of deciding, then what exactly is the 'you' that thinks it is choosing? youtube.com/watch?v=61nAQFREfYM #Freewill #neuroscience