home.social

#factverification — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #factverification, aggregated by home.social.

  1. AVeriTeC (NeurIPS 2023): 4,568 real-world fact-checked claims, web-retrieved evidence, four-way labels, temporal-leak-free split.

    Two structural gaps: gold answers are frozen but the retrieval surface isn't (two systems a year apart hit different Google), and the not-enough-evidence class rewards weak retrievers — predicting NEI when retrieval fails matches gold by coincidence.

    benjaminhan.net/posts/20260507

    #Paper #Benchmark #FactVerification #NeurIPS #AI

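
The second gap in the post can be shown with a toy calculation. This is a minimal sketch, not the AVeriTeC scorer: the gold labels are invented, the label strings are placeholders, and `broken_retriever_system` is a hypothetical system whose retrieval always fails, so it always falls back to NEI.

```python
from collections import Counter

NEI = "not_enough_evidence"

# Invented toy gold labels for ten claims (not AVeriTeC data).
GOLD = ["supported"] * 4 + ["refuted"] * 3 + [NEI] * 2 + ["conflicting"]


def broken_retriever_system(claim_index):
    """Hypothetical system: retrieval always fails, so it always answers NEI."""
    return NEI


def per_class_accuracy(gold, predicted):
    """Fraction of claims in each gold class whose prediction matches gold."""
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {label: correct[label] / totals[label] for label in totals}


predictions = [broken_retriever_system(i) for i in range(len(GOLD))]
scores = per_class_accuracy(GOLD, predictions)
print(scores)
```

On the NEI class, this system is indistinguishable from one that searched properly and found nothing, which is why a label-only score on that class says little about retrieval quality.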