home.social

Search

1000 results for “Benja”

  1. Benjamin #Netanyahu: The war with 🇮🇷#Iran “is not over.”
    Israel's prime minister (#PrimMinistrulIsraelului) says the enriched uranium must be “taken out of 🇮🇷#Iran.”

    🔗 wp.me/p9KpFA-5gOd

    #Știri #Israel

  2. (Video) Benjamín Pineda gets it wrong again from the VAR booth

    Refereeing came under fire again in the Clausura 2026, and once more the name of Benjamín Pineda is at the center of the debate. The second leg of the semifinal between Herediano and Cartaginés produced an incident that drew sharp criticism of the VAR's work, precisely [...]

    #BenjamínPineda #Cartaginés #Deportes #Herediano #VAR

    elmundo.cr/deportes/video-benj

  3. "The Night Has a Thousand Eyes" is a song written by #BenjaminWeisman, Dorothy Wayne, and Marilyn Garrett. It became a popular hit in 1962 for #BobbyVee and has had several cover versions over the years.
    youtube.com/watch?v=3GQAmTznY2o

  7. Pentagon begins releasing its UAP files today after a February executive order. CBS canvassed six scientists on what to expect. For me the Apollo images are the most fascinating ones.

    benjaminhan.net/posts/20260508

    #UAP #Science #Space

  8. Jack Clark puts 60% on fully automated AI R&D by end of 2028, 30% by 2027. The case: benchmarks for every sub-skill trending up — coding (SWE-Bench ~2% → 93.9%), training-loop optimization (2.9x → 52x speedup, human 4x baseline passed three generations back), #METR time horizons (~30s in 2022 to ~12h today). The 30-vs-60 gap is a bet on how often a year-scale human insight still cracks a paradigm.

    benjaminhan.net/posts/20260508

    #AI #AGI #AIsafety #FutureOfWork

  9. This is small change for the high-tech nation that, with a single smart bomb, destroys in a few seconds what Lebanese, Iranians, and Palestinians have built over hundreds of years.
    And, thank God, Israel has many bombs.

    But the money is essential to achieving our goal of absolute and exclusive Jewish dominion between the river and the sea, regardless of whether the prime minister is #BenjaminNetanyahu or #NaftaliBennett.

    The sum will not necessarily enrich Israel, but withholding it impoverishes both Palestinian families and Palestinian society as a whole.
    The Palestinian Authority is drowning in debt to banks, to suppliers of goods and services, and to its own public-sector employees. ⬇️3

  10. @benjamin I'm talking about my personal ethics: I could never rejoice at someone's death. Otherwise, what difference would there be between him and me?
    If I ever had to kill, it would be with as much solemnity/respect/dignity (limiting suffering) as possible, and with sadness (#avatar, sorry for the reference). Coming to the point of killing one another is a collective failure. Never a joy.

    I'm not immune to childish thoughts like "great, Le Pen's lawyer drinks Coke, he'll surely get diabetes," and then I remember how many activists smoke 😅 So in the end it makes no sense to hold on to such passing thoughts.

  11. AVeriTeC (NeurIPS 2023): 4,568 real-world fact-checked claims, web-retrieved evidence, four-way labels, temporal-leak-free split.

    Two structural gaps: gold answers are frozen but the retrieval surface isn't (two systems a year apart hit different Google), and the not-enough-evidence class rewards weak retrievers — predicting NEI when retrieval fails matches gold by coincidence.

    benjaminhan.net/posts/20260507

    #Paper #Benchmark #FactVerification #NeurIPS #AI

  14. The generalizable LLM failure mode isn't "can't reason". It's that outcome reward cements whatever theory was active when a level happened to clear. ARC Prize's analysis of GPT-5.5 and Opus 4.7 on ARC-AGI-3 (0.43%/0.18%) names this alongside two cousins.

    Self-improvement loops that measure only task success encode this systematically.

    Caveat the post buries: GPT-5.5's traces are from an 'analysis mode' run, not Opus's default harness.

    benjaminhan.net/posts/20260506

    #AI #LLMs #Reasoning

  15. How do we make LLM output more trustworthy? A short survey note on three lines of recent work covering five papers: conformal-prediction coverage guarantees, behavioral calibration of the model's prose, and sample-disagreement detection. All three pay the same multi-sample inference tax; the choice is about what you want back.

    benjaminhan.net/posts/20260505

    #Hallucination #LLMs #Calibration #ConformalPrediction #AI

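
    To make the sample-disagreement idea from the survey note above concrete, here is a minimal sketch. It is not taken from any of the five papers: the function names (`normalize`, `disagreement_score`, `is_suspect`), the exact-match comparison after crude normalization, and the 0.4 threshold are all illustrative assumptions. It shows the multi-sample tax directly, since the same prompt must be sampled several times before any score exists.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Crude normalization (assumption): lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())


def disagreement_score(samples: list[str]) -> tuple[str, float]:
    """Return the majority answer and the fraction of samples that differ from it."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    majority, votes = counts.most_common(1)[0]
    return majority, 1.0 - votes / len(samples)


def is_suspect(samples: list[str], threshold: float = 0.4) -> bool:
    """Flag an output when disagreement exceeds a hypothetical threshold."""
    _, score = disagreement_score(samples)
    return score > threshold


if __name__ == "__main__":
    # Hypothetical answers sampled from the same prompt.
    answers = ["Paris", "paris ", "Paris", "Lyon"]
    print(disagreement_score(answers))
    print(is_suspect(answers))
```

    Exact string matching is the weakest possible comparison; a real detector would likely compare answers semantically, which this sketch does not attempt.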