home.social

#toppaper — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #toppaper, aggregated by home.social.

  1. NEW BIML Bibliography entry

    arxiv.org/abs/2602.06923v1

    From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers

    Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias

    Representation matters and is deeply constrained by tokenization. Excellent work, clearly described with real substance.

    #TOPPAPER #Representation #Tokenization

    berryvilleiml.com/references/

  2. NEW BIML Bibliography entry

    arxiv.org/abs/2503.03150

    Position: Model Collapse Does Not Mean What You Think

    Rylan Schaeffer, Joshua Kazdan, Alvan Caleb Arulandu, Sanmi Koyejo

    We think recursive pollution is a better term than model collapse. Weak terminology leads to misunderstanding of the impact. See Figure 4. This is a very good paper.

    #TOPPAPER #MLsec #RecursivePollution #DataPoisoning

    berryvilleiml.com/references/

  3. NEW BIML Bibliography entry

    arxiv.org/abs/2410.04840

    Strong Model Collapse

    Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian, Julia Kempe
    (NYU and Meta)

    Recursive pollution leads to model collapse. This view of strong model collapse describes what happens in the case of recursive data poisoning.

    #TOPPAPER #MLsec #Data #RecursivePollution

    berryvilleiml.com/references/

  4. NEW BIML Bibliography entry

    arxiv.org/abs/2509.16499

    A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective

    Lianghe Shi et al.

    A very nice set of references to work on model collapse. Collapsed model == lookup table (that is, no generalization). Includes a discussion of recursive pollution as a cause of variance shrinkage or distribution shift.

    #TOPPAPER #MLsec #Data #RecursivePollution

    berryvilleiml.com/references/

  5. NEW BIML Bibliography entry AND NEW TOP FIVE #MLsec PAPER

    READ IT

    arxiv.org/pdf/2510.07192

    Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

    Alexandra Souly, ... Nicholas Carlini, et al.

    Excellent paper, clear and well-stated (like all Carlini papers). This result shows that recursive pollution risk is even greater than we thought. Injecting backdoors is pretty easy. The examples are a bit simplistic.

    #TOPPAPER #MLsec #Attacks #DataPoisoning

    berryvilleiml.com/references/
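The variance-shrinkage mechanism mentioned in the Shi et al. entry can be illustrated with a toy simulation. This is a minimal sketch under a deliberately simple assumption: the "model" is a one-dimensional Gaussian, refit by maximum likelihood at each generation to samples drawn only from the previous generation's fit. The function name `recursive_refit` and its parameters are hypothetical and illustrative; this is not code from any of the papers listed above.

```python
import random
import statistics


def recursive_refit(generations: int, n_samples: int, seed: int = 0) -> list[float]:
    """Toy recursive-pollution loop (hypothetical illustration).

    Start from N(0, 1), draw n_samples points, refit mean and variance by
    maximum likelihood, then draw the next generation from that fit, and so on.
    Returns the fitted variance at each generation; index 0 is the original
    data distribution.
    """
    rng = random.Random(seed)
    mu, var = 0.0, 1.0
    history = [var]
    for _ in range(generations):
        # Each generation sees only data produced by the previous generation's model.
        samples = [rng.gauss(mu, var ** 0.5) for _ in range(n_samples)]
        mu = statistics.fmean(samples)
        # Population (maximum-likelihood) variance: biased low by (n - 1) / n.
        var = statistics.pvariance(samples, mu)
        history.append(var)
    return history


if __name__ == "__main__":
    history = recursive_refit(generations=500, n_samples=10)
    print("generation 0 variance:  ", history[0])
    print("generation 500 variance:", history[-1])
```

Because the maximum-likelihood variance estimate has expectation (n - 1)/n times the true variance, the expected fitted variance after t generations in this toy setup is ((n - 1)/n)^t, so with small per-generation samples the fitted distribution narrows toward a point. Real models and real data pipelines are far more complex; the sketch only shows why repeated training on self-generated data can shrink variance.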