#toppaper — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #toppaper, aggregated by home.social.
-
NEW BIML Bibliography entry
https://arxiv.org/abs/2602.06923v1#
From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers
Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias
Representation matters and is deeply constrained by tokenization. Excellent work, clearly described with real substance.
-
NEW BIML Bibliography entry
https://arxiv.org/abs/2503.03150
Position: Model Collapse Does Not Mean What You Think
Rylan Schaeffer, Joshua Kazdan, Alvan Caleb Arulandu, Sanmi Koyejo
We think recursive pollution is a better term than model collapse. Weak terminology leads to misunderstanding of impact. See figure 4. This is a very good paper.
-
NEW BIML Bibliography entry
https://arxiv.org/abs/2410.04840
Strong Model Collapse
Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian, Julia Kempe
(NYU and Meta) Recursive pollution leads to model collapse. This view of strong model collapse describes what happens in the case of recursive data poisoning.
#TOPPAPER #MLsec #Data #RecursivePollution
-
NEW BIML Bibliography entry
https://arxiv.org/abs/2509.16499
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
Lianghe Shi et al.
A very nice set of references to work on model collapse. Collapsed model == lookup table (that is, no generalization). Discusses recursive pollution as a cause of variance shrinkage or distribution shift.
-
NEW BIML Bibliography entry AND NEW TOP FIVE #MLsec PAPER
READ IT
https://arxiv.org/pdf/2510.07192
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Alexandra Souly, ... Nicholas Carlini, et al.
Excellent paper, clear and well-stated (like all Carlini papers). This result shows that recursive pollution risk is even greater than we thought. Injecting backdoors is pretty easy. The examples are a bit simplistic.