home.social

#generalization — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #generalization, aggregated by home.social.

  1. Ilya Sutskever argues that we’re shifting from the age of scaling to the age of research: today’s models excel on benchmarks but still generalize far worse than humans. The interview highlights why future progress will depend on new learning principles, continual learning, and a deeper understanding of generalization — not just more compute.
    dwarkesh.com/p/ilya-sutskever-2
    #AIResearch #Generalization #FutureOfAI

  2. How does the #brain transfer #MotorSkills between hands? This study reveals that transfer relies on re-expressing the neural patterns established during initial learning in distributed higher-order brain areas, offering new insights into learning #generalization @PLOSBiology plos.io/41LOAWf

  3. 'Random Pruning Over-parameterized Neural Networks Can Improve Generalization: A Training Dynamics Analysis', by Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang.

    jmlr.org/papers/v26/23-0832.ht

    #pruning #pruned #generalization

  4. Humans can apply solutions of past problems to new problems. @gershbrain @nicoschuck &co reveal the neural correlates of #generalization and show that humans apply past policies in a reward-sensitive manner that leads to high performance @PLOSBiology plos.io/3SJPMof

  5. e509 — Maverick and Marbles

    e509 with Michael and Michael - stories and discussion all around #AI, #LLMs, #llamas, generated #Quake, #grokking, #generalization and much more.

    gamesatwork.biz/2025/04/14/e50

  7. People value us for the value (they believe) we (might) add to them.

    Generalizing, of course, but it's all transactional. There's no (longer) valuing people for just who they are.

    #society #people #life #generalization

  8. 'Generalization on the Unseen, Logic Reasoning and Degree Curriculum', by Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk.

    jmlr.org/papers/v25/24-0220.ht

    #sparse #learns #generalization

  13. 'Mentored Learning: Improving Generalization and Convergence of Student Learner', by Xiaofeng Cao, Yaming Guo, Heng Tao Shen, Ivor W. Tsang, James T. Kwok.

    jmlr.org/papers/v25/23-1213.ht

    #learners #learner #generalization

  14. 'Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK', by Hongru Yang, Ziyu Jiang, Ruizhe Zhang, Yingbin Liang, Zhangyang Wang.

    jmlr.org/papers/v25/23-0831.ht

    #sparse #gradient #generalization

  15. Can transformers "think"?

    Recent research shows that transformer models can solve tasks requiring several logical steps almost without error: for example, deriving B from statement A and then reasoning on to C. Surprisingly, this is achieved without Chain-of-Thought or special prompts, using only a classic GPT-2. Let's look at how transformers "think" when solving reasoning tasks, and write code for this using the Hugging Face library.

    habr.com/ru/articles/840136/

    #GPT #grokking #AI_memory #reasoning_tasks #artificial_general_intelligence #generalization #transformer #transformer_memory

  16. 'Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance', by Lisha Chen, Heshan Fernando, Yiming Ying, Tianyi Chen.

    jmlr.org/papers/v25/23-1287.ht

    #objectives #objective #generalization

  17. Here's a very simple sequence (generalized from the Fibonacci sequence) to discourage students from generalizing a pattern too quickly. In fact, the sequence will look like it is the powers of 2 until it stops.

    1, 1, 2, 4, 8, 16, ..., 2ᵏ⁻¹, 2ᵏ−1, 2ᵏ⁺¹−3, 2ᵏ⁺²−8, ...

    By choosing how many previous terms the recursive formula adds up, I can control the value of 𝑘. So, technically, this is a family of sequences, with the Fibonacci sequence being the one with 𝑘=2.
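
    A minimal Python sketch of this family, assuming the recursive rule is: start with 1, then each new term is the sum of the previous 𝑘 terms, with terms before the start treated as 0. Under that rule the Fibonacci sequence is the 𝑘=2 case, and the last power of two before the break is 2ᵏ⁻¹. The function names `k_bonacci` and `powers_of_two_prefix` are hypothetical, chosen only for this illustration.

```python
def k_bonacci(k: int, n: int) -> list[int]:
    """First n terms: start with 1, then each new term is the sum of the
    previous k terms; terms before the starting 1 count as 0."""
    seq = [1]
    while len(seq) < n:
        # seq[-k:] is the whole list while fewer than k terms exist,
        # which is the same as padding the front with zeros.
        seq.append(sum(seq[-k:]))
    return seq


def powers_of_two_prefix(seq: list[int]) -> int:
    """How many terms after the starting 1 equal 2**0, 2**1, 2**2, ... in order."""
    count = 0
    for i, term in enumerate(seq[1:]):
        if term != 2**i:
            break
        count += 1
    return count


print(k_bonacci(2, 7))  # [1, 1, 2, 3, 5, 8, 13]  (Fibonacci)
print(k_bonacci(3, 7))  # [1, 1, 2, 4, 7, 13, 24]

# For k >= 2: exactly k powers of two follow the starting 1, then the
# pattern breaks as 2**k - 1, 2**(k+1) - 3, 2**(k+2) - 8.
for k in range(2, 12):
    s = k_bonacci(k, k + 4)
    assert powers_of_two_prefix(s) == k
    assert s[k + 1 :] == [2**k - 1, 2**(k + 1) - 3, 2**(k + 2) - 8]
```

    Changing `k` is what moves the break point. The loop only spot-checks the break terms for 𝑘 from 2 to 11; it is a sanity check, not a proof.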

    Reasons this family of sequences is cool:

    1. I can control exactly what the value of the last power of 2 is and can make the pattern break after 2, 3, 10, 20, or 100 consecutive powers of 2 showing up.

    2. The formula for this sequence is very easy to describe:
    Start with a 1; to find each new term, add up the last 𝑘 terms of the sequence (treating any terms before the starting 1 as 0 if needed). Note that the 𝑘 terms being added up match the first 𝑘 powers of two (starting at 2⁰=1) that show up in the sequence before the pattern breaks.

    3. If you know the Fibonacci sequence (which is the special case of 𝑘=2), then this family of sequences is a natural generalization to look at. See:
    en.wikipedia.org/wiki/Generali

    4. If we adjust the rule to "sum of all previous terms", we do in fact get the powers of two.
    Proof (by induction):
    Base case: the sequence starts 1, 1, and the next term is 1 + 1 = 2.
    Hypothesis: assume the latest term is a power of two, say 2ᵐ, and that it equals the sum of all the terms before it.
    Inductive step: the next term is the sum of all previous terms, i.e. the terms that added up to 2ᵐ plus the 2ᵐ term itself, which gives 2ᵐ + 2ᵐ = 2ᵐ⁺¹. That new term again equals the sum of all terms before it, so the hypothesis carries forward.
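
    A quick Python check of point 4, assuming "sum of all previous terms" means every earlier term, the starting 1 included. The function name `sum_of_all_previous` is hypothetical. The second assertion is the invariant the induction relies on: the terms before 2ᵐ add up to 2ᵐ, so the next term is 2ᵐ + 2ᵐ.

```python
def sum_of_all_previous(n: int) -> list[int]:
    """First n terms: start with 1, then each new term is the sum of every earlier term."""
    seq = [1]
    total = 1  # running sum of all terms so far
    while len(seq) < n:
        seq.append(total)  # new term = sum of all previous terms
        total += seq[-1]   # so the running total doubles each step
    return seq


terms = sum_of_all_previous(10)
print(terms)  # [1, 1, 2, 4, 8, 16, 32, 64, 128, 256]

for i in range(1, len(terms)):
    # Every term after the starting 1 is a power of two ...
    assert terms[i] == 2 ** (i - 1)
    # ... because the terms before it already add up to that same value,
    # so the next term, which also includes it, is double (the inductive step).
    assert sum(terms[:i]) == terms[i]
```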

    #math; #pattern in a #sequence; #PowersOf2; #generalization of #Fibonacci.

  18. 'Generalization and Stability of Interpolating Neural Networks with Minimal Width', by Hossein Taheri, Christos Thrampoulidis.

    jmlr.org/papers/v25/23-0422.ht

    #classifiers #generalization #minimization

  23. 'Effect-Invariant Mechanisms for Policy Generalization', by Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters.

    jmlr.org/papers/v25/23-0802.ht

    #causal #generalization #invariance

  28. My favorite molecular #protein-#ligand #docking method, #DiffDock, has been updated! The new DiffDock-L provides a significant improvement in performance and generalization capacity.

    Importantly, this new method comes with the new #DockGen benchmark, aiming to provide better evaluation metrics and help improve #generalization of #ML docking models by accounting for sequence-dissimilar proteins with very similar binding pockets in training/test splits.

    arxiv.org/abs/2402.18396
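
    To illustrate the general idea of keeping similar binding pockets on one side of a train/test split, here is a minimal Python sketch of a group-aware split. It is not DockGen's actual procedure: it assumes each complex has already been assigned a pocket-cluster label by some pocket-similarity clustering, and the complex IDs, labels, and the `split_by_pocket_cluster` function are all hypothetical.

```python
import random


def split_by_pocket_cluster(
    pocket_cluster: dict[str, str], test_fraction: float, seed: int = 0
) -> tuple[list[str], list[str]]:
    """Split complex IDs so that every pocket cluster lands entirely in
    train or entirely in test (hypothetical helper, not DockGen's code)."""
    clusters = sorted(set(pocket_cluster.values()))
    random.Random(seed).shuffle(clusters)
    # Hold out whole clusters, never individual complexes.
    n_test = max(1, round(len(clusters) * test_fraction))
    test_clusters = set(clusters[:n_test])
    train = [cid for cid, c in pocket_cluster.items() if c not in test_clusters]
    test = [cid for cid, c in pocket_cluster.items() if c in test_clusters]
    return train, test


# Hypothetical labels: two sequence-dissimilar proteins share pocket "P1".
labels = {"prot_A": "P1", "prot_B": "P1", "prot_C": "P2", "prot_D": "P3"}
train, test = split_by_pocket_cluster(labels, test_fraction=0.3)
print(train, test)
```

    A random split over individual proteins could put `prot_A` in train and `prot_B` in test even though they share a pocket; splitting by cluster rules that out, at the cost of needing a good pocket-similarity clustering in the first place.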

  29. 'On the Generalization of Stochastic Gradient Descent with Momentum', by Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang.

    jmlr.org/papers/v25/22-0068.ht

    #sgd #epochs #generalization
