home.social

#pretraining — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #pretraining, aggregated by home.social.

  1. How I trained a GPT from scratch in Russian, and what came of it

    It all started with a naive thought: why pay for an API or haul around a 7B model when all I need is a small model for simple conversations in one language? The logic seemed ironclad: large models can do everything in every language at once, but that is overkill. A 0.7B model tuned to one language and one conversational style should do just as well. Spoiler: that was naive. But the journey turned out to be worth more than the result.

    habr.com/ru/articles/1037532/

    #GPT #LLM #pretraining #распределённое_обучение #Google_Colab #RoPE #GQA #SwiGLU #NLP #русский_язык

  5. Brain Drain @ OpenAI continues.

    Andrej Karpathy, a founding member of OpenAI, is joining Anthropic and will focus on building out Anthropic's pretraining research .... axios.com/2026/05/19/anthropic #AI #Karpathy #Anthropic #OpenAI #BrainDrain #Pretraining #Claude #LLMs #FrontierAI

  10. RT @AndrewCurran_: Karpathy will form a new pretraining team focused on recursive self-improvement and will teach Claude to improve the training of Claude, as reported by Axios.

    more at Arint.info

    #AI #Axios #Karpathy #PreTraining #SelfImprovement #arint_info

    https://x.com/AndrewCurran_/status/2056776839402795041#m

  15. The urban legend of "training your own generative AI model" is still bait for monetizing tutorials, courses, masterclasses, and other products that the gurus and promoters of generative AI use to keep profiting at the expense of all the authors whose rights were violated. Don't be fooled!

    #AI #MachineLearning #data #training #finetuning #AImodel #genAI #generativeAI #pretraining #Copyright #opensource

  20. #LLMs learn various #characterarchetypes during #pretraining. #Posttraining focuses on the “#Assistant” #persona, but its stability is uncertain. Researchers mapped a “persona space” for LLMs, finding the “#AssistantAxis” aligns with helpful, professional archetypes. Monitoring and capping activations along this axis can prevent models from drifting into harmful personas, enhancing their stability and safety. anthropic.com/research/assista #AIagent #AI #ML #NLP #LLM #GenAI
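
The "capping activations along this axis" idea in the post above can be illustrated with a small sketch. This is not the researchers' implementation: the function name `cap_along_axis`, the two-dimensional vectors, and the thresholds `lo` and `hi` are all hypothetical, chosen only to show what clamping a projection onto a single direction might look like.

```python
import math

def cap_along_axis(hidden, axis, lo, hi):
    """Clamp the component of `hidden` along `axis` to the range [lo, hi].

    Everything orthogonal to the axis is left unchanged.
    Returns the adjusted vector and the original projection (for monitoring).
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    # Scalar projection of the hidden state onto the unit axis.
    proj = sum(h * u for h, u in zip(hidden, unit))
    # Clamp the projection, then shift the vector along the axis by the difference.
    capped = min(max(proj, lo), hi)
    delta = capped - proj
    return [h + delta * u for h, u in zip(hidden, unit)], proj

# Hypothetical toy values: a 2-d "activation" and an axis along the first coordinate.
hidden = [3.0, 4.0]
axis = [1.0, 0.0]
steered, before = cap_along_axis(hidden, axis, lo=-1.0, hi=1.0)
print(before, steered)
```

Here the projection (3.0) exceeds the hypothetical upper bound, so only the first coordinate is pulled back to 1.0 while the second stays at 4.0.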

  25. Brain rot, the cognitive decline and mental exhaustion that people (particularly adolescents and young adults) experience from excessive exposure to low-quality online material, can also affect LLMs, causing cognitive decline, reduced reasoning abilities, and degraded memory. The models also became less ethically aligned and more psychopathic according to two measures. Ouch!

    Researchers pretrained LLMs on junk data, and the results show that data quality is a causal driver of LLM capability decay. The declines include worse reasoning, poorer long-context understanding, diminished ethical norms, and emergent socially undesirable personalities. llm-brain-rot.github.io/ #AI #LLMs #BrainRot #CognitiveDecline #Pretraining #SocialMedia

  30. A post seeking information on a pretraining recipe for NVFP4/MXFP4 on Blackwell GPUs. More than one person has asked for the complete recipe, while the official documentation and existing blog posts lack detail. Tags: #AI #NVIDIA #MXFP4 #NVFP4 #BlackwellGPU #Pretraining #MachineLearning #Tech

    reddit.com/r/LocalLLaMA/commen

  31. "RLP: Reinforcement as a Pretraining Objective" introduces RLP, a new method that integrates reinforcement learning (RL) into the pretraining stage of AI models. RLP encourages the model to "think for itself" earlier by treating a chain of thought as an exploratory action and rewarding it for useful information. The result is significantly improved reasoning: for example, Qwen3-1.7B-Base improves by 19%, and Nemotron-Nano-12B-v2 rises from 42.81% to 61.32%.
    #RLP #ReinforcementLearning #Pretraining #AI #LLM #MachineLearning #Reasoning #HocTangC
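
The post above says only that a chain of thought is rewarded "for useful information". One way to read that, offered here as an assumption rather than as the paper's exact formulation, is to score a thought by how much it raises the log-probability of the token that actually comes next. The function name and the probabilities below are hypothetical.

```python
import math

def information_gain_reward(logp_with_thought, logp_without_thought):
    """Reward a thought by the log-likelihood it adds for the observed next token.

    Positive when the thought made the true next token more likely,
    negative when it made it less likely.
    """
    return logp_with_thought - logp_without_thought

# Hypothetical numbers: the next token has probability 0.3 without the thought
# and 0.6 once the model conditions on its own chain of thought.
reward = information_gain_reward(math.log(0.6), math.log(0.3))
print(round(reward, 3))
```

Under this reading a thought that doubles the probability of the correct token earns a reward of log 2, and a thought that changes nothing earns zero.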

  32. 🚀 New research shows how noisy, large-scale data can improve semantic segmentation in #EarthObservation using a novel weakly supervised pretraining method, CromSS.

    🧠 The approach boosts multi-modal feature learning with cross-modal consistency and label smoothing.

    🔗 Thanks to EvoLand partner DLR's Conrad Albrecht for his key role!

    📚 Learn more: evo-land.eu/wp-content/uploads

    #RemoteSensing #Pretraining #OpenScience #EvoLand #NoisyLabels #DeepLearning

    @EU_HaDEA @euenvironment @DLR @cnes
