#pretraining — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #pretraining, aggregated by home.social.
-
How I trained a GPT from scratch in Russian, and what came of it
It all started with a naive thought: why pay for an API or lug around a 7B model if I need a small model for simple conversations in one language? The logic seemed ironclad: large models can do everything in every language at once, but that is overkill. A 0.7B model tuned for one language and one conversational style should do just as well. Spoiler: that was naive. But the journey turned out to be more valuable than the result.
https://habr.com/ru/articles/1037532/
#GPT #LLM #pretraining #распределённое_обучение #Google_Colab #RoPE #GQA #SwiGLU #NLP #русский_язык
-
Brain Drain @ OpenAI continues.
Andrej Karpathy, a founding member of OpenAI, is joining Anthropic and will focus on building out Anthropic's pretraining research .... https://www.axios.com/2026/05/19/anthropic-openai-karpathy-andrej-claude #AI #Karpathy #Anthropic #OpenAI #BrainDrain #Pretraining #Claude #LLMs #FrontierAI
-
RT @AndrewCurran_: Karpathy will form a new pre-training team focused on recursive self-improvement and will teach Claude to improve Claude's training, as reported by Axios.
more at Arint.info
#AI #Axios #Karpathy #PreTraining #SelfImprovement #arint_info
-
https://www.europesays.com/news/30693/ OpenAI co-founder Andrej Karpathy joins Anthropic’s pre-training team #AndrejKarpathy #Anthropic #Headlines #News #OpenAI #pretraining #TopStories
-
The urban legend of "training your own generative AI model" remains a hook for monetizing tutorials, courses, masterclasses, and other products that the gurus and promoters of generative AI use to keep profiting at the expense of all the authors whose rights were violated. Don't be fooled!
#AI #MachineLearning #data #training #finetuning #AImodel #genAI #generativeAI #pretraining #Copyright #opensource
-
Pretraining Language Models via Neural Cellular Automata
https://hanseungwook.github.io/blog/nca-pre-pre-training/
#HackerNews #Pretraining #Language #Models #Neural #Cellular #Automata #AI #Research #Machine #Learning
-
#LLMs learn various #characterarchetypes during #pretraining. #Posttraining focuses on the “#Assistant” #persona, but its stability is uncertain. Researchers mapped a “persona space” for LLMs, finding the “#AssistantAxis” aligns with helpful, professional archetypes. Monitoring and capping activations along this axis can prevent models from drifting into harmful personas, enhancing their stability and safety. https://www.anthropic.com/research/assistant-axis?AIagents.at #AIagent #AI #ML #NLP #LLM #GenAI
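The post gives no implementation details, so the following is only a generic sketch of what "capping activations along an axis" could look like, not the researchers' method. It assumes an activation is a plain vector, the axis is a known direction, and the bounds `low` and `high` are chosen by whoever is monitoring the model. The names `projection` and `cap_along_axis` are hypothetical.

```python
import math


def projection(activation, axis):
    """Scalar component of `activation` along the direction `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    return sum(x * a for x, a in zip(activation, axis)) / norm


def cap_along_axis(activation, axis, low, high):
    """Clamp the component along `axis` to [low, high].

    Only the part of the vector that lies along `axis` is changed;
    everything orthogonal to it is left as it was.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    proj = sum(x * u for x, u in zip(activation, unit))
    capped = min(max(proj, low), high)
    # Shift the vector along the unit direction by the clamped difference.
    return [x + (capped - proj) * u for x, u in zip(activation, unit)]


if __name__ == "__main__":
    # Hypothetical two-dimensional example: the axis is the first coordinate.
    act = [3.0, 4.0]
    axis = [1.0, 0.0]
    print(projection(act, axis))
    print(cap_along_axis(act, axis, low=-1.0, high=2.0))
```

Monitoring would correspond to logging `projection(...)` over time, and capping to applying `cap_along_axis(...)` whenever the value leaves the chosen range.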
-
Brain rot, the cognitive decline and mental exhaustion that people, particularly adolescents and young adults, experience from excessive exposure to low-quality online material, can also affect LLMs, causing cognitive decline, reduced reasoning ability, and degraded memory. The models also became less ethically aligned and more psychopathic according to two measures. Ouch!
Researchers pretrained LLMs on junk data, and the results indicate that data quality is a causal driver of LLM capability decay. The declines include worse reasoning, poorer long-context understanding, diminished ethical norms, and emergent socially undesirable personalities. https://llm-brain-rot.github.io/ #AI #LLMs #BrainRot #CognitiveDecline #Pretraining #SocialMedia
-
A post looking for a working recipe for NVFP4/MXFP4 pretraining on Blackwell GPUs. More than one person has asked for a complete recipe, while the official documentation and current blog posts lack detail. Tags: #AI #NVIDIA #MXFP4 #NVFP4 #BlackwellGPU #Pretraining #MachineLearning #Tech
https://www.reddit.com/r/LocalLLaMA/comments/1odhz2s/looking_for_a_working_nvfp4mxfp4_pretraining/
-
"RLP: Reinforcement as a Pretraining Objective" introduces RLP, a new method that integrates reinforcement learning (RL) into the pretraining stage of AI models. RLP encourages the model to "think for itself" earlier by treating a chain of thought as an exploratory action and rewarding useful information. Result: significantly improved reasoning, e.g. Qwen3-1.7B-Base improves by 19%, and Nemotron-Nano-12B-v2 rises from 42.81% to 61.32%.
#RLP #ReinforcementLearning #Pretraining #AI #LLM #MachineLearning #Reasoning #HocTangC
-
AI models can acquire backdoors from surprisingly few malicious documents - Scraping the open web for AI training data can have its draw... - https://arstechnica.com/ai/2025/10/ai-models-can-acquire-backdoors-from-surprisingly-few-malicious-documents/ #ukaisecurityinstitute #alanturinginstitute #aivulnerabilities #backdoorattacks #machinelearning #datapoisoning #trainingdata #llmsecurity #modelsafety #pretraining #airesearch #aisecurity #finetuning #anthropic #biz #ai
-
Knowledge Infusion Scaling Law for Pre-Training Large Language Models
https://arxiv.org/abs/2509.19371
#HackerNews #KnowledgeInfusion #LanguageModels #PreTraining #AIResearch #MachineLearning #ScalingLaw
-
NEC AI technology digitalizes work tasks without the need for pre-training: Press Releases https://www.byteseu.com/1323372/ #AI #ArtificialIntelligence #construction #DigitalTwin #digitalize #factory #optimize #PreTraining #processes #productivity #Technology #video #warehouse #WideArea #WorkTasks #worksite
-
🚀 New research shows how noisy, large-scale data can improve semantic segmentation in #EarthObservation using a novel weakly supervised pretraining method, CromSS.
🧠 The approach boosts multi-modal feature learning with cross-modal consistency and label smoothing.
🔗 Thanks to EvoLand partner DLR's Conrad Albrecht for his key role!
#RemoteSensing #Pretraining #OpenScience #EvoLand #NoisyLabels #DeepLearning