home.social

#eleutherai — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #eleutherai, aggregated by home.social.
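
Feeds like this can be assembled from the public, unauthenticated hashtag-timeline endpoint that Mastodon-compatible servers expose (`GET /api/v1/timelines/tag/:hashtag`). Below is a minimal sketch in Python using only the standard library; the instance names and limit are illustrative, and home.social's actual aggregation pipeline is not described here:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def tag_timeline_url(instance: str, tag: str, limit: int = 20) -> str:
    """Build the public hashtag-timeline URL for a Mastodon-compatible server."""
    return f"https://{instance}/api/v1/timelines/tag/{quote(tag)}?limit={limit}"


def fetch_tag_posts(instance: str, tag: str, limit: int = 20) -> list:
    """Fetch recent public posts for a hashtag.

    Public timelines need no authentication; this performs a network call
    and returns the decoded JSON array of status objects.
    """
    with urlopen(tag_timeline_url(instance, tag, limit)) as resp:
        return json.load(resp)
```

An aggregator would poll this endpoint on several instances, de-duplicate posts by their `uri` field, and merge the results by timestamp.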

  1. TechCrunch: EleutherAI releases massive AI training dataset of licensed and open domain text. “The dataset, called the Common Pile v0.1, took around two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes in size, the Common Pile v0.1 was used to train two new AI models from EleutherAI, […]”

    https://rbfirehose.com/2025/06/07/techcrunch-eleutherai-releases-massive-ai-training-dataset-of-licensed-and-open-domain-text/

  2. YouTube creators surprised to find Apple and others trained AI on their videos - YouTuber Marques Brownlee discusses iOS 18 in a new video. Th... - arstechnica.com/?p=2037316 #largelanguagemodels #eleutherai #anthropic #thepile #youtube #google #apple #tech #ai

  3. How the Foundation Model Transparency Index Distorts Transparency | EleutherAI Blog blog.eleuther.ai/fmti-critique

    I saw the Foundation Model Transparency Index paper come out recently and was surprised that OpenAI scored as high as they did. This EleutherAI post breaks down how the Foundation Model Transparency Index gets it all wrong and is not really measuring transparency at all.

    #fmti
    #foundationmodeltransparencyindex
    #opensource
    #LLM
    #eleutherai

  4. Releasing 3B and 7B #RedPajama-#INCITE family of models including base, instruction-tuned & chat models — #TOGETHER

    "The biggest takeaway is the demonstration that performant #LLMs can be built quickly by the open-source community. This work builds on top of our 1.2 trillion token RedPajama dataset, EleutherAI’s #Pythia training code, #FlashAttention from #Stanford and #Together, the #HELM benchmarks from Stanford #CRFM and generous support from #MILA, #EleutherAI & #LAION for compute time on the #Summit #supercomputer within the INCITE program award 'Scalable Foundation Models for Transferable Generalist AI'. We believe these kind of open collaborations, at larger scales, will be behind the best #AI systems of the future."

    together.xyz/blog/redpajama-mo

  5. “A really big deal”: Dolly is a free, open-source, ChatGPT-style AI model - On Wednesday, Databricks released... - arstechnica.com/?p=1931693 #largelanguagemodels #machinelearning #textsynthesis #apachespark #databricks #eleutherai #finetuning #biz #pythia #dolly #llama #meta #ai