home.social

#commoncorpus — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #commoncorpus, aggregated by home.social.

  1. Common Corpus, an open training set for AI, goes global – and so should support for it

    As many of the AI stories on Walled Culture attest, one of the most contentious areas in the latest stage of AI development concerns the sourcing of training data. To create high-quality large language models (LLMs), massive quantities of training data are required. In the current genAI stampede, many companies are simply scraping everything they can off the Internet. Quite how that will work […]

    #aiAlliance #commonCorpus #curation #euAiAct #financeCommons #france #gdpr #github #legalCommons #llms #multilingual #openCulture #openGovernment #openScience #openSource #openWeb #pdf #permissiveLicensing #pleias #publicDomain #scraping #tokens #toxicity #wikimedia #youtube walledculture.org/common-corpu