home.social

#librilight - Public Fediverse posts

Live and recent posts from across the Fediverse tagged #librilight, aggregated by home.social.

  1. 🌍 #MOSEL: Multilingual Open-Source European Languages Dataset

    • 📊 950,000 hours of #speech data covering the 24 official EU languages
    • 🎙️ Includes up to 441K hours of unlabeled speech from #VoxPopuli and #LibriLight
    • 🤖 Unlabeled speech pseudo-labeled with the #Whisper large v3 #ASR model
    • 🏷️ Covers both labeled and unlabeled #speechcorpora
    • 📜 Released under the #CCBY40 license for #opensource use
    • 🧠 Designed for training #AI #speechrecognition models

    Key features:
    • Diverse language coverage
    • Large-scale dataset
    • Open-source compliant
    • Includes pseudo-labeled data
    • Supports #NLP and #machinelearning research

    Learn more: huggingface.co/datasets/FBK-MT