home.social

#uralic — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #uralic, aggregated by home.social.

  1. New DNA study proposes that the Urheimat of #Uralic speaking peoples was not near Urals but much more eastward in Yakutia, from where they started moving some 4000 years ago. It is curious that most people in Estonia still seem to believe that Uralic speakers have lived here 5000 years, or even (after Kalevi Wiik's popular theory) 10 000 years, since the Ice Age. But both Baltic and Germanic speaking people probably lived here before Uralic speakers.

    nature.com/articles/s41586-025

  2. New DNA study proposes that the Urheimat of speaking peoples was not near Urals but much more eastward in Yakutia, from where they started moving some 4000 years ago. It is curious that most people in Estonia still seem to believe that Uralic speakers have lived here 5000 years, or even (after Kalevi Wiik's popular theory) 10 000 years, since the Ice Age. But both Baltic and Germanic speaking people probably lived here before Uralic speakers.

    nature.com/articles/s41586-025

  3. For too long Yakuts, Dolgans & other tribes have been held hostage by forces of Western-backed tribal separatism We will not stand idly by as our brothers & sisters are forced to live in a state of perpetual disunity & chaos #Western #Uralic ( #Moscow ) degenerated ideas are rotting us to the core

  4. For too long Yakuts, Dolgans & other tribes have been held hostage by forces of Western-backed tribal separatism We will not stand idly by as our brothers & sisters are forced to live in a state of perpetual disunity & chaos #Western #Uralic ( #Moscow ) degenerated ideas are rotting us to the core

  5. The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

    ➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

    ➡️ Some of the newer #languages to Common Voice, like #Ligurian / #Genoese (lij) have contributions from mostly older speakers, which is unusual in comparison to the rest of the dataset. This may reflect the population that currently speak those languages - as many regional languages in Italy are in rapid decline.

    ➡️ Some languages such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again contrasting to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

    ➡️ A key part in the preparation of the Common Voice dataset is the validation of utterances to assure they match their written transcription - which requires at least two validations by separate speakers. Some newer languages to Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.

    What are your interpretations of the dataset?

    observablehq.com/@kathyreid/mo

  6. The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

    ➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

    ➡️ Some of the newer #languages to Common Voice, like #Ligurian / #Genoese (lij) have contributions from mostly older speakers, which is unusual in comparison to the rest of the dataset. This may reflect the population that currently speak those languages - as many regional languages in Italy are in rapid decline.

    ➡️ Some languages such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again contrasting to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

    ➡️ A key part in the preparation of the Common Voice dataset is the validation of utterances to assure they match their written transcription - which requires at least two validations by separate speakers. Some newer languages to Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.

    What are your interpretations of the dataset?

    observablehq.com/@kathyreid/mo

  7. The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

    ➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

    ➡️ Some of the newer #languages to Common Voice, like #Ligurian / #Genoese (lij) have contributions from mostly older speakers, which is unusual in comparison to the rest of the dataset. This may reflect the population that currently speak those languages - as many regional languages in Italy are in rapid decline.

    ➡️ Some languages such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again contrasting to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

    ➡️ A key part in the preparation of the Common Voice dataset is the validation of utterances to assure they match their written transcription - which requires at least two validations by separate speakers. Some newer languages to Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.

    What are your interpretations of the dataset?

    observablehq.com/@kathyreid/mo

  8. The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

    ➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

    ➡️ Some of the newer #languages to Common Voice, like #Ligurian / #Genoese (lij) have contributions from mostly older speakers, which is unusual in comparison to the rest of the dataset. This may reflect the population that currently speak those languages - as many regional languages in Italy are in rapid decline.

    ➡️ Some languages such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again contrasting to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

    ➡️ A key part in the preparation of the Common Voice dataset is the validation of utterances to assure they match their written transcription - which requires at least two validations by separate speakers. Some newer languages to Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.

    What are your interpretations of the dataset?

    observablehq.com/@kathyreid/mo

  9. The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

    ➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

    ➡️ Some of the newer #languages to Common Voice, like #Ligurian / #Genoese (lij) have contributions from mostly older speakers, which is unusual in comparison to the rest of the dataset. This may reflect the population that currently speak those languages - as many regional languages in Italy are in rapid decline.

    ➡️ Some languages such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again contrasting to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

    ➡️ A key part in the preparation of the Common Voice dataset is the validation of utterances to assure they match their written transcription - which requires at least two validations by separate speakers. Some newer languages to Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.

    What are your interpretations of the dataset?

    observablehq.com/@kathyreid/mo

  10. My fantastic colleague Daria Zhornik is on the newest episode of the podcast "A Language I Love Is..." talking about Mansi! Enjoy! shows.acast.com/a-language-i-l