home.social

#ligurian — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #ligurian, aggregated by home.social.

  1. Went for a walkabout in #triora, near the #Ligurian coast at the #italy-France border. It is a spectacular village with a history of #witch s and hosts various folklore- and horror-themed events. Here are some witch doors:

  2. The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

    ➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

    ➡️ Some of the newer #languages on Common Voice, like #Ligurian / #Genoese (lij), have contributions from mostly older speakers, which is unusual compared with the rest of the dataset. This may reflect the population that currently speaks those languages, as many regional languages in Italy are in rapid decline.

    ➡️ Some languages such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again in contrast to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

    ➡️ A key part of preparing the Common Voice dataset is validating utterances to ensure they match their written transcription, which requires at least two validations by separate speakers. Some newer languages on Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.

    What are your interpretations of the dataset?

    observablehq.com/@kathyreid/mo

  7. Week 16, 2023: What @[email protected] album languages grew the most this week?

    Eh… Avē Imperātor? The Genovese take home the victory in a decidedly Romanophone heat. 🥇🎉

    📊 #Wikidata 🎶🎵 #ExMusica @[email protected] #Ligurian #LengoaLigure #Genovese #Zeneise #LenguaLiggyre

  12. #repost from my old profile. I love #cooking and when I do, I like to go a bit overboard. This was part of a special all #Ligurian meal I prepared for my boyfriend. #italianfood #italianriviera #liguria #foodie #foodporn #foodlover #wine #italianwine