#corpora — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #corpora, aggregated by home.social.
-
Using cognitive diversity for stronger, smarter cyber defense https://www.helpnetsecurity.com/2025/01/15/mel-morris-corpora-ai-cognitive-diversity-cybersecurity/ #cybersecurity #Corpora.ai #resilience #Don'tmiss #Features #Hotstuff #training #attacks #opinion #News
-
Later today at #CHR2024, we are going to present our work on #Multilingual #Stylometry!
We isolated the influence of #language on #authorship #attribution #accuracy by translating multiple #corpora into each other's languages while keeping #corpus composition stable.
Interactive showcase: https://showcases.clsinfra.io/stylometry
Full paper: https://ceur-ws.org/Vol-3834/paper9.pdf
This work was developed within the @CLSinfra project in #Trier, #Krakow and #Prague with Artjoms Šeļa, Evgeniia Fileva and Julia Dudar.
-
So you wanna parse/manipulate some #PDF's, huh!?
Well, you better #test your #software thoroughly or bad things will happen!🧪
So how about "this corpus [which] contains nearly 8 million PDFs gathered from across the web in July/August of 2021":
https://digitalcorpora.org/corpora/file-corpora/cc-main-2021-31-pdf-untruncated/
The entire corpus when uncompressed takes up nearly 8 TB!
You can find some more links to different #corpora (even to ones deemed #unsafe!😬) at pdf-association's GitHub:
-
👋 Greetings! 👋
We wanted to remind all #fakespeakers that the #Fakespeak project is still alive and kicking – especially after a long and #fakenews filled summer vacation.
We have some great events and research output coming out in the next few months, including a #linguistics conference, #multilingual fake news #corpora, publications bringing together advanced linguistic features and #transformermodels, and a special issue in Linguistics Vanguard on the language of fake news.
Follow along!
-
Interestingly, very few psychologists are aware of #linguistic #corpora 📊 and their immense research potential. Platforms like CLARIN-PL offer invaluable data that can significantly enhance our understanding of human behaviour and social interactions. 🤝🗣️ It's time more of us psych folk tapped into these resources to advance our field! 🌟🔍
-
And another one for fellow linguists interested in compiling #corpora of digital discourse: MastoScraper takes advantage of the Mastodon API to collect toots based on a keyword search.
Here goes, feedback welcome!
#linguistics @linguistics
https://fmoncomble.github.io/mastoscraper/
-
Next week, we'll be discussing how to archive and research social media data on a large scale "After Twitter". Very excited to see what comes out of this conference, and also the following data sprint delving into huge German Twitter corpora.
https://www.dnb.de/twittertagung
#AfterTwitter #corpora #research
-
Interesting publication on medieval Latin text corpora by @TimGeelhaar: 🔖 Geelhaar, Tim. „Hamsterrad oder Himmelsleiter? Oder warum die Digitalisierung so endlos scheint“. 2024. https://doi.org/10.15499/KDS-005-016.
-
#Eduhub days 2024 at #ZHAW and I cannot be there 😢 🩼
If you go, stop by at the marketplace—in the afternoon my colleague Maren Runte will show our work on creating a learning space for working with linguistic #corpora (to be released later this year)
-
CLS INFRA Training School Vienna 2024 June 10th–12th, 2024:
ExploreCor: "Using Programmable #Corpora in #Computational #Literary #Studies"
This intensive program covers some of the most important steps in the research cycle of CLS, focusing on “Programmable Corpora” – dynamic collections of literary texts manipulated programmatically.
Apply now! https://pretix.eu/CLSINFRA-trainingschool2024/application/
Colocated with #CCLS2024, June 13-14, 2024: https://jcls.io/site/conference/
-
@ZBW_MediaTalk The conference programme is set (and will be published soon), and applications for the subsequent data sprint are still open! #datasprint #twitter #dataScience #corpora
https://www.dnb.de/twitterdatasprint
-
Today in History and Theory of #DigitalHumanities, @fabianmoss is talking about #corpora and #models!
-
Oliver Watteler and Ulrike Schneider are talking about "Can I publish my social media corpus" @ #cmc2023 #corpora #socialMedia #linguistics #gdpr
-
One of the prettiest (if not very practical) university locations in Germany. 👋 from #cmc2023 at the University of Mannheim! #corpora #linguistics #socialMedia
-
#corpora I bet we all have some anxiety about them. How do you choose what texts to look at? How do you know when you have enough? The #DataSittersClub is back, asking those questions and more to corpus linguist Shelley Staples, while exploring pizza and the Newbery Award for youth literature. #DigitalHumanities https://datasittersclub.github.io/site/dsc19.html
-
Is there any better news to wake up to than the fact that Norway has digitized All The Books and it's no problem at all to get all their Baby-Sitters Club translations? 🤩 #DigitalHumanities #DataSittersClub #corpora
-
Some hard facts from the British National Corpus #BNC : of the 44 hits for "oddness", 16 come from the same source, which turns out to be D. A. Cruse's "Lexical Semantics" textbook from 1986. Which makes me wonder, not for the first time, whether it has been a good idea to include linguistic texts in the BNC sampling :)
#EnglishLinguistics #corpora #metaLinguistics
-
A survey of corpora for Germanic low-resource languages and dialects.
#NLProc #linguistics #corpora https://github.com/mainlp/germanic-lrl-corpora
-
If you need to wrangle with EEBO-TCP for your text analysis project, consider using the EarlyPrint project corpus. They've done a bunch of preprocessing to transform "the early English print record, from 1473 to the early 1700s, into a linguistically annotated and deeply searchable text archive." Documentation and tutorials are all really thorough. https://earlyprint.org/about/ https://humanitiesdata.com/resources/436 #CulturalAnalytics #dh #opendata #corpora #eebo-tcp
-
From: Alexander Huber
Have you explored the new #EighteenthCenturyPoetryArchive corpus builder yet? It allows you to quickly create and share collections of poems, editions, or lists of authors with a single link!
https://www.eighteenthcenturypoetry.org/resources/corpusbuilder.shtml
#c18th #poetry #18thC #c18dh #ECPA #corpora #readinglists
From: @c18ah
https://hcommons.social/@c18ah/110022458784553805
-
How many copies of Matthias's vacation message do we all get before someone at ELRA figures out how to filter them?
-
Similarly, you may want a so-called #MeaningRepresentation, some structured representation of the information conveyed by a text or a linguistically motivated semantic representation of the text. These annotations are essential for my work in #NaturalLanguageGeneration / #NLG, and a big struggle for the community is coming up with ways to build #corpora for different tasks, domains, genres, or languages which also have the kinds of MRs our systems use.