home.social

#text-analysis — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #text-analysis, aggregated by home.social.

  1. Published at #IRRJ: "Much Ado about Accessibility: An Exploration of Online Content Accessibility from an Autism-Informed Perspective" by Hrishita Chakrabarti and Maria Soledad Pera. #TextAnalysis, #Autism, #WebAccessibility, #SearchEngines, #LLMs

    doi.org/10.54195/irrj.25297

  3. Poking around with sentiment analysis on the public domain copy of Pride and Prejudice by Jane Austen.

    I extracted the speech, did a strict attribution, and ran sentiment analysis for different speakers based off chunks sampled from the text.

    Elizabeth is neutral with a 28% confidence level, Jane is joyful at a 57% confidence. Darcy is sad with 94% confidence and Mrs Bennet is joyful at 95% confidence.

    Those aren't the emotions I get from reading the text. Again, I'm learning more about the sentiment analysis than the text.
    kaggle.com/code/alisonhawke/pr

    #DataScience #Python #Literature #TextAnalysis #SentimentAnalysis
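
    A minimal sketch of the "extract the speech, attribute strictly" step, assuming straight double quotes and a hypothetical rule that a quotation counts only when it is immediately followed by a speech verb and a capitalised name. The sample sentences are invented for illustration, the sentiment model itself is not shown, and this is not the notebook's actual code:

```python
import re
from collections import defaultdict

# Hypothetical strict rule: "<speech>" said|replied|cried|answered <Name>.
# Quotations without such a tag are left unattributed rather than guessed.
ATTRIBUTED = re.compile(
    r'"([^"]+)"\s+(?:said|replied|cried|answered)\s+'
    r'((?:(?:Mr|Mrs|Miss)\.?\s+)?[A-Z][a-z]+)'
)

def speech_by_speaker(text):
    """Group quoted passages by the speaker named right after them."""
    grouped = defaultdict(list)
    for match in ATTRIBUTED.finditer(text):
        speech, speaker = match.group(1), match.group(2)
        grouped[speaker].append(speech.rstrip(","))
    return dict(grouped)

# Invented sample text, not quoted from the novel.
sample = (
    '"I would rather walk," said Elizabeth. '
    '"You must not," replied Jane. '
    '"It is a fine day," said his lady. '
    '"Nobody asked me." '
    '"Then I shall go alone," said Elizabeth.'
)

by_speaker = speech_by_speaker(sample)
for speaker, lines in by_speaker.items():
    print(speaker, lines)
```

    The rule drops the "said his lady" line and the untagged line, so each speaker's sample shrinks. That is one way strict attribution can affect per-speaker sentiment scores. Project Gutenberg editions that use curly quotes would need the pattern adjusted.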

  5. Spent some time doing data analysis on the Project Gutenberg text of Pride and Prejudice.

    Pulling out all the speech, the library I used said it was "emotionally neutral" in sentiment. Which is interesting because when you read it, the speech is absolutely not that. There's a lot in the subtleties of the speech that makes it very pointed.

    The confidence on the emotional rating was 57%, which seems low to me. Doing analysis on a book I'm familiar with and recently read is telling me as much about the means of evaluating the text as the text itself.
    #DataScience #TextAnalysis #SentimentAnalysis

  7. Why do politicians always talk about "middle class," "immigrants," or "families"?

    New research funded by @fwf and @dfg_public led by Dr. Lena Maria Huber (lenamariahuber.eu/, MZES, University of Mannheim) and Dr. Hauke Licht (University of Innsbruck), explores how politicians talk about social groups in campaign platforms and parliamentary speeches across 8 Western European countries.

    🔗haukelicht.github.io/projects/

    #PoliticalCommunication #ComputationalSocialScience #Democracy #TextAnalysis

  9. Can #AI reasoning models infer people's underlying reasons in unstructured chat data from group decisions?

    Across multiple prompting steps, #GTP5 usually did NOT select the same underlying reason as a human rater: doi.org/10.48550/arXiv.2601.05

    #AI #cogSci #textAnalysis #psychometrics

  11. I've been digging around for text analysis OS apps and found AntConc via a Reddit thread. The app looks very good in early quick testing. I'm looking at term frequency across relevant papers, plus some 'concordance' context, but AntConc will do a lot more. Together with Taguette, you have all you need for a lot of analysis.

    I'm running portable on Windows, but Mac and Linux also work.
    laurenceanthony.net/software/a

    #AntConc #textanalysis #research #academia #academicchatter #linguistics
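
    For readers unfamiliar with the 'concordance' view, here is a small keyword-in-context (KWIC) sketch. It is a simplified illustration with a hypothetical `kwic` helper and an invented sample sentence, not how AntConc itself is implemented:

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, hit, right) context windows for each keyword occurrence."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample sentence for illustration.
sample = (
    "Term frequency counts words. Concordance shows each term in context, "
    "so a term is never read alone."
)

hits = kwic(sample, "term")
for left, hit, right in hits:
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>25} | {hit} | {right}")
```

    Each row shows the keyword with a few words on either side, which is the basic idea behind a concordance display.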

  13. Recs for text analysis tools with no or only minimal GenAI: Taguette, QDA Miner, what else? Bulk-document (around 50 papers) common-word analysis is what I'm mainly looking for, as well as individual document labelling. Open source, free, Windows 10.
    #QualitativeData #textanalysis #software #research #academia #academicchatter #opensource

  15. Charting Twain: Building a Character Interaction Graph with Quarkus, OpenNLP, and a local Ollama Model. Uncover hidden dynamics in Huckleberry Finn using Java, sentiment analysis, and modern NLP.
    myfear.substack.com/p/text-ana
    #Java #Quarkus #OpenLNP #TextAnalysis

  17. Ah, the groundbreaking revelation that #LLMs don't handle more words as well as they handle fewer. 🤯 Who knew that feeding a massive text blob would confuse a glorified autocomplete? 😂 Next week: water is wet! 🌊
    research.trychroma.com/context #textanalysis #AIhumor #technews #revelations #HackerNews #ngated

  19. Wow! #QualiService could be a great resource!

    It wasn't obvious to me how to find the transcripts for these doctor-patient interaction data from 4 countries, but if such transcripts are accessible, that's GREAT!

    qualiservice.org/en/qsearch.ht

    #medicine #openData #cogSci #TextAnalysis

  21. 🇪🇺 Want to analyze text from the EU public consultations? EU public consultations are a way in which the EU invites the broader public to publicly comment on upcoming legislation.

    📦 I just published a first version of a Python package {eu-consultations} to scrape and extract text from the EU website:
    github.com/marioangst/eu_consu

    - download consultation data as displayed on the EU's frontend into a validated form
    - download associated files (this is the hard part about analysing this data - lots of feedback is in .docx and .pdf files)
    - extract text from the files using docling and attach to feedback

    You get all data in validated form and possibly stored in huge (sorry for that) JSON files ;).

    This package is part of an analysis project on feedback the EU has received via the public consultation process on digital policy we plan to present later this year, but I thought let's make some of the tools we use open source way earlier already.

    #python #textanalysis #policyanalysis #CompSocSci

  23. Like we found in “Your Health vs. My Liberty” (doi.org/10.1016/j.cognition.20), Yael Rozenblum et al. found that compliance with #publicHealth guidance correlated with indicators of the perceived threat of a viral pandemic.

    Also, relying on #misinformation correlated with reliance on simple (vs. complex) #reasoning.

    The free paper: doi.org/10.1002/tea.21975

    #medicine #health #education #psychology #epistemology #logic #textAnalysis

  25. Have you ever wanted to use a #LLM as one step in a workflow?

    We integrated #GPT into the open-source analysis platform #useGalaxy, where you can link GPT to several thousand other tools, add more attachments for analysis and make your research reproducible.

    galaxyproject.org/news/2024-09

    In our example, we uploaded an audio file and used #Whisper to convert it into text, cut out the moderation, and prompted ChatGPT to translate it into German.

    #DH #textanalysis #tools
    @galaxyfreiburg

  27. 📚🇮🇹 New working paper: "Evaluating Embedding Models for Clustering Italian Political News"

    This study compares embedding models for unsupervised clustering of Italian political news shared on Facebook before the 2018 and 2022 elections, aiming to advance NLP methods for political text analysis in non-English languages.

    Paper: osf.io/preprints/osf/2j9ed

    Code & data: github.com/fabiogiglietto/Sema

    Feedback welcome!

    #NLP #PoliticalScience #TextAnalysis #MachineLearning

  29. Word co-occurrence matrix/heatmap

    How to compute and visualize the correlation between terms that occur together in a list of documents*

    *documents: keywords, page titles, product names/descriptions, social media posts, etc.

    bit.ly/3Z4tiTx

    #DataVisualization #textanalysis #DataScience #Python
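
    As a rough illustration of the idea, the sketch below counts how many documents contain each pair of terms and lays the counts out as a symmetric matrix. It uses raw co-occurrence counts rather than a correlation measure, whitespace tokenisation, and invented product names. The linked article's own method may differ:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, for each unordered term pair, the documents containing both."""
    pair_counts = Counter()
    for doc in docs:
        # De-duplicate within a document so each pair counts once per document.
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    return pair_counts

def as_matrix(pair_counts):
    """Turn pair counts into a symmetric vocabulary-by-vocabulary matrix."""
    vocab = sorted({term for pair in pair_counts for term in pair})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (a, b), n in pair_counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return vocab, matrix

# Invented product names standing in for the "documents".
docs = ["red running shoes", "blue running shoes", "red leather boots"]

pair_counts = cooccurrence(docs)
vocab, matrix = as_matrix(pair_counts)
for term, row in zip(vocab, matrix):
    print(f"{term:>8}", row)
```

    The resulting matrix can be handed to any heatmap plotting routine. Normalising the counts (for example by document frequency) would be the next step toward a correlation-style view.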

  30. The Digital Humanities Team at the University of Vienna and the Ottoman Nature in Travelogues (ONiT) project are hosting a #hackathon focused on analyzing texts, images, and multimodal sources.

    Thursday, November 14, 9:00 CET to Friday, November 15, 15:00 CET
    dh.univie.ac.at/hackathon/
    #DigitalHumanities #ComputationalHumanities #TextAnalysis #ImageAnalysis

  32. It was also a methodologically fun paper, combining digitized archival text, Census & survey data, NLP, and panel models.

    Email or DM me for a copy! #sociology #textanalysis #rstats

    3/3

  34. 📣 Attention Linguistics & Digital Humanities students! 🎓📚
    Join @janispagel and me for the »Prompting, Evaluation, Interpretation: An Introduction to LLMs in Text Analysis« course at the upcoming Deep Learning for Language Analysis Summer School in Cologne: ml-school.uni-koeln.de! 📝🔍
    🗓️ Don't miss out – registration is open until June 16th! 🙌
    #LLMs #TextAnalysis #NLP #AI #Linguistics #DigitalHumanities #CRETA

  35. Want to learn more about how to use regular expressions in R?

    Come join us to learn how to use regular expressions to parse and clean text data on Thursday, June 6th, 5-6pm Eastern Time!

    Find the Zoom registration details on our website:

    rug-at-hdsi.org/upcoming_event

    #rstats #DataScience #regex #TextAnalysis

  36. Bias estimation in word embeddings using a Bayesian approach instead of WEAT or MAC. A new paper in Computational Linguistics.

    #ComputationalSocialSciences #textanalysis #NLP

  37. How would you go about creating a filter that blocks posts about things that people hate?

    I've thought I could build a text classifier, but it could be hard to train since I'd need to guess whether or not the author hates the thing they are posting about.

    I wouldn't want it to become a filter for all current events news, but I suspect that's what it would become.

    #fediverse #mastodon #machineLearning #tfidf #classification #socialMedia #classifier #textAnalysis #programming #tech #technology
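
    One way the TF-IDF classifier idea could be prototyped is sketched below, assuming a tiny hand-labelled set of posts (invented here) and a nearest-neighbour rule over smoothed TF-IDF vectors. The labels "block" and "keep" and all function names are hypothetical:

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training posts."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def vectorize(doc, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(word for word in doc.lower().split() if word in idf)
    return {term: count * idf[term] for term, count in tf.items()}

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def predict(post, labelled, idf):
    """Return the label of the most similar labelled post."""
    vec = vectorize(post, idf)
    best = max(labelled, key=lambda item: cosine(vec, vectorize(item[0], idf)))
    return best[1]

# Invented training posts; a real filter would need many more, labelled by hand.
labelled = [
    ("i hate this awful new update", "block"),
    ("this awful policy makes me furious", "block"),
    ("lovely photos from the garden today", "keep"),
    ("new release of my favourite library today", "keep"),
]
idf = fit_idf([text for text, _ in labelled])

print(predict("what an awful update", labelled, idf))
print(predict("photos from the garden", labelled, idf))
```

    A model like this only sees vocabulary, not the author's stance, so it would likely learn topic words from the training posts. That matches the worry about it turning into a current-events filter.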

  38. 🤖 Generator-Guided Crowd Reaction Assessment

    (... I was really fascinated with this paper because my YOShInOn RSS reader has a module like this which can predict the popularity of a story on HN and if people will have a big discussion about it; it is super-easy to gather data for this kind of model)

    arxiv.org/abs/2403.09702

    #cs #research #ai #ml #textanalysis

  39. For those interested in Topic Modeling and #textanalysis #textmining, this free article discusses some forms of validating the results.

    "Topic Model Validation Methods and their Impact on Model Selection and Evaluation". Computational Communication Research 5.1 (2023), 1–26

    #ComputationalSocialScience

    aup-online.com/content/journal