#ngram — Public Fediverse posts on home.social

Cyclone @[email protected] · 2026-05-22 · 16:56 UTC

Spider v1.0.0 released.

Spider is not just another web crawler -- it is a purpose-built wordlist and ngram processor for hash cracking workflows.

URL Mode:
Point it at a URL and Spider crawls the target, extracts words, and generates frequency-sorted wordlists and/or ngrams.

But, Spider does not stop at web crawling...

File Mode:
Feed it local files and it brings the same word-processing engine to your own datasets, scraped content, notes, dumps, configs, or any other plaintext source you want to turn into a targeted wordlist or ngram set.

More info:
https://forum.hashpwn.net/post/52

#spider #webcrawler #wordlist #generator #sort #ngram #cyclone #hashpwn #hashcracking

#spider #webcrawler #wordlist #generator #sort #ngram
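
The frequency-sorted wordlist and ngram output described in the Spider post above can be pictured with a minimal sketch. This is not Spider's code: the tokenization rule, the tie-breaking order, and the helper names `extract_words`, `ngrams`, and `frequency_sorted` are all assumptions made for illustration, and Python is used only as a neutral language.

```python
import re
from collections import Counter


def extract_words(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(words, n):
    """Return every run of n consecutive words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def frequency_sorted(items):
    """Sort items by descending count, breaking ties alphabetically."""
    counts = Counter(items)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical input standing in for crawled or file-based text.
sample = "the quick fox and the quick dog and the cat"
words = extract_words(sample)
wordlist = frequency_sorted(words)            # single words, most frequent first
bigrams = frequency_sorted(ngrams(words, 2))  # two-word ngrams, most frequent first

for entry, count in bigrams[:2]:
    print(count, entry)
```

The same two helpers would apply whether the text came from a crawled page or a local file, which is roughly the split between the URL and File modes the post describes.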

Dr Ian McCormick @[email protected] · 2026-05-14 · 13:24 UTC

Decline and fall of NOTWITHSTANDING (preposition, conjunction, adverb) #language #English #style #composition #linguistics #edchat #discourse #connectives #ngram

#language #english #style #composition #linguistics #edchat

Kritische Masse @[email protected] · 2026-03-01 · 11:52 UTC

BILD citizen pranks & the end of the fun society

Theft does sometimes pay off after all. Otherwise, though, it is pretty much forbidden. bild zeitung I no longer remember where I saw and photographed this while out and about. And I ask myself what a misery it is that people these days are forced to steal a BILD am Sonntag, unless … We already witnessed the end of the fun society (Spaßgesellschaft) nearly 20 years ago. I asked Google's Ngram Viewer for occurrences in books and periodicals. At the same time as […]

https://www.kritische-masse.de/logbuch/2026/03/bild-buergerstreiche-das-ende-der-spassgesellschaft/

#youtube #spaßgesellschaft #krieg #frieden #facebook #diktatur

Dr Ian McCormick @[email protected] · 2026-02-20 · 10:21 UTC

Due to its rising popularity in formal use in many Indian documents, "erstwhile" is less archaic than you might have supposed? #archaic #language #ngram

#archaic #language #ngram

Tykayn @[email protected] · 2025-09-29 · 21:33 UTC

The #ngram exercises are also great for learning #ergol, but only once you have thoroughly drilled the muscle memory elsewhere.

A year after I first became able to touch-type sentences in Ergol, I still struggle with ring-finger and little-finger sequences, but only with the right hand.

It is probably a long-standing matter of how I rest my hand, from a drawing habit in which those fingers mostly served as support to steady my line.

#ngram #ergol

Karsten Schmidt @[email protected] · 2025-06-15 · 13:07 UTC

Recently I've combined various functions which I've been using in other projects (e.g. my personal PKM toolchain) and published them as new library https://thi.ng/text-analysis for better re-use:

- customizable, composable & extensible tokenization (transducer based)
- ngram generation
- Porter-stemming & stopword removal
- vocabulary (bi-directional index) creation
- dense & sparse multi-hot vector encoding/decoding
- histograms (incl. sorted versions)
- tf-idf (term frequency & inverse document frequency), multiple strategies
- k-means clustering (with k-means++ initialization & customizable distance metrics)
- similarity/distance functions (dense & sparse versions)
- central terms extraction

The attached code example (also in the project readme) uses this package to create a clustering of all ~210 #ThingUmbrella packages, based on their assigned tags/keywords...

The library is not intended to be a full-blown NLP solution, but I keep on finding myself running into these functions/concepts quite often, and maybe you'll find them useful too...

#Text #Analysis #Cluster #KMeans #TFIDF #Ngram #Vector #TypeScript #JavaScript

#thingumbrella #text #analysis #cluster #kmeans #tfidf
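
To make the tf-idf item in the list above concrete, here is a minimal sketch of one common weighting variant. It does not use or mirror the thi.ng/text-analysis API (the post mentions multiple strategies, none of which are spelled out there); the function name `tf_idf`, the formulas chosen, and the sample token lists are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical keyword lists, loosely modelled on package tags.
docs = [
    ["text", "analysis", "ngram"],
    ["text", "cluster", "kmeans"],
    ["text", "vector", "ngram"],
]
scores = tf_idf(docs)
print(sorted(scores[0].items(), key=lambda kv: -kv[1]))
```

With this variant, a term that appears in every document gets weight zero, while a term unique to one document gets the highest weight, which is why such vectors are a reasonable input for clustering by keywords.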

Cyclone @[email protected] · 2025-05-13 · 14:00 UTC

Spider v0.9.0 released:

Updates:
-url-match flag to filter URLs by keyword
Several small bug fixes
Go bumped to v1.24.3

https://forum.hashpwn.net/post/606

#infosec #spider #urlcrawl #hashpwn #wordlist #ngram

petersuber @[email protected] · 2025-04-25 · 18:32 UTC

Fellow finicky writers: Do you prefer "advance notice" or "advanced notice"?

Both are attested. But FYI, #ngram says that "advance notice" is much more common, even if it's in decline.
https://books.google.com/ngrams/graph?content=advance+notice%2C+advanced+notice&year_start=1800&year_end=2022&corpus=en&smoothing=3

#ngram
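
For anyone who wants to run the same comparison on other phrase pairs, the link in the post above can be rebuilt from its query parameters. This is a small sketch that only constructs the URL; the parameter names are taken from that link, while the function name `ngram_viewer_url` and its defaults are assumptions, and nothing here fetches or parses the chart data.

```python
from urllib.parse import urlencode

BASE = "https://books.google.com/ngrams/graph"


def ngram_viewer_url(phrases, year_start=1800, year_end=2022, corpus="en", smoothing=3):
    """Build an Ngram Viewer graph URL comparing the given phrases."""
    query = urlencode({
        "content": ", ".join(phrases),  # phrases are comma-separated in the link
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    })
    return f"{BASE}?{query}"


url = ngram_viewer_url(["advance notice", "advanced notice"])
print(url)
```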

Cyclone @[email protected] · 2025-04-17 · 17:27 UTC

🚀 Spider v0.8.0

New features include:

"-file" to generate n-grams from local plaintext files

"-timeout" for URL crawling

"-sort" to output n-grams by frequency

https://forum.hashpwn.net/post/52

#spider #webcrawler #wordlist #ngram #infosec #hashcracking #golang #hashpwn

#spider #webcrawler #wordlist #ngram #infosec #hashcracking

Habr @[email protected] · 2025-03-31 · 12:32 UTC

Listen no time read: where do we put the comma?

You'll find out when you look under the cut. 😉 As a teaser: this is about ЮMoney's tool for transcribing audio from internal calls into text, and about something else for our customers. 😎👇

https://habr.com/ru/companies/yoomoney/articles/896096/

#whisper #llmмодели #искусственный_интеллект #ai #саммаризация #диаризация #идентификация #транскрибация_звонков #ngram

#whisper #llmмодели #искусственный_интеллект #ai #саммаризация #диаризация

François Renaville 🇺🇦🇪🇺 @[email protected] · 2024-04-07 · 06:35 UTC

#Google Books Is Indexing #AI-Generated Books

👉 #GoogleBooks is indexing low-quality, AI-generated books that will turn up in search results, and could possibly impact Google #Ngram viewer, an important tool used by researchers to track #language use throughout history.

https://timesofindia.indiatimes.com/technology/tech-news/google-books-important-source-for-academics-may-have-a-bot-problem/articleshow/109089043.cms

#GoogleNgram #NgramViewer #linguistics #diachrony #diachroniclinguistics #research #languages #aigeneratedcontent #AIgeneratedBooks

#google #ai #googlebooks #ngram #language #googlengram

Tobias Zeumer @[email protected] · 2024-04-06 · 15:36 UTC

Google Books reportedly indexing bad AI-written works https://www.theverge.com/2024/4/5/24122077/google-books-ai-indexing-ngram

#google #googleBooks #ngram #LLMs #ChatGPT

#google #googlebooks #ngram #llms #chatgpt

Karsten Schmidt @[email protected] · 2023-11-03 · 14:25 UTC

#HowToThing #030 — Procedural, rule-based & stochastic text generation using a custom DSL, parse grammar (via https://thi.ng/parse) and abstract syntax tree transformation (via https://thi.ng/defmulti).

Since it's #NaNoWriMo & #NaNoGenMo [1], I'm closing out this first season of 30 #HowToThing's with a related topic & maybe someone even finds it useful/interesting... 😉🤷‍♂️

This example is in principle inspired by @galaxykate's oldie & goodie #Tracery, but is using a super simple custom text format instead of JSON to define variables and template text. Variables are expanded recursively and I've also added features like dynamic, indirect pointer-like variable lookups to derive variables based on current values (useful for conditionals & context-specific expansions), hidden assignments, chainable modifiers... I've included 5 different "story" templates (incl. comments) showing various features. Just press "regenerate" to create new random variations...

Similar to the previous #HowToThing, I'm hoping this example also shows that approaching use cases like this via small domain-specific languages with proper grammar rules, does not require much ceremony and is often more amenable to change during prototyping (and later also more maintainable!) than just regex bashing approaches...

The parser grammar itself is explained in the https://thi.ng/parse readme. As usual, the grammar was created/prototyped with the Parser Playground[2], which we developed from scratch during the first thi.ng livestream[3] (2.5h video)...

Demo (example project #145):
https://demo.thi.ng/umbrella/procedural-text/

Source code:
https://github.com/thi-ng/umbrella/tree/develop/examples/procedural-text/src

If you have any questions about this topic or the packages used here, please reply in thread or use the discussion forum (or issue tracker):

https://github.com/thi-ng/umbrella/discussions

[1] https://github.com/NaNoGenMo/2023/
[2] https://demo.thi.ng/umbrella/parse-playground/
[3] https://www.youtube.com/watch?v=mXp92s_VP40

#ThingUmbrella #NaNoWriMo2023 #NaNoGenMo2023 #ProcGen #Generative #TextGeneration #Ngram #TypeScript #JavaScript #Tutorial
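
The recursive variable expansion mentioned in the post above can be sketched in a few lines. This is not the post's DSL or its thi.ng/parse grammar: the `<name>` placeholder syntax, the `expand` function, and the sample rules are hypothetical, and a regex stands in for a real parser purely to keep the example short.

```python
import random
import re

# Hypothetical placeholder syntax: <name> refers to a rule called "name".
PLACEHOLDER = re.compile(r"<(\w+)>")


def expand(template, rules, rng, depth=0, max_depth=20):
    """Recursively replace each <name> with a randomly chosen expansion."""
    if depth > max_depth:
        raise RecursionError("rules appear to be cyclic")

    def substitute(match):
        options = rules[match.group(1)]  # KeyError if the rule is undefined
        return expand(rng.choice(options), rules, rng, depth + 1, max_depth)

    return PLACEHOLDER.sub(substitute, template)


rules = {
    "story": ["<hero> found <item>."],
    "hero": ["The <adjective> crawler", "A <adjective> parser"],
    "adjective": ["patient", "curious"],
    "item": ["a wordlist", "an ngram"],
}

rng = random.Random(42)  # seeded so repeated runs give the same text
story = expand("<story>", rules, rng)
print(story)
```

Passing a fresh unseeded `random.Random()` plays the role of the demo's "regenerate" button. Features the post lists beyond plain expansion (indirect lookups, hidden assignments, chainable modifiers) are deliberately left out of this sketch.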