#kmeans — Public Fediverse posts on home.social

Habr @[email protected] · 2026-07-08 · 13:52 UTC

From Text to Meaning: Embeddings, GPT, and Multidimensional Vectors in Competitive Analysis of Mobile Apps

User reviews are one of the most valuable sources of information about a product, yet customers often describe the same topic or problem in dozens of different ways. Working with feedback used to be slow and resource-intensive, but the arrival of embeddings and LLMs has changed that.

https://habr.com/ru/companies/garage8/articles/1057100/

#Embeddings #GPT #LLM #OpenAI_Embeddings_API #анализ_отзывов #кластеризация_отзывов #KMeans #Agglomerative_Clustering #тональность_отзывов #voice_of_customer

Habr @[email protected] · 2026-06-29 · 13:22 UTC

From PDF to Learning Module: A Practical ML Pipeline Inside an LMS

Hi everyone, this is Mikhail Kiselev, ML developer at WebRise. Today we'll talk about the practical application of ML in education. Why, despite a mountain of regulations, instructions, and manuals, does launching a new course still drag on for weeks? And why is the problem often not in the LMS but one step earlier, where the company already has the knowledge but no learning structure yet?

https://habr.com/ru/articles/1053450/

#ml #lms #tfidf #kmeans #python #дистанционное_образование #дистанционное_обучение

SESH.sx @[email protected] · 2026-04-30 · 16:00 UTC

PTS ϟ 13th Birthday @ Strange Brew - 02 May feat. Evian Christ, k means, Nono Gigsta + more

#SESH #EvianChrist #kmeans #NonoGigsta

https://sesh.sx/e/1714468

Statistics Globe @[email protected] · 2026-04-17 · 05:21 UTC

K-means clustering is a simple and widely used method for identifying patterns in data.

I also use it in a recent Statistics Globe Hub module, where it is combined with synthetic data created using the drawdata Python library: https://statisticsglobe.com/hub

#datascience #machinelearning #kmeans #rstats

Statistics Globe @[email protected] · 2026-04-09 · 07:39 UTC

I’ve just published a new module in the Statistics Globe Hub on how to draw synthetic datasets using the drawdata Python library and analyze them afterward in R using k-means clustering.

More information about the Statistics Globe Hub: https://statisticsglobe.com/hub

#datascience #python #rstats #kmeans

N-gated Hacker News @[email protected] · 2026-03-20 · 10:40 UTC

🤖: Spoiler Alert! Flash-KMeans promises to be a "memory-efficient" magic trick, unless you count the mental gymnastics required to understand it. 🤯 Just what the world needs, another K-Means #variant to make your brain cells do a triple axel! 🧠💥
https://arxiv.org/abs/2603.09229 #FlashKMeans #MemoryEfficient #KMeans #DataScience #MachineLearning #AI #HackerNews #ngated

Karsten Schmidt @[email protected] · 2026-02-14 · 09:37 UTC

@zefu I find the tool works best for images with a decent contrast and/or color hue range. I also recommend not choosing more than 5-8 colors to avoid too many similar ones. Also bear in mind that k-means clustering relies on random initializations and so running the process multiple times for the same image can lead to slightly different results (just press "update" a few times and see if there're any decent changes)...

Another tip: I personally like having palettes which also include some desaturated colors, so try reducing the "min chroma" slider value (a change will recompute automatically). If you only want more rich colors, then bump up the value, but it all really very much depends on the image... The two variations attached here use min chroma 5 and 0...
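
The point about random initialization can be illustrated with a minimal sketch. This is plain Python, not the code behind the linked tool: the `kmeans` helper, the synthetic `pixels` list, and the choice of five seeds are all assumptions made for illustration. It runs k-means several times with different seeds and keeps the run with the lowest within-cluster sum of squares.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, iterations=50):
    """Basic k-means; the seed controls which points start as centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(ch) / len(members) for ch in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Hypothetical stand-in for image pixels: 200 random RGB triples.
pixel_rng = random.Random(42)
pixels = [tuple(pixel_rng.randrange(256) for _ in range(3)) for _ in range(200)]

# Different seeds may converge to different palettes; keep the tightest one.
runs = [kmeans(pixels, k=5, seed=s) for s in range(5)]
inertias = [inertia for _, inertia in runs]
best_palette, best_inertia = min(runs, key=lambda run: run[1])
print(sorted(round(i) for i in inertias))
```

Printing the sorted inertias shows how much the runs disagree; if the values are nearly identical, repeated runs are unlikely to change the palette much.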

https://demo.thi.ng/umbrella/dominant-colors/

#ThingUmbrella #DominantColors #KMeans

Karsten Schmidt @[email protected] · 2026-02-13 · 17:54 UTC

@zefu I should update the readme to explain how these palettes were created. They're a manually curated selection of running hundreds of images through this tool (doesn't look like much, but it's been super helpful over the years) and then handpicking my favorites:

https://demo.thi.ng/umbrella/dominant-colors/

This uses k-means clustering for segmentation, also available as library:

https://thi.ng/pixel-dominant-colors

#ThingUmbrella #Color #KMeans #Tool

Hacker News @[email protected] · 2025-09-29 · 18:03 UTC

ML on Apple ][+

https://mdcramer.github.io/apple-2-blog/k-means/

#HackerNews #ML #Apple2 #Kmeans #Technology #RetroComputing

Hacker News @[email protected] · 2025-07-09 · 21:37 UTC

HyAB k-means for color quantization

https://30fps.net/pages/hyab-kmeans/
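
For context, a minimal sketch of the HyAB distance as it is commonly defined: the absolute lightness difference plus the Euclidean distance in the a/b chroma plane. It assumes colors are already CIELAB triples; the function names and sample palette are hypothetical, and the linked article's exact variant (for example any lightness weighting) is not reproduced here.

```python
import math


def hyab_distance(lab1, lab2):
    """HyAB: city-block distance in L plus Euclidean distance in (a, b)."""
    l1, a1, b1 = lab1
    l2, a2, b2 = lab2
    return abs(l1 - l2) + math.hypot(a1 - a2, b1 - b2)


def euclidean_distance(lab1, lab2):
    """Plain Euclidean distance in Lab, shown for comparison."""
    return math.dist(lab1, lab2)


def nearest_palette_index(color, palette, distance=hyab_distance):
    """Assignment step of k-means: index of the closest palette entry."""
    return min(range(len(palette)), key=lambda i: distance(color, palette[i]))


# Hypothetical Lab colors (L, a, b), not taken from the article.
palette = [(50.0, 0.0, 0.0), (80.0, 20.0, -10.0)]
sample = (60.0, 3.0, 4.0)

print(hyab_distance(sample, palette[0]))       # 15.0
print(nearest_palette_index(sample, palette))  # 0
```

Swapping `distance` in the assignment step is the only change needed to try a different metric in an otherwise standard k-means loop.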

#HackerNews #HyAB #k-means #color #quantization #colorquantization #machinelearning #dataanalysis #kmeans

Karsten Schmidt @[email protected] · 2025-06-15 · 13:07 UTC

Recently I've combined various functions which I've been using in other projects (e.g. my personal PKM toolchain) and published them as a new library https://thi.ng/text-analysis for better re-use:

- customizable, composable & extensible tokenization (transducer based)
- ngram generation
- Porter-stemming & stopword removal
- vocabulary (bi-directional index) creation
- dense & sparse multi-hot vector encoding/decoding
- histograms (incl. sorted versions)
- tf-idf (term frequency & inverse document frequency), multiple strategies
- k-means clustering (with k-means++ initialization & customizable distance metrics)
- similarity/distance functions (dense & sparse versions)
- central terms extraction
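
Two of the items above, multi-hot encoding and k-means++ initialization, can be sketched in a few lines. This is an illustrative Python version, not the library's TypeScript API: the function names, the `packages` mapping, and its tags are made up for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_hot(tags, vocab):
    """Dense multi-hot vector: 1.0 where the vocabulary term is present."""
    return tuple(1.0 if term in tags else 0.0 for term in vocab)


def kmeans_pp_seeds(points, k, seed=0):
    """k-means++ seeding: each new seed is drawn with probability
    proportional to its squared distance from the nearest chosen seed."""
    rng = random.Random(seed)
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(sq_dist(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


# Hypothetical packages and tags, for illustration only.
packages = {
    "pkg-color": {"color", "image"},
    "pkg-pixels": {"image", "pixel"},
    "pkg-tokens": {"text", "token"},
    "pkg-ngrams": {"text", "ngram"},
    "pkg-vectors": {"math", "vector"},
}
vocab = sorted(set().union(*packages.values()))
vectors = [multi_hot(tags, vocab) for tags in packages.values()]
seeds = kmeans_pp_seeds(vectors, k=3)
print(len(vocab), len(seeds))
```

Because already-chosen points have zero weight, the seeds are distinct as long as the input contains at least `k` distinct vectors.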

The attached code example (also in the project readme) uses this package to create a clustering of all ~210 #ThingUmbrella packages, based on their assigned tags/keywords...

The library is not intended to be a full-blown NLP solution, but I keep on finding myself running into these functions/concepts quite often, and maybe you'll find them useful too...

#Text #Analysis #Cluster #KMeans #TFIDF #Ngram #Vector #TypeScript #JavaScript

Michele Clamp Art @[email protected] · 2025-04-16 · 14:59 UTC

Playing around with features for the next version of #chromamagic. Producing a reduced palette is trickier than you might think to make something useful for painters. #octtree #kmeans #imagequantization #oilpainting #watercolor

IB Teguh TM @[email protected] · 2025-02-06 · 10:43 UTC

Master K-means Clustering with Rand Index and Adjusted Rand Score. Unlock valuable insights into your data by grouping and evaluating points for maximum clarity. #kmeans #clustering
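
As a rough sketch of the two scores named above (not code from the linked tutorial), the Rand index counts the pairs of points on which two labelings agree, and the adjusted Rand index corrects that count for chance. Function names and the toy labels are assumptions; returning 1.0 when the adjusted score is undefined is a convention chosen here.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs that both labelings treat the same way."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance via the pair-counting formula."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


truth = [0, 0, 0, 1, 1, 1]
same_up_to_renaming = [1, 1, 1, 0, 0, 0]
imperfect = [0, 0, 1, 1, 1, 1]

print(rand_index(truth, same_up_to_renaming))        # 1.0
print(round(rand_index(truth, imperfect), 4))        # 0.6667
print(round(adjusted_rand_index(truth, imperfect), 4))  # 0.3243
```

Both scores ignore the label names themselves, which is why the renamed clustering still scores 1.0.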

https://teguhteja.id/k-means-clustering-master-rand-index-adjusted-rand-score/

Habr @[email protected] · 2024-12-22 · 12:42 UTC

Machine Learning: K-means Clustering. Theory and Implementation. From Scratch

Hello, dear readers. In this article I break down how the k-means clustering method works at a low level. Contents: the idea behind the method, how to assign labels to unlabeled objects, an implementation in pure Python, and a walkthrough of the code.
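
To give a sense of what such a pure-Python implementation can look like, here is a minimal sketch of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not the article's code: the function names, the toy `points`, and the seeded random initialization are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) after alternating assign/update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical unlabeled data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The returned `labels` list is how unlabeled objects receive their cluster labels; which group gets label 0 depends on the initialization.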

https://habr.com/ru/articles/868542/

#кластеризация #kmeans #kсредних #машинное_обучение

Habr @[email protected] · 2024-06-12 · 12:52 UTC

How to Analyze Thousands of Reviews with ChatGPT? Common Mistakes and an Example on Real Data

In this article I share my experience solving a work task: analyzing user reviews of a company. We'll go over possible mistakes and look at example code and real data. The guide will be useful to anyone without much experience in data analysis or in working with LLMs through an API.

https://habr.com/ru/articles/821287/

#llm #gpt #chatgpt #python #clustering #kmeans #tsne #visualization #summarization #data_analysis

Habr @[email protected] · 2024-04-06 · 08:32 UTC

News Analysis Using Time Series Segmentation and Clustering

At Otus I took the ML Advanced course and discovered interesting topics related to time series analysis, namely their segmentation and clustering. I decided to apply what I learned to my university thesis on event analysis of social phenomena and events, and to describe part of that research in this article. Step 1. Data collection. As the data source I chose the news site Lenta.ru, because it is easy to parse, the news is varied, and large volumes are added daily. For a test I scraped the news for the past year (March 2023 to March 2024) using Python's BeautifulSoup and requests. The code collects the headline, date, and topic of each news item:

https://habr.com/ru/articles/805801/

#сегментация #анализ_временных_рядов #кластеризация_данных #новостные_ресурсы #тематическое_моделирование #kmeans #python #машинное_обучение #otus

Statistics Globe @[email protected] · 2024-03-31 · 19:32 UTC

Principal Component Analysis (PCA) reduces the dimensionality of your data, enhancing the efficiency and accuracy of K-means clustering by focusing on the most informative features.

More info in my upcoming course: https://statisticsglobe.com/online-course-pca-theory-application-r

#PCA #KMeans #DataAnalysis #rstats #datascience

Oliver D. Reithmaier @[email protected] · 2023-07-03 · 06:46 UTC

#psychometrics community: I found a paper that developed a short scale and tested it via #LPA and #kmeans clustering. (Paper here: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0281021). Now for me, this is odd as it uses two clustering techniques to assess scale quality. But then again this is a sociology paper and I know that sociologists and psychologists have a different world view. In case you didn't know: Sociologists tend to look at groups within society or societies at large, whereas psychologists tend to see individuals and groups as aggregates of individuals. Obviously, coming from a sociological perspective, using such clustering methods makes sense. However, I still have mixed feelings about this approach. I still feel an IRT approach would be better since obviously k-means and LPA do NOTHING to evaluate items, for example.

How do you see this? Am I completely wrong here?

Thomas E. Gladwin @[email protected] · 2023-04-23 · 17:53 UTC

Some methods to determine the number of components or clusters in PCA or k-means clustering: https://thomasgladwin.substack.com/p/finding-the-true-number-of-components/. These at least work in the limit of ideal simulated data.

The basic rationale is to use random split-half data to identify what's "true" versus sampling error. Scores are based on similarities between eigenvectors or cluster centres, rather than, e.g., the shape of the eigenvalue plot.

#machineLearning #clustering #kmeans #PCA #scree #python

Brendan Hayward @bmjhayward · 2023-01-11 · 23:31 UTC

if you're using #kmeans or other clustering algorithms and you use the elbow-method or visual inspection to choose the number of clusters, this paper is for you.