#embedding — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #embedding, aggregated by home.social.

michabbb @[email protected] · 2026-04-23 · 10:48 UTC

#GeminiEmbedding2 is now generally available — #Google's first natively multimodal #embedding model, mapping text, images, video, audio & documents into ONE unified space 🚀
🧠 Built on the #Gemini architecture — no more fragmented multi-pipeline setups. One model handles all media types in a single, unified embedding space.
🧵👇#AI #GoogleAI #MachineLearning

#geminiembedding2 #google #embedding #gemini #ai #googleai
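
A unified space means a text query can be scored directly against image, video, or audio vectors with one similarity function, instead of running a separate pipeline per media type. Below is a minimal brute-force sketch, assuming cosine similarity as the metric. The vectors and file names are hypothetical stand-ins for model outputs, and no Gemini API call is shown.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours by cosine similarity, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors: in a unified space, images, video, and audio
# share one coordinate system, so one index can hold all of them.
index = {
    "image:beach.jpg": [0.9, 0.1, 0.0],
    "video:surfing.mp4": [0.8, 0.3, 0.1],
    "audio:podcast.mp3": [0.0, 0.2, 0.9],
}
text_query = [1.0, 0.2, 0.0]  # hypothetical embedding of the text "ocean waves"
print(top_k(text_query, index, k=2))
```

With separate per-modality models, the same comparison would need an extra alignment step, because the vectors would not share coordinates.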
Mark Gritter @[email protected] · 2026-04-11 · 07:09 UTC

I've got a bunch of #embedding vectors and want to convert L2 distance into something meaningful.
Geometry says that if randomly distributed around a 512-sphere, nearly all points should be about sqrt(2) away, so it's hopeless.
In practice my distribution looks like:
p10: 1.074
p25: 1.217
p50: 1.271
p75: 1.311
p90: 1.343
p99: 1.395
so I can work with that to make it a 100-0 score, but it seems hard to justify a priori? Who can I read about the geometry of LLM embeddings?

#embedding
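
This sketch assumes the embeddings are unit-normalized. The sqrt(2) intuition only holds for unit vectors. Under that assumption, L2 distance and cosine similarity carry the same information, because ‖a − b‖² = 2 − 2 cos θ, so cos θ = 1 − d²/2. By that relation, the reported median of 1.271 corresponds to a cosine of about 0.19.

The sketch does two things:
- It checks the sqrt(2) claim on random 512-dimensional unit vectors.
- It turns a distance into a 100-to-0 score by its rank among reference distances. This empirical-CDF mapping is one option among several, and the function names are illustrative.

```python
import bisect
import math
import random

def l2_to_cosine(d: float) -> float:
    """For unit-norm vectors, ||a - b||^2 = 2 - 2*cos(a, b), hence cos = 1 - d^2 / 2."""
    return 1.0 - d * d / 2.0

def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Normalizing an isotropic Gaussian sample gives a uniformly random point on the sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def percentile_score(d: float, reference: list[float]) -> float:
    """100 for a distance below every reference distance, 0 for one above all of them.

    `reference` must be sorted ascending, e.g. distances from a sample of real pairs.
    """
    rank = bisect.bisect_left(reference, d)
    return 100.0 * (1.0 - rank / len(reference))

rng = random.Random(0)
random_pairs = [l2(random_unit_vector(512, rng), random_unit_vector(512, rng)) for _ in range(200)]
print(sum(random_pairs) / len(random_pairs))  # close to sqrt(2) ~ 1.414 for random directions
print(round(l2_to_cosine(1.271), 3))  # 0.192
```

The rank-based score only says how a pair compares with the reference sample. It is meaningful only if that sample comes from real pairs drawn from the same corpus, not from random vectors.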
Rost Glukhov @[email protected] · 2026-04-06 · 23:35 UTC

Learn how chunking strategies impact RAG performance in 2026, including fixed-size, semantic, and hybrid approaches. Discover optimization techniques for use cases like medical research and legal analysis using tools like LangChain and embedding models.
#RAG #chunking #semantic chunking #LangChain #embedding models
https://dasroot.net/posts/2026/02/chunking-strategies-rag-performance/

#rag #chunking #semantic #langchain #embedding
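
The simplest strategy the post lists, fixed-size chunking, can be sketched as a sliding window with overlap. This minimal sketch counts characters rather than tokens, which is a simplifying assumption, and it is independent of LangChain's own text splitters.

```python
def fixed_size_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end of the text
            break
    return chunks

print(fixed_size_chunks("abcdefghij", size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap reduces the chance that a sentence cut at a chunk boundary is lost from both neighbouring chunks, at the cost of storing some text twice. Semantic chunking instead chooses boundaries based on content rather than a fixed length.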
Hacker News @[email protected] · 2026-03-24 · 16:35 UTC

Gemini can now natively embed video, so I built sub-second video search
https://github.com/ssrajadh/sentrysearch
#HackerNews #Gemini #Video #Search #Video #Embedding #Tech #Innovation #HackerNews

#hackernews #gemini #video #search #embedding #tech
Gea-Suan Lin @[email protected] · 2026-03-17 · 21:41 UTC

https://blog.gslin.org/archives/2026/03/18/12938/binary-representation-hamming-distance-%e6%99%82%e7%9a%84%e6%90%9c%e5%b0%8b%e6%bc%94%e7%ae%97%e6%b3%95/
Search Algorithms for Binary Representation + Hamming Distance
#algo #algorithm #binary #distance #embedding #facebook #faiss #hamming #hashing #index #model #multi #popcount #representation #research #search #space #xor

#algo #algorithm #binary #distance #embedding #facebook
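
This is a minimal sketch of the basic idea behind search over binary codes: the distance between two codes is the popcount of their XOR. It assumes sign-based binarization (bit i is 1 when component i is positive). That is one common scheme, not necessarily the one discussed in the linked post or used by Faiss, and the vectors are made up.

```python
def binarize(vec: list[float]) -> int:
    """Pack one bit per dimension into an int: bit i is 1 when vec[i] > 0."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Number of differing bits: popcount of the XOR."""
    return bin(a ^ b).count("1")

def search(query: int, codes: list[int], k: int) -> list[int]:
    """Brute-force k nearest codes by Hamming distance; returns their indices."""
    return sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))[:k]

codes = [binarize(v) for v in ([0.3, -0.1, 0.7, -0.2], [-0.5, 0.4, 0.1, 0.9], [0.2, -0.3, 0.6, 0.1])]
q = binarize([0.4, -0.2, 0.5, -0.1])
print(search(q, codes, k=2))  # [0, 2]
```

In practice, codes are stored as packed machine words, and popcount maps to a single CPU instruction on most hardware, which is what makes Hamming scans cheap. On Python 3.10+, `int.bit_count()` is an alternative to `bin(...).count("1")`.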
Rod2ik 🇪🇺 🇨🇵 🇪🇸 🇨🇱 🇺🇦 🇨🇦 🇬🇱☮🕊️ @[email protected] · 2026-03-13 · 21:58 UTC

#Google launches #Gemini #Embedding 2, a model that understands #texte (text), #image, #vidéo (video), and #audio at the same time
https://korben.info/google-lance-gemini-embedding-2-un-modele-qui-comprend-texte-image-video-et-audio-en-meme-temps.html

#google #gemini #embedding #texte #image #video
Techdirt [Unofficial] @[email protected] · 2026-03-11 · 22:36 UTC

EFF To Court: Don’t Make Embedding Illegal

https://fed.brid.gy/r/https://www.techdirt.com/2026/03/11/eff-to-court-dont-make-embedding-illegal/

#emmerichnewspapers #5thcircuit #copyright #embedding #intermediaryliability #liability
Habr @[email protected] · 2026-03-10 · 07:32 UTC

A low-resource language breaks commercial embeddings: R@1 0.83 (LaBSE) vs 0.21 (OpenAI) on Armenian EPG
Paid embedding models do not guarantee quality on low-resource languages. On a cross-lingual matching task for EPG titles (EN/RU/HY), the free LaBSE model scores R@1 = 0.83, while OpenAI text-embedding-3-large scores 0.21. Nineteen models were tested; the code and data are open.
https://habr.com/ru/articles/1008422/
#embedding #openai #малоресурсный_язык #sentencetransformers #tokenizer #iptv #epg #benchmark #эмбеддинг

#эмбеддинг #benchmark #epg #iptv #tokenizer #sentencetransformers
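
R@1 (Recall@1) in a cross-lingual matching task is the fraction of queries whose single nearest neighbour in the other language is the correct counterpart. This minimal sketch assumes cosine similarity and assumes the i-th query's gold match is the i-th candidate. The vectors are toy values, not data from the benchmark.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall_at_1(queries: list[list[float]], candidates: list[list[float]]) -> float:
    """Fraction of queries i whose most similar candidate is candidates[i] (the assumed gold match)."""
    hits = 0
    for i, q in enumerate(queries):
        best = max(range(len(candidates)), key=lambda j: cosine(q, candidates[j]))
        if best == i:
            hits += 1
    return hits / len(queries)

# Toy example: English titles vs. Armenian titles, aligned by index.
en = [[1.0, 0.0], [0.0, 1.0], [0.95, 0.3]]
hy = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
print(recall_at_1(en, hy))  # the third query's nearest neighbour is the wrong title
```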
yegorov @[email protected] · 2026-03-10 · 06:37 UTC

If you are building an application that requires search, I recommend using Elasticsearch early on. In addition to the usual full-text search, Elasticsearch allows you to perform a hybrid search: combine the results of text and vector search.
Of course, for small amounts of data, you can use PostgreSQL tsvector with the pgvector extension, but in the long term, Elasticsearch will provide good performance.
#Elasticsearch #Search #tsvector #pgvector #KNN #Embedding #SentenceTransformers #AI

#elasticsearch #search #tsvector #pgvector #knn #embedding
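
One common way to combine text and vector results in a hybrid search is to normalize each result list's scores and take a weighted sum. This minimal sketch shows that idea in plain Python. It is not Elasticsearch's or pgvector's API, and the weight `alpha`, the scores, and the document IDs are illustrative assumptions.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a list where all scores are equal maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid(text_scores: dict[str, float], vector_scores: dict[str, float],
           alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a document missing from one list gets 0 for that part."""
    t, v = min_max(text_scores), min_max(vector_scores)
    docs = set(t) | set(v)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * t.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

bm25 = {"doc1": 12.0, "doc2": 7.5, "doc3": 3.0}   # hypothetical full-text scores
knn = {"doc2": 0.91, "doc3": 0.88, "doc4": 0.70}  # hypothetical vector similarities
print(hybrid(bm25, knn, alpha=0.5))
```

Min-max normalization is sensitive to outliers in either list. Rank-based fusion, such as Reciprocal Rank Fusion, sidesteps comparing raw scores altogether.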
Habr @[email protected] · 2026-03-04 · 19:02 UTC

I taught an AI agent to remember what matters and forget the rest, in SQLite
I'm building a locally running AI agent and found that the standard approach of "dump text into a vector database, retrieve by cosine" does not work for a long-lived agent: the context gets cluttered, facts conflict, and nothing is ever forgotten. Instead, I implemented a graph-based cognitive memory on top of a single SQLite file. It has episodic and semantic nodes, typed edges, named entities, hybrid search (FTS5 + vector + graph) with Reciprocal Rank Fusion, an Ebbinghaus forgetting curve, and background LLM consolidation. The article presents the full architecture with code, the SQL schema, and formulas. The code and a minimal example are in the repository. Read on (long read).
https://habr.com/ru/articles/1006622/
#ai_agent #ai #ии #ииагенты #память #sqlite #vector #embedding

#embedding #vector #sqlite #память #ииагенты #ии
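
The post names two formulas that are easier to follow concretely:
- Reciprocal Rank Fusion, for merging the FTS5, vector, and graph result lists.
- An Ebbinghaus forgetting curve.

This minimal sketch uses the textbook forms: RRF score = Σ 1/(k + rank) with the commonly used k = 60, and retention R = exp(−t/S). These are assumptions, not necessarily the exact formulas in the article, and the memory IDs are hypothetical.

```python
import math

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) for every item it contains (rank starts at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def retention(hours_since_access: float, stability_hours: float) -> float:
    """Textbook Ebbinghaus-style forgetting curve R = exp(-t / S); larger S means slower forgetting."""
    return math.exp(-hours_since_access / stability_hours)

fts = ["m3", "m1", "m7"]     # hypothetical FTS5 result order
vector = ["m1", "m4", "m3"]  # hypothetical vector-search result order
graph = ["m1", "m7"]         # hypothetical graph-neighbourhood order
print(rrf([fts, vector, graph]))
print(retention(24.0, stability_hours=48.0))  # exp(-0.5)
```

RRF uses only rank positions, so the FTS5, vector, and graph scores never need to be on comparable scales.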
Brandon H :csharp: :verified: @[email protected] · 2026-02-26 · 23:12 UTC

via @dotnet : Vector Data in .NET – Building Blocks for AI Part 2
https://ift.tt/VtJUvye
#VectorData #NET #AI #BuildingBlocks #SemanticSearch #RAG #Embedding #Embeddings #VectorDatabase #Qdrant #Redis #CosmosDB #SQLServer #PostgreSQL #SQLite #InMemory #VectorSto…

#vectordata #net #ai #buildingblocks #semanticsearch #rag
Inautilo @[email protected] · 2026-02-05 · 06:05 UTC

#Development #Approaches
Performance-optimized video embeds · Lazy-loading videos on interaction using only HTML/CSS https://ilo.im/16ac4f
_____
#Embedding #LazyLoading #Videos #HtmlDetails #iFrames #WebPerf #WebDev #Frontend #HTML #CSS

#development #approaches #embedding #lazyloading #videos #htmldetails
beSpacific @[email protected] · 2026-02-04 · 03:04 UTC

Via #LLRX All In: #Embedding #AI in #Law School #Classroom – What is the irreducibly human element in #legal #education when AI can pass the #bar #exam, generate effective lectures, and provide personalized #learning & #academic support? This article by law professor Gregory M. Duhl confronts that question head-on by documenting the planning and design of a comprehensive transformation of a required first-year doctrinal law school course, #Contracts, with AI fully embedded throughout the course design. https://www.llrx.com/2026/01/all-in-embedding-ai-in-the-law-school-classroom/

#llrx #embedding #ai #law #classroom #legal
Habr @[email protected] · 2026-01-23 · 07:22 UTC

Why your RAG won't find the right documents: the mathematical ceiling of embedding models
Everyone talks about embedding models in RAG: MTEB benchmarks, model sizes, chunking strategies. But nobody asks the key question: how many documents can single-vector retrieval actually find? Google DeepMind did the math. It turns out that even 4096-dimensional embeddings hit a mathematical ceiling: there are tasks where they cannot retrieve the right document in the top 2, even with a perfectly trained model. The article walks through the LIMIT study, shows examples where dense retrieval fails (while BM25 succeeds), and explains why production systems need hybrid search rather than blind faith in SOTA embeddings.
https://habr.com/ru/articles/987954/
#RAG #embedding #retrieval #machine_learning #BM25 #поиск #нейросети #векторные_базы_данных

#векторные_базы_данных #нейросети #поиск #bm25 #machine_learning #retrieval
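
BM25, the baseline the post says succeeds where dense retrieval fails, is a purely lexical score. That helps with exact-term matches such as rare identifiers. This minimal sketch of Okapi BM25 makes three assumptions:
- The common defaults k1 = 1.2 and b = 0.75.
- A Lucene-style IDF, which is always positive.
- Whitespace tokenization.

The documents are made up.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)  # Lucene-style IDF
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)  # length-normalized term frequency
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "error code e1234 in module alpha".split(),
    "general troubleshooting guide for modules".split(),
    "module alpha configuration reference".split(),
]
print(bm25_scores("e1234".split(), docs))
```

In a hybrid setup, this lexical ranking and the dense-retrieval ranking would then be merged, for example with rank fusion.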
N-gated Hacker News @[email protected] · 2026-01-17 · 23:02 UTC

Introducing Xous: the world's most exciting #microkernel for #embedding your #dreams into “medium” devices! 🤖✨ Dive into #userspace #messaging wonders, because, who needs a simple, straightforward OS anyway? 📚🧐 Funded by Europe's finest to revolutionize the way we...well, forget it ever existed. 😂💰
https://xous.dev/ #Xous #tech #revolution #HackerNews #ngated

#microkernel #embedding #dreams #userspace #messaging #xous
Carolina Code Conference @[email protected] · 2025-12-17 · 02:43 UTC

FYI: Vector Search Magic: Superhero & Sporty Car Combo! #shorts: By using vector math, one can combine a DeLorean image with the text query for superhero to generate an embedding. Averaging these embeddings together then allows a search to find a superhero on top of a sporty car with cool lights. #vectormath #image #superhero #DeLorean #embedding https://www.youtube.com/shorts/RMX2A02pP90

#shorts #vectormath #image #superhero #delorean #embedding
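
The post's approach averages an image embedding with a text embedding and searches with the result. This sketch rests on two assumptions:
- Both embeddings come from a model that maps images and text into one space, such as a CLIP-style model.
- Both are L2-normalized before averaging.

The 3-d vectors and catalog items are made up for illustration.

```python
import math

def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def average_embedding(*vectors: list[float]) -> list[float]:
    """Mean of unit-normalized vectors, renormalized so it can be used as a cosine query."""
    units = [normalize(v) for v in vectors]
    mean = [sum(components) / len(units) for components in zip(*units)]
    return normalize(mean)

def best_match(query: list[float], catalog: dict[str, list[float]]) -> str:
    """Catalog item with the highest cosine similarity to the unit-norm query."""
    return max(catalog, key=lambda name: sum(q * x for q, x in zip(query, normalize(catalog[name]))))

# Toy 3-d space: axis 0 ~ "car", axis 1 ~ "superhero", axis 2 ~ "landscape" (hypothetical).
delorean_image = [1.0, 0.0, 0.1]
superhero_text = [0.0, 1.0, 0.1]
catalog = {
    "car only": [1.0, 0.05, 0.1],
    "superhero only": [0.05, 1.0, 0.1],
    "superhero on a sports car": [0.7, 0.7, 0.1],
    "mountain lake": [0.0, 0.1, 1.0],
}
query = average_embedding(delorean_image, superhero_text)
print(best_match(query, catalog))  # superhero on a sports car
```

Normalizing before averaging keeps one input from dominating just because its vector has a larger norm.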
Gea-Suan Lin @[email protected] · 2025-12-04 · 00:38 UTC

https://blog.gslin.org/archives/2025/12/04/12773/amazon-s3-vectors-%e7%9a%84-ga-%e7%89%88%e5%a4%a7%e5%b9%85%e5%a2%9e%e5%8a%a0%e4%ba%86%e7%ad%86%e6%95%b8%e9%99%90%e5%88%b6/
Amazon S3 Vectors 的 GA 版大幅增加了筆數限制
#amazon #aws #cloud #embedding #ga #s3 #service #vectors

#amazon #aws #cloud #embedding #ga #s3
Habr @[email protected] · 2025-11-25 · 14:02 UTC

Beyond embeddings: combining vector and lexical search to improve relevance
Hi, Habr! In the previous piece we mentioned that embedding models are not always the best tool for working with text corpora. In this post, using the task of finding documents relevant to a query, we look at the limitations of that approach, work through a hybrid approach in practice, and evaluate its effectiveness. My name is Vadim Sklyarov, and I'm an analyst at MWS. As is our tradition, we'll approach the technical problem from a systems- and business-analysis perspective: we'll lay out the key points you need to know and document before handing the project to a development team, and look at how to quickly validate candidate approaches.
https://habr.com/ru/companies/ru_mts/articles/970044/
#векторный_поиск #гибридный_поиск #embedding #лексический_анализ #косинусное_сходство #извлечение_признаков #поиск_релевантных_документов #слияние_рангов

#векторный_поиск #гибридный_поиск #embedding #лексический_анализ #косинусное_сходство #извлечение_признаков
Carolina Code Conference @[email protected] · 2025-11-17 · 23:42 UTC

ICYMI: Vector Search Magic: Superhero & Sporty Car Combo! #shorts: By using vector math, one can combine a DeLorean image with the text query for superhero to generate an embedding. Averaging these embeddings together then allows a search to find a superhero on top of a sporty car with cool lights. #vectormath #image #superhero #DeLorean #embedding https://www.youtube.com/shorts/RMX2A02pP90

#shorts #vectormath #image #superhero #delorean #embedding
Habr @[email protected] · 2025-11-13 · 07:12 UTC

No internet, no spies: how we built a local voice assistant
Cloud assistants such as Alice, Google Assistant, and Siri have long been commonplace. But they all share the same weaknesses: dependence on a fast internet connection and the risk of data leaks. This is not only about personal information: people at home often discuss topics that could count as trade or even military secrets. No wonder many people are uncomfortable talking near a microphone that sends every word somewhere "to the cloud" (one of our customers put it plainly: "no Alices in this house"). Habr has already run articles about attempts to replace Alice with fully local solutions, but almost all of them came down to the standard pipeline: ESP32 microphone → Home Assistant → intent recognition. That combination works, but it is a long way from a truly "smart" assistant. We went further and built our own voice assistant, which we describe in this article.
https://habr.com/ru/companies/wirenboard/articles/965856/
#Wiren_Board #BARY #Алиса #голосовой_ассистент #распознавание_речи #vosk #Piper #Embedding #Wake_Word #умный_дом

#умный_дом #wake_word #embedding #piper #vosk #распознавание_речи
Carolina Code Conference @[email protected] · 2025-11-06 · 15:41 UTC

Vector Search Magic: Superhero & Sporty Car Combo! #shorts: By using vector math, one can combine a DeLorean image with the text query for superhero to generate an embedding. Averaging these embeddings together then allows a search to find a superhero on top of a sporty car with cool lights. #vectormath #image #superhero #DeLorean #embedding https://www.youtube.com/shorts/RMX2A02pP90

#shorts #vectormath #image #superhero #delorean #embedding
Sara Zan @[email protected] · 2025-11-04 · 16:19 UTC

We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. Then why is keyword search still used in so many modern search pipelines, especially for RAG?
In this post I'll explain what hybrid search is and why keyword search is still so useful for improving your search results.
https://www.zansara.dev/posts/2025-11-04-hybrid-retrieval/
#AI #GenAI #LLMs #BM25 #Embedding #Retrieval #RAG

#ai #genai #llms #bm25 #embedding #retrieval