#vectorization — Public Fediverse posts on home.social

Habr @[email protected] · 2026-03-10 · 12:22 UTC

From OCR to ADE: how machines learned not just to read documents but to understand them

Just 10 years ago, a machine saw a document as nothing more than a set of pixels. Today it understands page structure, reads tables, charts, and handwriting, and automatically extracts the data you need. We look at how this works under the hood and why it is changing entire industries.

https://habr.com/ru/articles/1008610/

#OCR #машинное_обучение #обработка_документов #LLM #RAG #Python #компьютерное_зрение #computer_vision #aiагенты #vectorization

pgEdge Postgres @[email protected] · 2026-02-02 · 16:30 UTC

If you're located near Illinois, Shaun Thomas will be presenting on "The New Postgres AI Ecosystem" at the Illinois Prairie PostgreSQL User Group this February 18th at 5:30 PM CST. 🐘

Come by the DRW and say hi: https://www.meetup.com/illinois-prairie-postgresql-user-group/events/312929674/

#postgresql #postgres #ai #vectordatabase #pgvector #vectorization #aidev #illinois #chicago

Habr @[email protected] · 2025-11-15 · 17:02 UTC

Building a simple RAG system in PHP with the Neuron AI framework in an evening

RAG (Retrieval-Augmented Generation) is an AI technique that combines a generative large language model (LLM) with an external knowledge base to produce more accurate, context-aware, and up-to-date answers. It works by first retrieving relevant information from a set of documents or data sources and then passing that information to the LLM to form the final answer. This lets the model give more accurate answers that are less prone to "hallucinations", and the knowledge base can be updated without expensive retraining. Today we'll look at how to build a basic RAG system in PHP (yes, yes, no need to be surprised) using the Neuron AI framework. This will be our small proof of concept: minimal, but a real working example. Well, shall we start generating?

https://habr.com/ru/articles/966792/

#rag #rag_ai #php #llm #llmагент #rag_api #vectorization #embeddings #neuron
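
The retrieve-then-generate flow described in this post can be sketched in a few lines. This is only an illustration in Python, not the Neuron AI API and not the article's PHP code: the bag-of-words `embed`, the `retrieve` and `build_prompt` helpers, and the sample documents are all hypothetical stand-ins, and a real system would use a learned embedding model and send the prompt to an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Step 1: pick the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Step 2: hand the retrieved context to the LLM inside the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "pgvector adds vector search to PostgreSQL",
    "Potrace traces bitmaps into vector outlines",
    "RISC-V has a vector extension",
]
prompt = build_prompt("vector search in PostgreSQL", docs, k=1)
```

Updating the knowledge base here means editing `docs`; nothing about the model itself has to change, which is the point the post makes about avoiding retraining.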

N-gated Hacker News @[email protected] · 2025-07-07 · 12:34 UTC

Ah, the #tangled #web of #SIMD vector functions! 🤯 Who knew optimizing #code could be so messy, like trying to untangle your headphones while wrestling a grizzly bear 🐻. But fear not, a #workshop in Aurora promises to save the day, because nothing says "fun weekend" like #vectorization with strangers! 🎉
https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/ #optimization #fun #weekend #HackerNews #ngated

Harald Klinke @[email protected] · 2025-06-03 · 19:03 UTC

I thoroughly enjoyed Antonio Somaini’s lecture tonight on the Politics of Latent Spaces at the conference Art in the Age of Average. The new AI-thoritarians.

His reflections on compression as a cultural and epistemic process were truly inspiring — and the sources cited were excellent, too ;)

#AI #LatentSpaces #DigitalCulture #Compression #Vectorization #NeuralNetworks #ArtAndAI #MachineVision #epistemiccompression #AIAesthetics @databasecultures

Habr @[email protected] · 2025-04-21 · 11:32 UTC

RISC-V: the vector extension and the Viterbi algorithm

A recent publication about the vector extension of the RISC-V architecture prompted me to write a short note on using this extension in a task with practical application. Since the vector extension appeared, articles have been published online about using RISC-V cores with this extension in tasks where previously only DSP processors were used. This article covers a test that uses the Viterbi decoding algorithm, a task that requires significant computational resources.

https://habr.com/ru/articles/902744/

#riscv #vectorization

arya dradjica (find me at RustWeek!!!) @[email protected] · 2025-03-17 · 22:02 UTC

Why in the world does VADDPD (floating-point addition) have a worse throughput than VFMADD132PD (floating-point multiplication and addition) on 2014 Intel Haswell chips
I might genuinely start performing a fused multiply by 1.0 in order to speed my code up

#simd #vectorization

Pierre Huyghebaert @[email protected] · 2025-03-07 · 15:33 UTC

@dmian @papernoise Cool to see that comparison, and details. You can even try pushing the "smooth corners" to 1.34, it will produce curves without any kinks or cusps. See that very old article https://drawingcurved.osp.kitchen/Potrace_alphamax_1-334.xhtml - As far as I know, the #vectorization in #inkscape is using #potrace, probably the last version 1.16, so it has not changed for years.

#vectorization #inkscape #potrace

arya dradjica (find me at RustWeek!!!) @[email protected] · 2024-11-13 · 02:16 UTC

Can somebody teach the LLVM autovectorizer pshufb please
Or if it already knows it, please share an example that generates it

#vectorization #simd #intel

Habr @[email protected] · 2024-11-05 · 08:22 UTC

What does he seek in a distant land? How to find the meaning of life with PostgreSQL

This article grew out of a couple of lectures I gave to students in a machine learning course. Why PostgreSQL? Why vectors? Over the past two years language models have become incredibly popular, and with them came many tools accessible even to a beginning engineer who wants to get acquainted with the world of text analysis. The availability of these technologies opens up boundless possibilities for applying them in all sorts of areas: from knowledge management systems to "copilots" that help analyze patient histories more thoroughly, or information kiosks that help assemble the ideal basket of goods for a picnic. This work can hardly claim to be complete or deep, but I hope it offers those "good" entry points that let you, as you dig into the details, discover many new, interesting, and useful topics for research and engineering projects. Let's uncover hidden meanings

https://habr.com/ru/articles/855712/

#postgresql #postgres #pgvector #vectorization #fulltextsearch #fulltext_search #hnsw #python #java #Knowledge_Management_Systems

arya dradjica (find me at RustWeek!!!) @[email protected] · 2024-09-29 · 18:00 UTC

I have so many good ideas for polyfilling SIMD instructions in older instruction sets and I don't know how to put them in my library properly

You want PCMPEQQ but don't have SSE4.1? No worries, do a PCMPEQD, use PSHUFD to swap pairs of dwords, then PAND with the original result (3 cycles).

PSRLB? Just PSRLD and PAND out some bits (2 cycles).

PSRAQ? Use PSRLQ, and OR the result with the negation of the shifted MSB (i.e. PSRLQ, PAND, PSUB, POR, 4 cycles). For PSRAB, do the same, but do an additional PAND (concurrently with the PAND / PSUB) to mask out overlapping high bits.

Want a VPOPCNTB for cheap? Perform two PSHUFBs (one on the low bits, one on the high bits, both with masking) to popcount nibbles and add their results. Even older CPUs should be able to do that in 3 cycles. For VPLZCNTB / VPTZCNTB, use PMIN/PMAX instead of adding the results.

#npsimd #intel #simd #vectorization
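
Two of the polyfills above, PCMPEQQ and PSRLB, can be checked with a scalar model. This is a sketch in Python rather than real intrinsics: a 128-bit register is modelled as a list of four 32-bit integers (low dword first), and the helper names are made up for illustration. The PSHUFD immediate 0xB1 is the one that swaps the two dwords inside each qword.

```python
MASK32 = 0xFFFFFFFF

def pcmpeqd(a, b):
    """Per-dword equality: all-ones where lanes match, zero elsewhere."""
    return [MASK32 if x == y else 0 for x, y in zip(a, b)]

def pshufd_swap_pairs(v):
    """Model of PSHUFD with immediate 0xB1: swap the dwords inside each qword."""
    return [v[1], v[0], v[3], v[2]]

def pand(a, b):
    """Lane-wise bitwise AND."""
    return [x & y for x, y in zip(a, b)]

def pcmpeqq_polyfill(a, b):
    """64-bit equality from 32-bit pieces: PCMPEQD, PSHUFD, PAND."""
    eq = pcmpeqd(a, b)
    return pand(eq, pshufd_swap_pairs(eq))

def psrlb_polyfill(dwords, n):
    """Per-byte logical right shift by n (0..7): PSRLD, then PAND away
    the bits that leaked in from the neighbouring higher byte."""
    keep = (0xFF >> n) * 0x01010101
    return [(d >> n) & keep for d in dwords]

def qwords_to_dwords(qwords):
    """Split 64-bit values into (low, high) dword lanes."""
    out = []
    for q in qwords:
        out += [q & MASK32, q >> 32]
    return out

# First qword differs only in its low dword, second qword is equal.
a = qwords_to_dwords([0x0000000100000002, 5])
b = qwords_to_dwords([0x0000000100000003, 5])
eq64 = pcmpeqq_polyfill(a, b)  # [0, 0, MASK32, MASK32]
```

The AND with the swapped copy is what makes a qword lane all-ones only when both of its halves matched.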

arya dradjica (find me at RustWeek!!!) @[email protected] · 2024-09-26 · 10:27 UTC

Why does _mm256_andnot_si256() compute (NOT a) AND b instead of a AND (NOT b)
Just ... why

#intel #vectorization
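
For reference, the operand order being complained about looks like this in a scalar model. The sketch treats a 256-bit register as a Python integer; `andnot_si256` and `clear_bits` are illustrative helper names, not real bindings.

```python
MASK256 = (1 << 256) - 1

def andnot_si256(a, b):
    """Model of _mm256_andnot_si256(a, b): (NOT a) AND b."""
    return (~a & MASK256) & b

def clear_bits(value, bits_to_clear):
    """The mask to clear goes FIRST, the value to keep goes second."""
    return andnot_si256(bits_to_clear, value)

got = andnot_si256(0b1100, 0b1010)   # (NOT a) AND b
other = 0b1100 & ~0b1010             # a AND (NOT b), what one might expect
```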

Arav K. @[email protected] · 2024-08-10 · 07:07 UTC

vectorized prefix ~sum~ function composition
send help

#vectorization #simd

Arav K. @[email protected] · 2024-08-05 · 19:55 UTC

AVX2 tip! `PANDN(x, PCMPEQB(y, 0))` where the MSB of `y` is always unset can be transformed into `PSIGN(y, x)`. If you want to mask some elements `x` based on whether an input `y` is non-zero, and the MSB of `y` is always unset, you can multiply `x` by the sign of `y` (which will be 0 or 1) in 1 cycle using `PSIGN`. I think this is actually a pretty common pattern, but compilers can't really see it because of the MSB check.

#vectorization #simd #intel #avx2
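
A byte-wise scalar model makes it easy to see why the two forms agree, and why the unset-MSB condition matters. This is a sketch under assumptions: registers are lists of unsigned bytes, the helpers take the semantic operands directly (value `x`, selector `y`) rather than following any particular assembler operand order, and the function names are invented for illustration.

```python
def mask_nonzero_andnot(x, y):
    """Compare-and-mask form: keep x[i] where y[i] != 0, else 0.
    Models PCMPEQB against zero followed by an and-not with x."""
    is_zero = [0xFF if b == 0 else 0x00 for b in y]
    return [(~m & 0xFF) & v for m, v in zip(is_zero, x)]

def psignb(x, y):
    """PSIGNB semantics: negate x[i] if y[i] < 0, zero it if y[i] == 0,
    pass it through if y[i] > 0 (y interpreted as signed bytes)."""
    out = []
    for v, s in zip(x, y):
        signed = s - 256 if s >= 128 else s
        if signed < 0:
            out.append((-v) & 0xFF)
        elif signed == 0:
            out.append(0)
        else:
            out.append(v)
    return out

x = [0x11, 0x22, 0xFF, 0x80]
y = [0x00, 0x01, 0x7F, 0x00]      # MSB of every byte is unset
same = mask_nonzero_andnot(x, y) == psignb(x, y)

# With the MSB set, PSIGNB negates instead of passing through.
differs = mask_nonzero_andnot([0x01], [0x80]) != psignb([0x01], [0x80])
```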

Arav K. @[email protected] · 2024-07-17 · 16:46 UTC

Version 0.2.0 of `npsimd` is now published, with a new low-level API that supports runtime feature detection (currently only SSE2 is implemented). I'm going to slowly migrate all the existing functionality over to it, and then work on a better higher-level API. See <https://docs.rs/npsimd>!

#rust #vectorization #npsimd

Arav K. @[email protected] · 2024-07-17 · 06:30 UTC

I'm working on a Rust library for explicitly *non-portable* SIMD intrinsics, called `npsimd` <https://docs.rs/npsimd>. The idea is that high-performance vectorized code needs to take advantage of platform-specific functionality, and we need a good way to write such code in Rust. I hope this library can provide an API to do that.

#rust #vectorization

Arav K. @[email protected] · 2024-07-07 · 17:14 UTC

Given a bit-mask with consecutive sequences of set elements, if you want to unset an entire sequence based on the first (least-significant) element, calculate a 1 for every first element to be masked out, add it to the original bit-mask, then AND it with the original bit-mask. This is very helpful for vectorized lexing!

#vectorization
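
A small worked example of the add-then-AND trick, as a sketch with Python integers standing in for a fixed-width mask register. The function name and the 64-bit width are assumptions for illustration. It relies on runs being separated by at least one zero bit, so the carry out of a cleared run lands on a bit that the final AND removes.

```python
def drop_runs(mask, starts, width=64):
    """Clear every run of set bits in `mask` whose lowest bit is set in `starts`.

    Adding a 1 at the bottom of a run carries through the whole run,
    zeroing it and setting the bit just above it; the AND with the
    original mask then discards that carry bit.
    """
    full = (1 << width) - 1
    return ((mask + starts) & full) & mask

mask = 0b1110_0110_0111            # runs at bits 0-2, 5-6 and 9-11
starts = (1 << 0) | (1 << 9)       # drop the first and the last run
kept = drop_runs(mask, starts)     # only bits 5-6 remain
```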

C++ on Sea @[email protected] · 2024-06-06 · 21:14 UTC

C++OnSea 2024 SESSION ANNOUNCEMENT: Being Friendly to Your Hardware by Ignas Bagdonas

https://cpponsea.uk/2024/sessions/being-friendly-to-your-hardware

Register now at https://cpponsea.uk/tickets/

#vectorization #cpp #cplusplus #coding

Arav K. @[email protected] · 2024-05-28 · 13:43 UTC

AVX-512's VPCOMPRESS instruction is so damn cool. For a simple array filtering problem (retain in-place only even 32-bit numbers out of a 256MiB array), it'll out-perform native C code by a factor of 10x. The C code executes at about 800MHz, while the AVX512 code executes at about 90MHz - it's just 100 times more productive with the cycles it executes.

#avx512 #vectorization
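
What the compress-based filter does can be modelled in scalar Python. This is only a sketch of the data movement, not a benchmark and not the author's code: `vpcompressd` stands in for the masked compress instruction on 16 lanes of 32 bits, and `retain_even_in_place` is a hypothetical driver loop.

```python
def vpcompressd(lanes, mask):
    """Model of a masked compress: pack the selected lanes contiguously."""
    return [v for i, v in enumerate(lanes) if (mask >> i) & 1]

def retain_even_in_place(data, width=16):
    """Keep only even numbers, processing `width` lanes per step."""
    write = 0
    for start in range(0, len(data), width):
        chunk = data[start:start + width]          # vector load
        mask = 0
        for i, v in enumerate(chunk):              # vector compare -> bit mask
            if v % 2 == 0:
                mask |= 1 << i
        kept = vpcompressd(chunk, mask)            # compress selected lanes
        data[write:write + len(kept)] = kept       # store at the write cursor
        write += len(kept)                         # advance by popcount(mask)
    del data[write:]
    return data

values = list(range(40))
retain_even_in_place(values)
```

The write cursor never passes the read position, which is why the filter can run in place.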

C++ on Sea @[email protected] · 2024-05-16 · 16:05 UTC

C++OnSea 2024 SESSION ANNOUNCEMENT: Being Friendly to Your Hardware by Ignas Bagdonas

https://cpponsea.uk/2024/sessions/being-friendly-to-your-hardware

Register now at https://cpponsea.uk/tickets/

#vectorization #cpp #cplusplus #coding

Arav K. @[email protected] · 2024-05-13 · 09:33 UTC

Holy shit, `VPSHLDQ` is so cool! On my laptop, the scalar `SHLD` on 64-bit GPRs is 1-3 cycles of latency. `VPSHLDQ` does the same thing (with a constant shift value, which is fine for my use case) on a 8x64-bit ZMM register with just 1 cycle of latency. I can perform the same operation 8-24 times faster!

#simd #vectorization #avx512
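
For readers unfamiliar with the operation: a double-precision (funnel) left shift concatenates two 64-bit values, shifts the 128-bit result left, and keeps the upper half. The sketch below models that per lane in Python, with eight lanes standing in for a ZMM register. The function names are illustrative, and the shift count is assumed to be taken modulo 64.

```python
MASK64 = (1 << 64) - 1

def shld64(a, b, n):
    """Funnel shift left: upper 64 bits of ((a:b) << n), with n taken mod 64."""
    n %= 64
    wide = ((a << 64) | b) << n
    return (wide >> 64) & MASK64

def vpshldq(a_lanes, b_lanes, n):
    """Apply the same constant funnel shift to every 64-bit lane."""
    return [shld64(a, b, n) for a, b in zip(a_lanes, b_lanes)]

a = [0x8000000000000001] * 8
b = [0xF000000000000000] * 8
out = vpshldq(a, b, 4)   # low bits of a shift up, top nibble of b shifts in
```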

Antão Almada @[email protected] · 2024-04-09 · 22:52 UTC

I have now released one more major version of my vectorization library. The breaking changes are worth it, supporting more operations without technical debt. It now includes vectorized First() and IndexOfFirst() operations.
https://netfabric.github.io/NetFabric.Numerics.Tensors/articles/intro.html
#dotnet #simd #vectorization

Niklas Alt @[email protected] · 2023-10-10 · 05:36 UTC

Hi fediverse, is anyone aware of #opensource pipelines for #segmentation / #vectorization of #historical #cadastral maps? Ideally a workflow to train #AI / #ML models on specific mapsets, e.g. the new Prussian survey after 1870*, the Franciscean (mid 19th century) or the Bavarian***, to mention only the largest surveys in central Europe. I suspect that people outside of history are working on it, these maps are a true treasure for #environmental and #biodiversity research. Links are in the reply
