#inference — Public Fediverse posts on home.social

h o ʍ l e t t @[email protected] · 2026-05-14 · 15:44 UTC

→ Friends Don't Let Friends Use Ollama
https://sleepingrobots.com/dreams/stop-using-ollama/

“#Ollama’s entire inference capability comes from llama.cpp, the C++ #inference engine created by Georgi Gerganov in March 2023. Gerganov’s project is what made it possible to run LLaMA models on consumer #laptops at all, he hacked together the first version in an evening, and it kicked off the entire #local LLM movement. […] It’s truly #community-driven, #MIT-licensed, and under active development with 450+ #contributors.”

#LLM

Ireland @[email protected] · 2026-05-09 · 15:56 UTC

Why Memory Stocks Are the Tech Sector’s Hottest Trade

https://www.europesays.com/ie/476546/

#AI #ArtificialIntelligence #Company #ConsumerDevice #DaveMazza #demand #Éire #firm #hyperscaler #IE #inference #Ireland #market #memory #MemoryChip #Micron #SanDisk #StorageComponent #supply #Technology

Hacker News @[email protected] · 2026-05-07 · 17:24 UTC

DeepSeek 4 Flash local inference engine for Metal

https://github.com/antirez/ds4

#HackerNews #DeepSeek #Flash #Metal #Inference #Engine #AI #Technology

The Recursive: Stories shape stories [Unofficial] @[email protected] · 2026-05-05 · 08:34 UTC

Why 500 Global and Nvidia Just Bet €91.5m on Deepinfra’s ‘Token Factory’

https://web.brid.gy/r/https://therecursive.com/deepinfra-series-b-500-global-nvidia-ai-inference-infrastructure/

#ai #cloud #deals #diaspora #news #deepinfra

Habr @[email protected] · 2026-05-01 · 11:52 UTC

Guides to nxs-universal-chart v3.0: an AI inference environment based on KServe

So, you've trained a model and it delivers the expected results. Now all that's left is to roll it out to an environment, but that takes a number of components: you need traffic routing and the inference itself. Model autoscaling is desirable, as is a way to pass sensitive data, such as credentials for the model storage. And monitoring wouldn't hurt either. Each component is a separate Helm chart, separate CRDs, and separate documentation. As a result, instead of quickly testing the model and your hypotheses, you end up doing YAML engineering and swearing loudly. Hi everyone, Pyotr here, engineer

https://habr.com/ru/articles/1030440/

#devops #kubernetes #mlops #helm #kserve #istio #machine_learning #inference #ai #deploy

WIST Quotations @[email protected] · 2026-04-16 · 21:27 UTC

A quotation from Arthur Conan Doyle

No, no: I never guess. It is a shocking habit, — destructive to the logical faculty.

Arthur Conan Doyle (1859-1930) British writer and physician
Story (1890-02), “The Sign of the Four,” ch. 1 [Holmes], Lippincott’s Monthly Magazine, Vol. 45 (US) / 1 (UK)

More about this quote: wist.info/doyle-arthur-conan/8…

#quote #quotes #quotation #qotd #arthurconandoyle #sherlock #holmes #sherlockholmes #deduction #discipline #guess #guesswork #inference #logic

Giovanni Crisalfi @gicrisf · 2026-03-29 · 15:48 UTC

After A LOT of studying BLAS internals, my PR to the gemm crate is finally open: it introduces mixed-precision BF16 matmuls (optimal for use cases like small models doing autoregressive decoding on CPU)

https://github.com/sarah-quinones/gemm/pull/40

#programming #rust #ai #inference #deeplearning #qwen #asr #opensource #rustlang

ALEXBSR @alexbsr · 2026-03-21 · 22:12 UTC

As local AI adoption accelerates, traditional cloud-only inference is no longer sufficient. This article explores how hybrid inference architecture—combining local models with cloud-scale intelligence—enables a new paradigm: the “token factory.”

Instead of treating AI as a monolithic service, this approach distributes token generation across edge devices and centralized systems, optimizing for latency, cost, and scalability. Local models handle high-throughput, low-latency token production, while larger models refine outputs only when necessary—dramatically reducing compute overhead and enabling real-time AI at scale.

With enterprises facing rising inference costs and privacy constraints, hybrid architectures are emerging as a practical solution—delivering near cloud-level performance while maintaining control over data and infrastructure.

https://www.buysellram.com/blog/hybrid-inference-architecture-why-the-token-factory-scales-as-local-ai-explodes/

#AIInfrastructure #NVIDIA #GTC2026 #HybridAI #GPU #DataCenter #Inference #ITAD #AgenticAI #LocalAIInference #TokenFactory #OnPremiseAI

deepseek @[email protected] · 2026-03-20 · 19:21 UTC

The Vanishing Cost of Intelligence: Why Box’s Aaron Levie Thinks AI Will Be Nearly Free by 2026

Box CEO Aaron Levie predicts AI token costs will approach zero by 2026, a claim with massive implic...

#AITrends #CloudWorkPro #Aaron #Levie #Box #AI #inference #economics #AI #token #costs
