home.social

#localai — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #localai, aggregated by home.social.

  1. RT @stableAPY: I still can't believe my 3060 12GB runs Qwen 3.6 35B at 40 tok/s. The card costs about $200 used, while everyone raves about wildly expensive 128GB unified memory or RTX 6000 cards. A single 3060 12GB is more than enough for first local AI experiments: it is cheap, and paired with some RAM and a reasonably decent CPU it gets the job done. Decode speed does drop as the context grows, and you can't run several sub-agents at once, but it is a cheap entry point. For example, it pairs very well with my 3090:

    3090 runs the main agent, 35B with -np 2, so I can have 2 parallel agents
    3060 runs a sub-agent, 35B with -np 1

    This way my main Hermes can delegate work to that sub-agent while it works on something else. I also run a Hermes cron job so they don't overload the main agent, and I don't mind that it is slower, because it happens in the background.

    more at Arint.info

    #3060 #Hardware #KI #LocalAI #OpenSource #Qwen #arint_info

    https://x.com/stableAPY/status/2054846979755200583#m
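
    A minimal sketch of driving such a pair of llama-server instances in parallel (my illustration, not the poster's code; the ports and prompts are assumptions, while the OpenAI-compatible /v1/chat/completions endpoint is standard llama-server behavior):

    ```typescript
    // Hypothetical two-box setup from the post: the 3090 serves the main agent
    // (started with -np 2, i.e. two parallel slots) and the 3060 serves a
    // sub-agent (-np 1). Ports 8080/8081 are assumptions.
    async function ask(baseUrl: string, prompt: string): Promise<string> {
      const res = await fetch(`${baseUrl}/v1/chat/completions`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }

    // The main agent keeps working while a delegated task runs on the other card.
    const [mainWork, delegated] = await Promise.all([
      ask("http://localhost:8080", "Continue the refactoring plan."),
      ask("http://localhost:8081", "Summarize these build logs."),
    ]);
    console.log(mainWork, delegated);
    ```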

  2. Fedora approved the AI Developer Desktop initiative to create AI-focused Atomic Desktop images with local-first tooling and no default cloud AI connections. 🤖
    Planned Fedora 45 releases include open-source AI images plus CUDA-based remixes, with hardware support for Intel, AMD, NVIDIA, and ARM. 🐧

    🔗 itsfoss.com/news/fedora-ai-dev

    #TechNews #Fedora #Ubuntu #Linux #AI #ArtificialIntelligence #OpenSource #Atomic #CUDA #Cloud #CloudAI #LocalAI #FOSS #NVIDIA #AMD #Intel #ARM #MachineLearning #Developers

  3. New week, small update: Run LLMs Locally

    Now with a new setup for OpenCode with Qwen 3.6 and Gemma 4, including permissions and thinking variants.

    codeberg.org/thbley/talks/raw/

    #ai #llm #llamacpp #stablediffusion #qwen3 #glm #localai #gemma4 #webgpu #opencode

  4. Ars Technica: Chrome’s 4GB AI model isn’t new, but you’re not wrong for being confused. “Some desktop Chrome users have also noted that the browser appears to suddenly want more storage space for AI. This is true—Chrome does download a 4GB AI model for on-device processing. It’s been doing that for years, though. Google hasn’t actually changed anything about Chrome’s on-device AI, […]

    https://rbfirehose.com/2026/05/09/ars-technica-chromes-4gb-ai-model-isnt-new-but-youre-not-wrong-for-being-confused/

  5. How to Replace Siri with a Free Local Model

    Explain the difference between local AI and cloud AI in simple terms

    #LocalAI is processed on your device, keeping all data private.
    #CloudAI is processed on a server and requires internet access.

    app.therundown.ai/guides/how-t

    #LocallyAI #gemma #gemma4 #llm #ai

  6. A $1,999 Mac mini runs a 70B parameter model that a $4,000 Windows workstation physically cannot.
    The reason: Apple Silicon's unified memory. No separate VRAM pool. No PCIe bottleneck. Just one shared memory for CPU, GPU, and Neural Engine.
    Full breakdown: buysellram.com/blog/why-mac-mi

    #ArtificialIntelligence #AI #LocalAI #MacMini #AppleSilicon #LLM #AIAgents #MachineLearning #EdgeAI #TechInfrastructure #DataPrivacy #Automation #AIHardware
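
    The back-of-envelope arithmetic behind that claim, as a rough sketch (the figures are common approximations, not taken from the article):

    ```typescript
    // Rough memory math for a 70B model at 4-bit quantization.
    const params = 70e9;          // 70 billion parameters
    const bytesPerParam = 0.5;    // ~4 bits per weight under Q4 quantization
    const weightsGB = (params * bytesPerParam) / 1e9;
    console.log(`~${weightsGB} GB for weights alone`); // ~35 GB

    // A typical discrete GPU tops out at 24 GB VRAM, so the weights cannot fit;
    // a Mac mini with 64 GB unified memory holds weights plus KV cache in one pool.
    ```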

  7. The rise of local AI is changing hardware demand in unexpected ways — and the Mac Mini is emerging as one of the biggest winners.

    What makes it interesting is not just the compact form factor. Apple Silicon’s unified memory architecture, low power consumption, quiet operation, and ability to run AI workloads locally are making the Mac Mini increasingly attractive for developers, startups, and businesses building AI agents.

    Recent reports show that higher-memory Mac Mini configurations are experiencing major shortages as AI adoption accelerates.

    This article explores:
    • Why local AI agents are growing rapidly
    • How the Mac Mini became a practical AI workstation
    • The role of unified memory for LLM workloads
    • Why developers are moving away from cloud-only AI setups
    • What this trend means for future AI infrastructure

    buysellram.com/blog/why-mac-mi

    #ArtificialIntelligence #AI #LocalAI #MacMini #AppleSilicon #LLM #AIAgents #MachineLearning #EdgeAI #DataPrivacy #Automation #AIHardware #technology

  8. New week, more slides: Run LLMs Locally

    Now with LFM 2 and new slides for using Transformers.js with WebGPU for Privacy Filter, Function Calling and Embeddings, running completely in your browser.

    codeberg.org/thbley/talks/raw/

    #ai #llm #llamacpp #stablediffusion #gptoss #qwen3 #glm #localai #gemma4 #nemotron #webgpu
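
    For reference, in-browser embeddings with Transformers.js on WebGPU look roughly like this (my sketch of the general approach, not taken from the slides; the model name is an example and a WebGPU-capable browser is assumed):

    ```typescript
    import { pipeline } from "@huggingface/transformers";

    // Build an in-browser embedding pipeline on the GPU via WebGPU.
    const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
      device: "webgpu",
    });

    // Mean-pooled, normalized sentence embedding; nothing leaves the browser.
    const output = await embed("local AI keeps data on your device", {
      pooling: "mean",
      normalize: true,
    });
    console.log(output.dims); // e.g. [1, 384]
    ```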

  9. If you are exploring AI 3D model generation for Godot and Unity, read on. Turning a text prompt or a single photo into a textured 3D model is now possible entirely on your own hardware. This guide will help you navigate the landscape, whether you use Godot, Unity, or both. We focus on free, locally runnable AI models and explain exactly which output formats they support, so you can build a seamless pipeline from generation to engine. […]

    https://blog.icod.de/2026/05/02/ai-3d-models-godot-unity-local/

  10. Ubuntu moves its AI roadmap local-first, using open-weight models and on-device inference via snaps instead of cloud-first copilots. 🐧

    Canonical frames AI as opt-in and sandboxed.
    Would you want AI features built into your OS like this, or kept separate? 🔒

    🔗 itsfoss.com/news/ubuntu-is-get

    #TechNews #Ubuntu #Linux #AI #ArtificialIntelligence #OpenSource #Privacy #LocalAI #Canonical #Snaps #MachineLearning #FOSS #OnDeviceAI #DigitalRights

  11. Has somebody researched #localAI #localLLM options for (agentic) coding?

    Der Standard's Daniel Koller published an extensive read, based on a Mac Mini with 16GB RAM.

    The current price point for a Mac Mini M4 Pro with 48GB or 64GB RAM would be ~2'200 €.

    That's quite an investment, but if AI providers all switch to usage-based pricing, it might become a viable option? 🤔 Disclaimer: You won't get the same #ClaudeCode experience out of this (yet)? That's the bet one would make - am I getting this right?

    derstandard.at/story/300000031
    #vibecoding #AI

  12. New week, new slides: Run LLMs Locally

    Now including Nemotron 3 Nano Omni from Nvidia, Llama.cpp built-in tools and new slides about using Transformers.js with WebGPU for Image Recognition and OCR.

    codeberg.org/thbley/talks/raw/

    #ai #llm #llamacpp #stablediffusion #gptoss #qwen3 #glm #localai #gemma4 #nemotron #webgpu

  13. You can use Gemma 4, the newly released #ai model by #google, fully #local on your device. This means that, after the download, you don't need internet to use the AI, and conversations are not sent to Google, which is a huge #privacy win.
    You can download the model via the Edge Gallery app without a login.

    I'm not associated with Google in any way.

    Do you use AI locally on your device?

    #gemma4 #googleai #localai #offlineai #PrivacyWins #Ai #dataprivacy #DataProtection #privateai

  14. RT @TheAhmadOsman: PRO TIP
    My agent web stack:
    - SearXNG: discovery of potential sources
    - Firecrawl: scraping and crawling of known URLs
    - Camoufox: browser fallback for JS/interaction
    Search - Extract - Interact
    P.S. Give this to your favorite agent and tell it to set these tools up for use with local models.
    Ahmad (@TheAhmadOsman): Using local LLMs? Make sure you set up web search for them. Tell your favorite agent to set up SearXNG for you. Give this to your local LLMs (tell an agent to set that up as well). Watch them become much smarter and more efficient. You're welcome: nitter.net/TheAhmadOsman/statu

    more at Arint.info

    #AIAgents #LLM #LocalAI #TechStack #WebScraping #arint_info

    https://x.com/TheAhmadOsman/status/2044142893242204550#m
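
    A hedged sketch of the first step of that stack, querying a self-hosted SearXNG instance for candidate sources (not the poster's setup; the port is an assumption and SearXNG only serves JSON if that output format is enabled in its settings):

    ```typescript
    // Ask a local SearXNG instance for candidate sources to hand to a crawler.
    const q = encodeURIComponent("llama.cpp parallel slots");
    const res = await fetch(`http://localhost:8888/search?q=${q}&format=json`);
    const { results } = await res.json();

    for (const r of results.slice(0, 5)) {
      console.log(`${r.title} -> ${r.url}`);
    }
    ```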

  15. New week, new update for the slides of my talk "Run LLMs Locally":

    Now including Gemma4 and Qwen3-Omni with Vision and Audio support and new slides describing Llama.cpp server parameters.

    codeberg.org/thbley/talks/raw/

    #ai #llm #llamacpp #stablediffusion #gptoss #qwen3 #glm #localai #gemma4

  16. OwnAether Personal AI Operating Systems: What if your entire digital life — your work, your income, your creativity, your health, your automation, your business — was orchestrated by a single intelligent layer that learns you, works for you, and evolves with you?

    medium.com/@ownaether/the-pers

    #AI #PersonalAI #IndividualAI #MyAI #YourAI #LocalAI #DesktopAI #AIApps #PrivateAI #LLMs #SLMs #AIModels #PersonalAIAssistant #PersonalAIApp

  17. It's that time again: Digital Independence Day. Every first Sunday of the month, another switch gets made. This time it's something really big and important: Alexa and Google Assistant are out, replaced by Home Assistant with its Voice PE and local AI. What happens in my house belongs to me. #digitalindependenceday #homeassistant #voicepe #localai #europe #digitalindependence #independence #digitalesouveranitat #digitaleunabhängigkeit

  18. We've been building ENGIOS because hardware deserves to live longer and software should respect the person running it.

    Today — AIDA.

    An intelligent OS deserves an intelligent heart.
    Your machine deserves a kind one.

    AIDA is the intelligence layer woven into ENGIOS. Actual local inference via Ollama — Phi-3 Mini. No internet required. Nothing leaving the machine. Ever.

    engios.dev · github.com/ENGIOS-DEV/ENGIOS

    #ENGIOS #AIDA #FOSS #LocalAI #Privacy #Linux #OpenSource
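
    For reference, this is what local inference via Ollama looks like at the API level (generic Ollama usage, not ENGIOS/AIDA code; the default port is assumed and "phi3" presumes the model was pulled beforehand):

    ```typescript
    // Ollama listens on localhost:11434 by default.
    const res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "phi3",   // e.g. after `ollama pull phi3`
        prompt: "Summarize today's system logs in one sentence.",
        stream: false,   // single JSON response instead of a token stream
      }),
    });
    const data = await res.json();
    console.log(data.response); // generated text, never leaving the machine
    ```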

  19. Anyone using some sort of Microsoft Recall or local Recall-style AI alternative?
    Any recs?

    #localllm #localai @recall #microsoftrecall #techhelp #techhelpneeded

  20. Part 2 of my Local AI Lab For Developers series is live: "Tokens Are the Unit of Pain: Tokenization You Can See."

    Tokenization is where context limits, latency, and cost become real constraints. This post is about making tokenization observable so prompt work stops being guesswork.

    methodicalfunction.com/log/202

    #LocalAI #LLM #Tokenization #DevTools
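
    One simple way to make tokenization observable (my sketch, not the article's code; the tokenizer model is just an example):

    ```typescript
    import { AutoTokenizer } from "@huggingface/transformers";

    // Load a tokenizer and show exactly how a prompt splits into tokens.
    const tokenizer = await AutoTokenizer.from_pretrained("Xenova/gpt2");
    const prompt = "Tokens are the unit of pain.";

    const ids = tokenizer.encode(prompt);
    console.log(`${ids.length} tokens:`, ids);

    // Decode each id individually to see where the boundaries land.
    for (const id of ids) {
      console.log(id, JSON.stringify(tokenizer.decode([id])));
    }
    ```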

  21. Problem: we keep using frontier LLMs as glue for jobs that are already solved.

    Solution: run OCR + NER locally in C# with ONNX Runtime. Deterministic extraction on ingest. Store the entities. Use an LLM later only if you actually need synthesis.

    OCR with Tesseract, then BERT NER via ONNX in .NET. No Python, no cloud, no tokens.

    This is my 'for beginners' article. I'm DEEP in OCR but realised I never explained the quickest way to do this *locally*.

    mostlylucid.net/blog/simple-oc

    #CSharp #DotNet #ONNX #OnnxRuntime #OCR #NER #LocalAI #RAG #DocumentAI

  22. Local coding with Devstral Small 2 (24B) on an RTX 5060 Ti 16GB runs extremely smoothly!

    🔹 Setup: Devstral-Small-2 (Q4_K_M), 24k context, running entirely in 16GB VRAM.
    🔹 Speed: prompt processing ~650 tok/s, token generation 9-11 tok/s.
    🔹 Pairing: using it with Zed Agent works better than Claude Code thanks to the concise system prompt.
    🔹 Quality: handles complex coding tasks well, reads files on its own, runs tests, and fixes bugs when given detailed instructions.

    #AI #Coding #Devstral #LLM #LocalAI #Programmi