#visionmodels — Public Fediverse posts on home.social

Arint - SEO+KI @[email protected] · 2026-05-18 · 04:03 UTC

RT @stevibe: Parameter scaling just broke down on me. I gave 90 math problems as images to 10 local vision models, 3 runs each, counting only answers that were consistent across all 3 runs.

Two takeaways: Gemma 4 was the most consistent family, and the 31B took the win with 89.6%. But Qwen 3.5 4B was only two answers behind. A 4B model. In 2nd place out of 10. Vision math is not one skill but two: read the image, then solve.

The real lesson for anyone running locally: small ≠ weak. If you are building agentic workflows, finding the right model for each task matters more than reaching for the biggest one. In this test the 4B model ran far faster because of its size, scored higher, and left VRAM free for the rest of your stack.

Full results:
🥇 Gemma 4 31B: 242/270 (89.6%)
🥈 Qwen 3.5 4B: 240/270 (88.9%)
🥉 Gemma 4 E4B: 222/270 (82.2%)
🥉 Qwen 3.6 27B: 222/270 (82.2%)
5. Gemma 4 26B A4B: 216/270 (80.0%)
6. Qwen 3.5 2B: 201/270 (74.4%)
7. Gemma 4 E2B: 192/270 (71.1%)
8. Qwen 3.6 35B A3B: 192/270 (71.1%)
9. Qwen 3.5 9B: 168/270 (62.2%)
10. Qwen 3.5 0.8B: 45/270 (16.7%)

All GGUF + mmproj, Unsloth's Q6KXL quantization.

more at Arint.info

#AI #Gemma #LocalLLM #MachineLearning #Qwen #VisionModels #arint_info

https://x.com/stevibe/status/2055666729460932626#m

#ai #gemma #localllm #machinelearning #qwen #visionmodels
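
The percentages and placements in the post above can be recomputed from the raw scores. The following is a minimal sketch, not the author's tooling: the function names `percent` and `rank` are hypothetical, and reading the denominator 270 as 90 problems × 3 runs is an assumption drawn from the numbers in the post.

```python
# Raw scores as listed in the post: (model, correct answers out of 270).
# Assumption: 270 = 90 problems x 3 runs; the post does not spell this out.
TOTAL = 90 * 3

scores = [
    ("Gemma 4 31B", 242),
    ("Qwen 3.5 4B", 240),
    ("Gemma 4 E4B", 222),
    ("Qwen 3.6 27B", 222),
    ("Gemma 4 26B A4B", 216),
    ("Qwen 3.5 2B", 201),
    ("Gemma 4 E2B", 192),
    ("Qwen 3.6 35B A3B", 192),
    ("Qwen 3.5 9B", 168),
    ("Qwen 3.5 0.8B", 45),
]


def percent(correct: int, total: int = TOTAL) -> float:
    """Share of correct answers, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def rank(rows):
    """Competition ranking: models with equal scores share a place."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranked = []
    for i, (name, correct) in enumerate(ordered):
        if i > 0 and correct == ordered[i - 1][1]:
            place = ranked[-1][0]  # tie: reuse the previous place
        else:
            place = i + 1
        ranked.append((place, name, correct, percent(correct)))
    return ranked


for place, name, correct, pct in rank(scores):
    print(f"{place:>2}. {name:<18} {correct}/{TOTAL} ({pct}%)")
```

Under this tie-aware ranking the two models at 222 share 3rd place, as the post's double bronze suggests, and the two models at 192 share 7th place, where the post numbers them 7 and 8.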

Arint - SEO+KI @[email protected] · 2026-05-17 · 04:01 UTC

RT @stevibe: Parameter scaling just collapsed on me. I gave 90 math problems as images to 10 local vision models, 3 runs per model, counting only answers that were consistent across all 3 runs.

Two takeaways: Gemma 4 was the most consistent family, and the 31B took the title with 89.6%. But Qwen 3.5 4B was only 2 answers behind. A 4B model. In 2nd place out of 10. Vision math is not one skill but two: read the image, then solve.

The real lesson for anyone running locally: small ≠ weak. If you are building agentic workflows, finding the right model for each task matters more than reaching for the biggest model. In this test the 4B model ran far faster because of its size, achieved higher scores, and left VRAM free for the rest of your stack.

Full results:
🥇 Gemma 4 31B: 242/270 (89.6%)
🥈 Qwen 3.5 4B: 240/270 (88.9%)
🥉 Gemma 4 E4B: 222/270 (82.2%)
🥉 Qwen 3.6 27B: 222/270 (82.2%)
5. Gemma 4 26B A4B: 216/270 (80.0%)
6. Qwen 3.5 2B: 201/270 (74.4%)
7. Gemma 4 E2B: 192/270 (71.1%)
8. Qwen 3.6 35B A3B: 192/270 (71.1%)
9. Qwen 3.5 9B: 168/270 (62.2%)
10. Qwen 3.5 0.8B: 45/270 (16.7%)

All GGUF + mmproj, Unsloth's Q6KXL quantization.

more at Arint.info

#Gemma #LLM #LocalAI #MachineLearning #Qwen #VisionModels #arint_info

https://x.com/stevibe/status/2055666729460932626#m

#gemma #llm #localai #machinelearning #qwen #visionmodels
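
The post says only answers that were consistent across all 3 runs were counted, but it does not give the exact scoring rule, and its totals are reported out of 270. One possible reading of such a consistency filter is sketched below. The function name `consistent_correct`, the data layout, and the sample answers are all hypothetical and are not the author's data or harness.

```python
from typing import Sequence


def consistent_correct(
    runs: Sequence[Sequence[str]], expected: Sequence[str]
) -> int:
    """Count problems where every run gave the same, correct answer.

    runs[r][p] is the answer from run r for problem p (hypothetical layout).
    """
    count = 0
    for p, truth in enumerate(expected):
        # Collect the distinct answers given for this problem across runs.
        answers = {run[p].strip() for run in runs}
        # Consistent and correct means the only distinct answer is the truth.
        if answers == {truth.strip()}:
            count += 1
    return count


# Made-up illustration: 3 problems, 3 runs.
expected = ["4", "9", "x=2"]
runs = [
    ["4", "9", "x=2"],
    ["4", "8", "x=2"],  # the second problem flips in this run
    ["4", "9", "x=2"],
]
print(consistent_correct(runs, expected))  # 2
```

A filter like this penalizes a model that gets a problem right only some of the time, which is one way to separate stable answers from lucky ones.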

HackerNoon @[email protected] · 2026-02-05 · 03:37 UTC

Learn how to build a low-cost WhatsApp bot that analyzes images using AI vision models like Llama and GPT-4V, with Python and MongoDB.
https://hackernoon.com/how-i-built-an-ai-powered-whatsapp-bot-that-analyzes-images-using-python-and-vision-models #visionmodels

#visionmodels

Reddit Tech VN Bot @[email protected] · 2025-12-06 · 13:17 UTC

Vllama is a new CLI tool inspired by Ollama that lets you run vision AI models (image and video generation) and LLMs on your local machine or on a remote GPU (such as Kaggle). Vllama also supports Text-to-Speech, Speech-to-Text, data processing, and model training, and it has a VS Code extension for interacting with local LLMs. The goal is to simplify the use of open-source AI models.

#Vllama #AITool #VisionModels #LLMs #OpenSourceAI #CôngCụAI #MôHìnhThịGiác #MãNguồnMở

https://www.reddit.com/r/ollama/comments/1p

#vllama #aitool #visionmodels #llms #opensourceai #congcụai

Reddit Tech VN Bot @[email protected] · 2025-12-05 · 06:17 UTC

Vllama is a new CLI framework for running vision models (image, video) and LLMs directly from the terminal, both on a local machine and on Kaggle GPUs (30 free hours per week). Inspired by Ollama, Vllama simplifies downloading and interacting with open-source AI models. It also has a VS Code extension for chatting with local LLMs.
#Vllama #AI #OpenSource #CLI #VisionModels #LLM #CôngCụAI #MãNguồnMở

https://www.reddit.com/r/opensource/comments/1penhp2/vllama_cli_based_framework_to_run_vision_models/

#vllama #ai #opensource #cli #visionmodels #llm

Reddit Tech VN Bot @[email protected] · 2025-11-01 · 18:16 UTC

NVIDIA releases Nemotron Nano 12B V2 VL and other models #NVIDIA #Nemotron #AI #TríTuệNhânTạo #VisionModels #MôHìnhThựcTiễn

https://www.reddit.com/r/LocalLLaMA/comments/1oltmre/nvidia_nemotron_nano_12b_v2_vl_vision_and_other/

#nvidia #nemotron #ai #trituệnhantạo #visionmodels #mohinhthựctiễn

Reddit Tech VN Bot @[email protected] · 2025-10-15 · 20:19 UTC

LM Studio downscales images, which hurts OCR performance. Version v0.3.6 added automatic image resizing. #LMStudio #VLmodels #OCR #TríTuệNhânTạo #AI #VisionModels #HìnhẢnh #KỹThuật

https://www.reddit.com/r/LocalLLaMA/comments/1o7l1io/lm_studio_and_vl_models/

#lmstudio #vlmodels #ocr #trituệnhantạo #ai #visionmodels

Reddit Tech VN Bot @[email protected] · 2025-10-07 · 11:17 UTC

A new API tackles the problem of extracting information from documents! 🤖 #Ninjadoc combines the power of LLMs and OCR using vision models, returning answers together with precise bounding-box coordinates. It simplifies processing complex documents and is extremely convenient!
#AI #DocumentProcessing #SaaS #API #VisionModels #TríchXuấtThôngTin #XửLýTàiLiệu

https://www.reddit.com/r/SaaS/comments/1o0bqn0/i_am_building_a_document_platform_api_that_gives/

#ninjadoc #ai #documentprocessing #saas #api #visionmodels

Winbuzzer @[email protected] · 2025-05-16 · 15:40 UTC

Ollama Local LLM Platform Unveils Custom Multimodal AI Engine, Steps Away from Llama.cpp Framework

#Ollama #MultimodalAI #LocalLLM #AI #ArtificialIntelligence #MachineLearning #VisionModels #OpenSourceAI #LLM #AIEngine #TechNews #LocalAI

https://winbuzzer.com/2025/05/16/ollama-local-llm-platform-unveils-custom-multimodal-ai-engine-steps-away-from-llama-cpp-framework-xcxwbn/

#ollama #multimodalai #localllm #ai #artificialintelligence #machinelearning

PKs Powerfromspace1 @[email protected] · 2024-07-30 · 06:47 UTC

@ylecun #ai #visionmodels #ml $Meta Segment Anything Model v2 (SAM 2) is out.
Can segment images and videos.
Open source under Apache-2 license.
Web demo, paper, and datasets available.
Amazing performance.

https://x.com/ylecun/status/1818167736813711686?s=46

#ai #visionmodels #ml
