#visionlanguage — Public Fediverse posts on home.social

AI Daily Post @[email protected] · 2026-02-08 · 13:43 UTC

New benchmark reveals that top multimodal models still stumble below 50% accuracy on basic visual entity tasks. The gap highlights limits in current vision‑language training and raises questions about real‑world reliability. Dive into the findings and what they mean for future AI research. #MultimodalLearning #VisionLanguage #EntityRecognition #AIBenchmarking

🔗 https://aidailypost.com/news/top-multimodal-models-fail-exceed-50-accuracy-basic-visual-entity

#multimodallearning #visionlanguage #entityrecognition #aibenchmarking

AI Daily Post @[email protected] · 2026-01-05 · 23:13 UTC

Nvidia's new Cosmos Reason 2 platform lets robots reason across vision‑language inputs, turning on‑board agents into true problem‑solvers for complex tasks—from warehouse sorting to autonomous vehicle navigation. The open‑source‑friendly stack promises faster deployment and richer data use. Curious how this could reshape AI‑driven robotics? Read on. #Nvidia #CosmosReason2 #Robotics #VisionLanguage

🔗 https://aidailypost.com/news/nvidias-cosmos-reason-2-boosts-robot-reasoning-complex-tasks

#nvidia #cosmosreason2 #robotics #visionlanguage

UKP Lab @[email protected] · 2025-07-18 · 15:01 UTC

And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)

See you in Vienna! #ACL2025

(4/4)

#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage #NLProc

#acl2025 #mllm #aisafety #jailbreak #multimodal #coninstruction

michabbb @[email protected] · 2024-09-01 · 14:44 UTC

#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️

After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀

Key Features of Qwen2-VL:

1. 🖼️ Image Understanding:

According to the release, Qwen2-VL achieves state-of-the-art performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.

2. 🎬 Video Analysis:

Qwen2-VL can analyze videos over 20 minutes in length. This is achieved through online streaming capabilities, allowing for video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis

3. 🤖 Device Integration:

The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱

4. 🌍 Multilingual Capabilities:

Qwen2-VL understands text in images across multiple languages. In addition to English and Chinese, it supports most European languages, Japanese, Korean, Arabic, and Vietnamese, among others. #MultilingualAI

This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.

https://github.com/QwenLM/Qwen2-VL

#technews #qwen #visionlanguage #llm #ai #visualunderstanding

Harald Klinke @[email protected] · 2024-06-20 · 05:41 UTC

Florence-2: a vision foundation model that excels in a variety of computer vision and vision-language tasks through a unified, prompt-based approach. Unlike existing models, Florence-2 interprets text prompts to deliver results in tasks like captioning, object detection, grounding, and segmentation.
#AI #ComputerVision #MachineLearning #VisionLanguage
https://arxiv.org/abs/2311.06242

#ai #computervision #machinelearning #visionlanguage