#visionlanguage — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #visionlanguage, aggregated by home.social.
-
New benchmark reveals that top multimodal models still stumble below 50% accuracy on basic visual entity tasks. The gap highlights limits in current vision‑language training and raises questions about real‑world reliability. Dive into the findings and what they mean for future AI research. #MultimodalLearning #VisionLanguage #EntityRecognition #AIBenchmarking
🔗 https://aidailypost.com/news/top-multimodal-models-fail-exceed-50-accuracy-basic-visual-entity
-
New benchmark reveals that top multimodal models still stumble below 50% accuracy on basic visual entity tasks. The gap highlights limits in current vision‑language training and raises questions about real‑world reliability. Dive into the findings and what they mean for future AI research. #MultimodalLearning #VisionLanguage #EntityRecognition #AIBenchmarking
🔗 https://aidailypost.com/news/top-multimodal-models-fail-exceed-50-accuracy-basic-visual-entity
-
New benchmark reveals that top multimodal models still stumble below 50% accuracy on basic visual entity tasks. The gap highlights limits in current vision‑language training and raises questions about real‑world reliability. Dive into the findings and what they mean for future AI research. #MultimodalLearning #VisionLanguage #EntityRecognition #AIBenchmarking
🔗 https://aidailypost.com/news/top-multimodal-models-fail-exceed-50-accuracy-basic-visual-entity
-
Nvidia's new Cosmos Reason 2 platform lets robots reason across vision‑language inputs, turning on‑board agents into true problem‑solvers for complex tasks—from warehouse sorting to autonomous vehicle navigation. The open‑source‑friendly stack promises faster deployment and richer data use. Curious how this could reshape AI‑driven robotics? Read on. #Nvidia #CosmosReason2 #Robotics #VisionLanguage
🔗 https://aidailypost.com/news/nvidias-cosmos-reason-2-boosts-robot-reasoning-complex-tasks
-
#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️
After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀
Key Features of Qwen2-VL:
1. 🖼️ Image Understanding:
Qwen2-VL shows performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.
2. 🎬 Video Analysis:
Qwen2-VL can analyze videos over 20 minutes in length. This is achieved through online streaming capabilities, allowing for video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis
3. 🤖 Device Integration:
The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱
4. 🌍 Multilingual Capabilities:
Qwen2-VL understands text in images across multiple languages. It supports most European languages, Japanese, Korean, Arabic, Vietnamese, among others, in addition to English and Chinese. #MultilingualAI
This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.
-
And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)
See you in Vienna! #ACL2025 !
(4/4)
#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage#NLProc
-
Florence-2: a vision foundation model that excels in a variety of computer vision and vision-language tasks through a unified, prompt-based approach. Unlike existing models, Florence-2 interprets text prompts to deliver results in tasks like captioning, object detection, grounding, and segmentation.
#AI #ComputerVision #MachineLearning #VisionLanguage
https://arxiv.org/abs/2311.06242 -
And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)
See you in Vienna! #ACL2025 !
(4/4)
#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage#NLProc
-
And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)
See you in Vienna! #ACL2025 !
(4/4)
#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage#NLProc
-
And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)
See you in Vienna! #ACL2025 !
(4/4)
#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage#NLProc
-
And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)
See you in Vienna! #ACL2025 !
(4/4)
#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage#NLProc
-
#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️
After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀
Key Features of Qwen2-VL:
1. 🖼️ Image Understanding:
Qwen2-VL shows performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.
2. 🎬 Video Analysis:
Qwen2-VL can analyze videos over 20 minutes in length. This is achieved through online streaming capabilities, allowing for video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis
3. 🤖 Device Integration:
The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱
4. 🌍 Multilingual Capabilities:
Qwen2-VL understands text in images across multiple languages. It supports most European languages, Japanese, Korean, Arabic, Vietnamese, among others, in addition to English and Chinese. #MultilingualAI
This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.
-
#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️
After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀
Key Features of Qwen2-VL:
1. 🖼️ Image Understanding:
Qwen2-VL shows performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.
2. 🎬 Video Analysis:
Qwen2-VL can analyze videos over 20 minutes in length. This is achieved through online streaming capabilities, allowing for video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis
3. 🤖 Device Integration:
The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱
4. 🌍 Multilingual Capabilities:
Qwen2-VL understands text in images across multiple languages. It supports most European languages, Japanese, Korean, Arabic, Vietnamese, among others, in addition to English and Chinese. #MultilingualAI
This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.
-
#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️
After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀
Key Features of Qwen2-VL:
1. 🖼️ Image Understanding:
Qwen2-VL shows performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.
2. 🎬 Video Analysis:
Qwen2-VL can analyze videos over 20 minutes in length. This is achieved through online streaming capabilities, allowing for video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis
3. 🤖 Device Integration:
The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱
4. 🌍 Multilingual Capabilities:
Qwen2-VL understands text in images across multiple languages. It supports most European languages, Japanese, Korean, Arabic, Vietnamese, among others, in addition to English and Chinese. #MultilingualAI
This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.