#mathvista — Public Fediverse posts on home.social

michabbb @[email protected] · 2024-11-07 · 22:40 UTC

🔍 Major breakthrough in multimodal AI research:

#InfinityMM dataset launches with 43.4M entries across 4 categories: 10M image descriptions, 24.4M visual instructions, 6M high-quality instructions & 3M #AI generated data
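
As a quick sanity check on the figures above, here is a minimal sketch that sums the four quoted category sizes and prints each category's share of the total. The dictionary keys and the `composition` helper are illustrative names, not something taken from the dataset release.

```python
# Category sizes in millions of entries, as quoted in the post.
# The key names are illustrative labels, not official split names.
CATEGORIES_M = {
    "image descriptions": 10.0,
    "visual instructions": 24.4,
    "high-quality instructions": 6.0,
    "AI-generated data": 3.0,
}


def composition(categories):
    """Return (total, {name: share in percent}) for a name -> size mapping."""
    total = sum(categories.values())
    shares = {
        name: round(100 * size / total, 1)
        for name, size in categories.items()
    }
    return total, shares


total_m, shares = composition(CATEGORIES_M)
print(f"total: {total_m:.1f}M entries")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The four categories add up to the quoted 43.4M, with visual instructions making up a little over half of the entries.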

🧠 Technical highlights:

New #AquilaVL2B model uses #LLaVA architecture with #Qwen25 language model & #SigLIP for image processing
Despite having only 2B parameters, it achieves state-of-the-art results among models of similar scale on multiple benchmarks
Exceptional performance: #MMStar (54.9%), #MathVista (59%), #MMBench (75.2%)

🚀 Training innovation:

4-stage training process with increasing complexity
Combines image recognition, instruction classification & response generation
Uses #opensource models like RAM++ for data generation

💡 Industry impact:

Model trained on both #Nvidia A100 GPUs & Chinese chips
Complete dataset & model available to research community
Shows promising results compared to commercial systems like #GPT4V

https://arxiv.org/abs/2410.18558

#infinitymm #ai #aquilavl2b #llava #qwen25 #siglip

michabbb @[email protected] · 2024-09-01 · 14:44 UTC

#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️

After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀

Key Features of Qwen2-VL:

1. 🖼️ Image Understanding:

Qwen2-VL reports state-of-the-art performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.

2. 🎬 Video Analysis:

Qwen2-VL can analyze videos over 20 minutes in length. This is achieved through online streaming capabilities, allowing for video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis

3. 🤖 Device Integration:

The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱

4. 🌍 Multilingual Capabilities:

Qwen2-VL understands text in images across multiple languages. In addition to English and Chinese, it supports most European languages as well as Japanese, Korean, Arabic, Vietnamese, and others. #MultilingualAI

This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.

https://github.com/QwenLM/Qwen2-VL

#technews #qwen #visionlanguage #llm #ai #visualunderstanding
