#multimodalai — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #multimodalai, aggregated by home.social.
-
https://winbuzzer.com/2026/05/13/thinking-machines-wants-to-build-an-ai-that-actual-xcxwbn/
Thinking Machines Lab has previewed a research-stage full-duplex AI system built to keep listening while it responds, rather than waiting for turn-based exchanges.
#AI #ThinkingMachinesLab #MiraMurati #VoiceAI #AIModels #ConversationalAI #MultimodalAI #VoiceAssistants
-
NVIDIA Nemotron 3 Nano Omni: Open Multimodal AI Agent Guide 2026
NVIDIA released Nemotron 3 Nano Omni on April 28, 2026 — the first open model to natively unify vision, audio, and language in a shared reasoning loop, delivering 9x highe...
https://wowhow.cloud/blogs/nvidia-nemotron-3-nano-omni-multimodal-agent-developer-guide-2026
-
https://winbuzzer.com/2026/05/11/gemini-api-file-search-is-now-multimodal-xcxwbn/
Google has expanded Gemini API File Search with multimodal retrieval, metadata filtering, and page citations.
#AI #GeminiAPI #Google #GoogleGemini #GoogleAI #AITools #AISearch #MultimodalAI
-
https://winbuzzer.com/2026/05/05/image-ai-models-now-drive-app-growth-beating-chatb-xcxwbn/
Image AI Launches Beat Chatbot Upgrades on App Growth
#AI #AIImageGeneration #AIModels #GenerativeAI #Chatbots #ChatGPT #OpenAI #GPT4o #Google #GoogleGemini #MetaAI #DeepSeekR1 #MultimodalAI
-
Meta AI has released the multimodal model Tuna-2, which processes image content without a conventional vision encoder.
The architecture reads raw pixels directly through patch embeddings and bypasses VAE modules. On OCRBench, Tuna-2 scores higher than comparable systems. During training, image regions are masked out, which forces the transformer decoder to learn to recognize visual structures on its own.
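The post says Tuna-2 reads raw pixels "through patch embeddings" instead of a separate vision encoder. As a rough illustration of that general idea (the Vision Transformer-style patch embedding, not Tuna-2's actual code, which the post does not show), the sketch below splits an image into non-overlapping patches, flattens each one, and applies a single linear projection. The function names, patch size, embedding width, and random weights are all hypothetical.

```python
import random

def image_to_patches(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])  # append every channel value of this pixel
            patches.append(vec)
    return patches

def embed_patches(patches, weight, bias):
    """Linear projection: each flattened patch (length D_in) becomes one token (length D_model)."""
    return [
        [sum(p[i] * weight[i][j] for i in range(len(p))) + bias[j] for j in range(len(bias))]
        for p in patches
    ]

if __name__ == "__main__":
    random.seed(0)
    H, W, C, P, D_MODEL = 8, 8, 3, 4, 16  # hypothetical sizes
    image = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
    d_in = P * P * C
    weight = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(d_in)]
    bias = [0.0] * D_MODEL
    tokens = embed_patches(image_to_patches(image, P), weight, bias)
    print(len(tokens), len(tokens[0]))  # prints "4 16": four patches, each a 16-dim token
```

In a real model the projection weights are learned and the resulting tokens go straight into the transformer alongside text tokens, which is what lets an architecture like this skip a separate pretrained vision encoder.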
-
Multimodal AI without provenance is a deepfake factory. The 2026 fix is per-frame signing, voice gating, and a consent envelope around every output.
https://mickai.co.uk/articles/multimodal-ai-needs-provenance-or-its-a-deepfake-factory
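The post names per-frame signing as part of the fix. A minimal sketch of that idea, assuming a shared secret key and HMAC-SHA256 standing in for a real digital signature (the article may use a different scheme, such as public-key signatures): each frame gets a tag computed over its index and bytes, so an edited or reordered frame fails verification. The function names and key are hypothetical.

```python
import hashlib
import hmac

def sign_frames(frames, key):
    """Return one HMAC-SHA256 tag per frame, binding each frame to its position."""
    tags = []
    for index, frame in enumerate(frames):
        # Including the index means swapping two frames invalidates both tags
        message = index.to_bytes(8, "big") + frame
        tags.append(hmac.new(key, message, hashlib.sha256).digest())
    return tags

def verify_frames(frames, tags, key):
    """Return the indices of frames whose tag does not match (empty list means all verified)."""
    if len(frames) != len(tags):
        raise ValueError("frame and tag counts differ")
    bad = []
    for index, (frame, tag) in enumerate(zip(frames, tags)):
        expected = hmac.new(key, index.to_bytes(8, "big") + frame, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):  # constant-time comparison
            bad.append(index)
    return bad

if __name__ == "__main__":
    key = b"hypothetical-shared-secret"
    frames = [b"frame-0-pixels", b"frame-1-pixels", b"frame-2-pixels"]
    tags = sign_frames(frames, key)
    print(verify_frames(frames, tags, key))    # []
    tampered = [frames[0], b"edited-pixels", frames[2]]
    print(verify_frames(tampered, tags, key))  # [1]
```

An HMAC only proves the tag came from someone holding the shared key; provenance schemes meant for public verification would typically use asymmetric signatures instead, with the same per-frame structure.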
-
NVIDIA Unveils "Nemotron 3 Nano Omni," Merging Vision, Audio, and Language for AI Agents
NVIDIA's Nemotron 3 Nano Omni is a new AI model that combines vision, audio, and language. It helps AI agents work faster and understand more.
#NvidiaAI, #Nemotron3, #MultimodalAI, #OpenSourceAI, #AIAgents
https://newsletter.tf/nvidia-nemotron-3-nano-omni-ai-model-vision-audio/
-
NVIDIA's new Nemotron 3 Nano Omni model can now understand images, sounds, and text all at once. This is a big step for AI agents.
#NvidiaAI, #Nemotron3, #MultimodalAI, #OpenSourceAI, #AIAgents
https://newsletter.tf/nvidia-nemotron-3-nano-omni-ai-model-vision-audio/
-
Xiaomi MiMo-V2.5: A New Era for Open-Weight Multimodal AI
https://aiorbit.app/xiaomi-mimo-v2-5-a-new-era-for-open-weight-multimodal-ai/
#OpenWeightAI #MiMoV2.5 #XiaomiAI #MultimodalAI
-
At UKP, Kurt will apply his expertise in model training to the domain adaptation of multimodal models, with a focus on aligning models with human preferences and better understanding model uncertainties.
Learn more about Kurt and his work: https://www.kurtmica.com/
Looking forward to having you on the team, Kurt! 👋
#UKPLab #TUDarmstadt #NLP #NLProc #MultimodalAI #LowResourceNLP #LLMs
-
https://winbuzzer.com/2026/04/18/google-gives-gemini-personalized-images-via-nano-banana-xcxwbn/
Google Gives Gemini Personalized Images via Nano Banana
#AI #Google #GoogleGemini #GoogleAI #GenAI #AIImageGeneration #AIImages #TextToImage #MultimodalAI #AIApplications #AIAssistants #BigTech #NanoBanana #GooglePhotos
-
Each time it guesses wrong, it goes back and tweaks how much attention each decision-maker pays to each detail. Do that millions of times and suddenly you've got a system that can identify faces, translate languages, or generate an image from a sentence.
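The description above (weights change only when a guess is wrong) matches the classic perceptron learning rule. A minimal sketch under that assumption; the systems the post mentions are instead trained with gradient descent and backpropagation on every example, so this illustrates the idea rather than how those models are built. The toy AND dataset and the learning rate are hypothetical.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features exceeds 0, else 0."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0

def train_perceptron(data, epochs=20, lr=0.1):
    """Perceptron rule: adjust the weights only when a guess is wrong."""
    n = len(data[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)  # 0 if correct, +1 or -1 if wrong
            if error != 0:
                mistakes += 1
                # Nudge each weight in the direction that would have reduced this error
                weights = [w + lr * error * x for w, x in zip(weights, features)]
                bias += lr * error
        if mistakes == 0:  # every example classified correctly, so stop early
            break
    return weights, bias

if __name__ == "__main__":
    # Toy dataset (hypothetical): logical AND of two inputs
    and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    w, b = train_perceptron(and_data)
    print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

A single perceptron can only learn linearly separable patterns such as AND; stacking many of these units into layers is what lets networks handle tasks like the ones the post lists.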
-
https://winbuzzer.com/2026/04/14/minimax-launches-mmx-cli-ai-agents-get-multimodal-powers-xcxwbn/
MiniMax Launches MMX-CLI With Multimodal Powers For AI Agents
#AI #MiniMax #MMXCLI #AIAgents #OpenSourceAI #MultimodalAI #AgenticAI #DeveloperTools
-
https://www.europesays.com/uk/875118/ AI Body Gap: Why Robots Need “Internal Feelings” to be Safe #AI #AISafety #HumanAIAlignment #InternalEmbodiment #interoception #MultimodalAi #neurobiology #Neuroscience #Neurotech #Robotics #Science #UCLA #UK #UnitedKingdom
-
https://www.europesays.com/ie/421141/ AI Body Gap: Why Robots Need “Internal Feelings” to be Safe #AI #AISafety #ArtificialIntelligence #Éire #HumanAIAlignment #IE #InternalEmbodiment #interoception #Ireland #MultimodalAI #neurobiology #Neuroscience #neurotech #Robotics #Technology #UCLA
-
Google's Gemma Models: Open Framework or Elaborate Facade?
Google's Gemma 3 models, released in March 2025, can take both images and text as input. Find out how developers can use these new features.
#GoogleGemma, #AIModels, #Gemma3, #MultimodalAI, #DeveloperTools
https://newsletter.tf/google-gemma-3-models-support-images-text/
-
Google's new Gemma 3 models, released in March 2025, can understand both text and images, a big step up from older versions.
#GoogleGemma, #AIModels, #Gemma3, #MultimodalAI, #DeveloperTools
https://newsletter.tf/google-gemma-3-models-support-images-text/
-
https://winbuzzer.com/2026/04/02/zai-launches-glm-5v-turbo-multimodal-vision-model-xcxwbn/
Z.ai Launches GLM-5V-Turbo Multimodal Vision Model
#AI #ZAI #Zhipu #GLM5VTurbo #ChinaAI #China #LLMs #MultimodalAI #AgenticAI #AIModels #ComputerVision #Glm5 #Openclaw #VisionCodingModel