#multimodal-ai — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #multimodal-ai, aggregated by home.social.
-
https://winbuzzer.com/2026/05/13/thinking-machines-wants-to-build-an-ai-that-actual-xcxwbn/
Thinking Machines Lab has previewed a research-stage full-duplex AI system built to keep listening while it responds, rather than waiting for turn-based exchanges.
#AI #ThinkingMachinesLab #MiraMurati #VoiceAI #AIModels #ConversationalAI #MultimodalAI #VoiceAssistants
-
https://winbuzzer.com/2026/05/11/gemini-api-file-search-is-now-multimodal-xcxwbn/
Google has expanded Gemini API File Search with multimodal retrieval, metadata filtering, and page citations.
#AI #GeminiAPI #Google #GoogleGemini #GoogleAI #AITools #AISearch #MultimodalAI
-
https://winbuzzer.com/2026/05/05/image-ai-models-now-drive-app-growth-beating-chatb-xcxwbn/
Image AI Launches Beat Chatbot Upgrades on App Growth
#AI #AIImageGeneration #AIModels #GenerativeAI #Chatbots #ChatGPT #OpenAI #GPT4o #Google #GoogleGemini #MetaAI #DeepSeekR1 #MultimodalAI
-
Meta AI has released Tuna-2, a multimodal model that processes image content without a conventional vision encoder.
The architecture reads raw pixels directly through patch embeddings and bypasses VAE modules. On OCRBench, Tuna-2 scores better than comparable systems. Training masks image regions, which forces the transformer decoder to recognize visual structures on its own.
-
At UKP, he will apply his expertise in 𝗺𝗼𝗱𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 to the 𝗱𝗼𝗺𝗮𝗶𝗻 𝗮𝗱𝗮𝗽𝘁𝗮𝘁𝗶𝗼𝗻 𝗼𝗳 𝗺𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗺𝗼𝗱𝗲𝗹𝘀, with a focus on aligning models with 𝗵𝘂𝗺𝗮𝗻 𝗽𝗿𝗲𝗳𝗲𝗿𝗲𝗻𝗰𝗲𝘀 and better understanding 𝗺𝗼𝗱𝗲𝗹 𝘂𝗻𝗰𝗲𝗿𝘁𝗮𝗶𝗻𝘁𝗶𝗲𝘀.
Learn more about Kurt and his work: https://www.kurtmica.com/
Looking forward to having you on the team, Kurt! 👋
#UKPLab #TUDarmstadt #NLP #NLProc #MultimodalAI #LowResourceNLP #LLMs
-
https://winbuzzer.com/2026/04/18/google-gives-gemini-personalized-images-via-nano-banana-xcxwbn/
Google Gives Gemini Personalized Images via Nano Banana
#AI #Google #GoogleGemini #GoogleAI #GenAI #AIImageGeneration #AIImages #TextToImage #MultimodalAI #AIApplications #AIAssistants #BigTech #NanoBanana #GooglePhotos
-
https://winbuzzer.com/2026/04/14/minimax-launches-mmx-cli-ai-agents-get-multimodal-powers-xcxwbn/
MiniMax Launches MMX-CLI With Multimodal Powers For AI Agents
#AI #MiniMax #MMXCLI #AIAgents #OpenSourceAI #MultimodalAI #AgenticAI #DeveloperTools
-
https://winbuzzer.com/2026/04/02/zai-launches-glm-5v-turbo-multimodal-vision-model-xcxwbn/
Z.ai Launches GLM-5V-Turbo Multimodal Vision Model
#AI #ZAI #Zhipu #GLM5VTurbo #ChinaAI #China #LLMs #MultimodalAI #AgenticAI #AIModels #ComputerVision #Glm5 #Openclaw #VisionCodingModel
-
https://winbuzzer.com/2026/03/31/alibaba-qwen35-omni-closed-source-multimodal-ai-xcxwbn/
Alibaba Keeps Qwen3.5-Omni Closed, Breaks Open-Source Streak
#AI #AudioAI #Alibaba #Qwen35Omni #MultimodalAI #OpenSourceAI #Qwen #LLMs #ChinaAI #AlibabaCloud #SpeechSynthesis
-
FYI: Google Search Live goes global: 200+ countries now get voice and camera AI search. Google Search Live expands to all AI Mode markets on March 26, 2026, powered by Gemini 3.1 Flash Live, bringing multimodal voice and camera search to 200+ countries. https://ppc.land/google-search-live-goes-global-200-countries-now-get-voice-and-camera-ai-search/ #GoogleSearch #AI #VoiceSearch #CameraSearch #MultimodalAI
-
https://winbuzzer.com/2026/03/27/cohere-open-source-transcribe-model-tops-asr-leaderboard-xcxwbn/
Cohere's Open-Source Transcribe Model Tops ASR Leaderboard
#AI #Cohere #CohereTranscribe #SpeechRecognition #AITranscription #OpenSourceAI #HuggingFace #MultimodalAI
-
🔍 Prof. Kementchedjhieva also discussed alternative approaches to improve vision-to-language alignment while maintaining strong language capabilities.
💬 We thank Prof. Kementchedjhieva for the insightful talk and the discussion with UKP members on multimodal modeling and the future of vision-language systems.
#UKPLab #MultimodalAI #VisionLanguageModels #NLP #GuestTalk #NLProc #MBZUAI #TUDa
-
Luma AI's Uni-1 Beats Google, OpenAI on Image Benchmarks
#AI #Uni1 #GenerativeAI #AIImageGeneration #LumaAI #TextToImage #MultimodalAI #AIImages #CreativeTools #ImageGeneration
-
https://winbuzzer.com/2026/03/12/gemini-embedding-2-unifies-text-images-video-in-one-model-xcxwbn/
Gemini Embedding 2 Unifies Text, Images, Video in One Model
#AI #Google #BigTech #GoogleGemini #EnterpriseAI #MultimodalAI #AISearch #AIAudio #AIVideo #AIImages #GoogleAI #GoogleDeepMind #GeminiEmbedding2
-
The "Pico-Banana-400K" dataset shows an important trend in AI research: the focus is shifting from image generation to instruction-based image editing.
Models are learning not only to generate images but also to modify them in targeted ways, a step toward systems that act visually.
https://arxiv.org/abs/2510.19808
#AI #ComputerVision #MultimodalAI #Apple
-
Black Forest Labs' new Self‑Flow framework cuts multimodal AI training time by 2.8× versus REPA, thanks to smarter feature alignment and better computational efficiency. Open‑source researchers can now train larger models faster. Dive into the details to see how this could reshape your ML pipelines. #SelfFlow #MultimodalAI #AITraining #ComputationalEfficiency
🔗 https://aidailypost.com/news/black-forest-labs-self-flow-speeds-multimodal-ai-training-28-faster
-
Microsoft's new Phi‑4 Reasoning Vision 15B packs multimodal reasoning into a compact 15‑billion‑parameter model, delivering low‑latency inference for vision‑language tasks. The paper shows how a tiny model can still reason across images and text, opening doors for open‑source AI on edge devices. Curious? Dive into the benchmarks and see the numbers. #Phi4 #LowLatencyAI #MultimodalAI #CompactModel
🔗 https://aidailypost.com/news/microsofts-phi-4-reasoning-vision-15b-offers-lowlatency-compact-ai
-
New AI methods let scientists merge RNA‑seq, imaging and other data, revealing hidden cellular states. This multimodal approach could accelerate discoveries in cell biology and computational biology. Learn how machine learning bridges data integration across experiments. #MultimodalAI #CellBiology #RNAseq #ComputationalBiology
🔗 https://aidailypost.com/news/ai-enables-scientists-integrate-multiple-cell-measurements
-
Gemini now lets you conjure music as easily as images or video. The latest upgrade adds Lyria 3, a multimodal AI that composes tracks on the fly, expanding creative possibilities for open‑source artists. Curious how DeepMind’s tools are reshaping generative expression? Read on. #GoogleGemini #MusicGeneration #MultimodalAI #GenerativeAI
🔗 https://aidailypost.com/news/gemini-app-expands-tools-now-generates-music-alongside-images-video
-
ByteDance rolls out Seedance 2.0, a leap in AI video generation that blends text, audio and motion. The upgrade powers richer multimodal content and has already sparked a rally in its stock. Curious how generative video is reshaping the market? Dive in. #Seedance2 #ByteDanceAI #GenerativeVideo #MultimodalAI
🔗 https://aidailypost.com/news/bytedances-seedance-20-boosts-ai-video-capabilities-fuels-stock-rally
-
ByteDance just unveiled Seedance 2.0, a multimodal AI that turns text, images, audio and video into ready‑to‑share clips. It’s the newest challenger to OpenAI’s Sora and Google’s Veo, pushing AI video generation and content creation forward. Curious how it works? Read on. #ByteDance #Seedance2 #MultimodalAI #VideoAI
🔗 https://aidailypost.com/news/bytedance-ai-model-creates-clips-from-text-images-audio-video
-
xAI’s co‑founder exits keep coming, while Lambda outlines a 2025 shift toward bigger context windows, multimodal reasoning models and open‑source inference for AI production. What could this mean for the future of machine learning? Read on for the full story. #AIProduction #ReasoningModels #MultimodalAI #OpenSourceInference
🔗 https://aidailypost.com/news/xai-co-founder-departures-persist-lambda-outlines-2025-ai-production
-
ByteDance just launched Seedance 2.0, a new AI video engine that can generate clips from text or images and even follow a reference video as a model. The multi‑modal upgrade promises richer, more controllable video creation for creators and researchers alike. Curious how the reference model works? Dive into the details. #Seedance2_0 #ByteDanceAI #TextToVideo #MultiModalAI
🔗 https://aidailypost.com/news/bytedance-unveils-seedance-20-ai-video-reference-capability
-
Function calling turned LLMs from chatbots into action systems—reshaping AI runtimes, security, reasoning models, and specialization. https://hackernoon.com/ai-in-2026-function-calling-reasoning-models-and-a-new-runtime-era #multimodalai
-
Apache Spark 4.1 introduces declarative pipelines, materialized views, and built-in data quality—reshaping how modern data systems are designed. https://hackernoon.com/youtu-vl-shows-how-treating-vision-as-a-target-unlocks-better-multimodal-ai #multimodalai
-
https://winbuzzer.com/2026/01/29/deepseek-targets-google-multimodal-ai-search-xcxwbn/
DeepSeek Targets Google with Multimodal AI Search
#AI #DeepSeek #Google #AISearch #AIAgents #SearchEngines #MultimodalAI #GoogleSearch