#multimodal-ai — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #multimodal-ai, aggregated by home.social.

Winbuzzer @[email protected] · 2026-05-13 · 10:38 UTC

https://winbuzzer.com/2026/05/13/thinking-machines-wants-to-build-an-ai-that-actual-xcxwbn/
Thinking Machines Lab has previewed a research-stage full-duplex AI system built to keep listening while it responds, rather than waiting for turn-based exchanges.
#AI #ThinkingMachinesLab #MiraMurati #VoiceAI #AIModels #ConversationalAI #MultimodalAI #VoiceAssistants

Winbuzzer @[email protected] · 2026-05-11 · 13:25 UTC

https://winbuzzer.com/2026/05/11/gemini-api-file-search-is-now-multimodal-xcxwbn/
Google has expanded Gemini API File Search with multimodal retrieval, metadata filtering, and page citations.
#AI #GeminiAPI #Google #GoogleGemini #GoogleAI #AITools #AISearch #MultimodalAI

Winbuzzer @[email protected] · 2026-05-05 · 13:04 UTC

https://winbuzzer.com/2026/05/05/image-ai-models-now-drive-app-growth-beating-chatb-xcxwbn/
Image AI Launches Beat Chatbot Upgrades on App Growth
#AI #AIImageGeneration #AIModels #GenerativeAI #Chatbots #ChatGPT #OpenAI #GPT4o #Google #GoogleGemini #MetaAI #DeepSeekR1 #MultimodalAI

Andreas Becker @[email protected] · 2026-05-04 · 07:37 UTC

Meta AI has released the multimodal model Tuna-2, which processes image content without a classic vision encoder.
The architecture reads raw pixels directly via patch embeddings and bypasses VAE modules. On OCRBench, Tuna-2 scores better than comparable systems. Training forces the transformer decoder to recognize visual structures on its own by masking image regions.
#MetaAI #Tuna2 #MultimodalAI #LLM #AIGeneratedImage
https://www.all-ai.de/news/news26/meta-tuna2-neu

UKP Lab @[email protected] · 2026-04-29 · 06:58 UTC

At UKP, he will apply his expertise in model training to the domain adaptation of multimodal models, with a focus on aligning models with human preferences and better understanding model uncertainties.
Learn more about Kurt and his work: https://www.kurtmica.com/
Looking forward to having you on the team, Kurt! 👋
#UKPLab #TUDarmstadt #NLP #NLProc #MultimodalAI #LowResourceNLP #LLMs

Winbuzzer @[email protected] · 2026-04-18 · 15:18 UTC

https://winbuzzer.com/2026/04/18/google-gives-gemini-personalized-images-via-nano-banana-xcxwbn/
Google Gives Gemini Personalized Images via Nano Banana
#AI #Google #GoogleGemini #GoogleAI #GenAI #AIImageGeneration #AIImages #TextToImage #MultimodalAI #AIApplications #AIAssistants #BigTech #NanoBanana #GooglePhotos

Winbuzzer @[email protected] · 2026-04-14 · 10:54 UTC

https://winbuzzer.com/2026/04/14/minimax-launches-mmx-cli-ai-agents-get-multimodal-powers-xcxwbn/
MiniMax Launches MMX-CLI With Multimodal Powers For AI Agents
#AI #MiniMax #MMXCLI #AIAgents #OpenSourceAI #MultimodalAI #AgenticAI #DeveloperTools

Winbuzzer @[email protected] · 2026-04-02 · 11:41 UTC

https://winbuzzer.com/2026/04/02/zai-launches-glm-5v-turbo-multimodal-vision-model-xcxwbn/
Z.ai Launches GLM-5V-Turbo Multimodal Vision Model
#AI #ZAI #Zhipu #GLM5VTurbo #ChinaAI #China #LLMs #MultimodalAI #AgenticAI #AIModels #ComputerVision #Glm5 #Openclaw #VisionCodingModel

Winbuzzer @[email protected] · 2026-03-31 · 19:15 UTC

https://winbuzzer.com/2026/03/31/alibaba-qwen35-omni-closed-source-multimodal-ai-xcxwbn/
Alibaba Keeps Qwen3.5-Omni Closed, Breaks Open-Source Streak
#AI #AudioAI #Alibaba #Qwen35Omni #MultimodalAI #OpenSourceAI #Qwen #LLMs #ChinaAI #AlibabaCloud #SpeechSynthesis

PPC Land @[email protected] · 2026-03-31 · 10:57 UTC

FYI: Google Search Live goes global: 200+ countries now get voice and camera AI search: Google Search Live expands to all AI Mode markets on March 26, 2026, powered by Gemini 3.1 Flash Live, bringing multimodal voice and camera search to 200+ countries. https://ppc.land/google-search-live-goes-global-200-countries-now-get-voice-and-camera-ai-search/ #GoogleSearch #AI #VoiceSearch #CameraSearch #MultimodalAI

Winbuzzer @[email protected] · 2026-03-27 · 11:30 UTC

https://winbuzzer.com/2026/03/27/cohere-open-source-transcribe-model-tops-asr-leaderboard-xcxwbn/
Cohere's Open-Source Transcribe Model Tops ASR Leaderboard
#AI #Cohere #CohereTranscribe #SpeechRecognition #AITranscription #OpenSourceAI #HuggingFace #MultimodalAI

UKP Lab @[email protected] · 2026-03-25 · 09:50 UTC

🔍 Prof. Kementchedjhieva also discussed alternative approaches to improve vision-to-language alignment while maintaining strong language capabilities.
💬 We thank Prof. Kementchedjhieva for the insightful talk and the discussion with UKP members on multimodal modeling and the future of vision-language systems.
#UKPLab #MultimodalAI #VisionLanguageModels #NLP #GuestTalk #NLProc #MBZUAI #TUDa

Winbuzzer @[email protected] · 2026-03-24 · 11:59 UTC

https://winbuzzer.com/2026/03/24/luma-ai-uni-1-image-generation-challenges-google-nano-banana-xcxwbn/
Luma AI's Uni-1 Beats Google, OpenAI on Image Benchmarks
#AI #Uni1 #GenerativeAI #AIImageGeneration #LumaAI #TextToImage #MultimodalAI #AIImages #CreativeTools #ImageGeneration

Winbuzzer @[email protected] · 2026-03-12 · 13:01 UTC

https://winbuzzer.com/2026/03/12/gemini-embedding-2-unifies-text-images-video-in-one-model-xcxwbn/
Gemini Embedding 2 Unifies Text, Images, Video in One Model
#AI #Google #BigTech #GoogleGemini #EnterpriseAI #MultimodalAI #AISearch #AIAudio #AIVideo #AIImages #GoogleAI #GoogleDeepMind #GeminiEmbedding2

Harald Klinke @[email protected] · 2026-03-08 · 08:50 UTC

The dataset "Pico-Banana-400K" points to an important trend in AI research: the focus is shifting from image generation to instruction-based image editing.
Models are learning not only to generate images but also to modify them in a targeted way, a step toward systems that act visually.
https://arxiv.org/abs/2510.19808
#AI #ComputerVision #MultimodalAI #Apple

AI Daily Post @[email protected] · 2026-03-04 · 20:57 UTC

Black Forest Labs' new Self‑Flow framework cuts multimodal AI training time by 2.8× versus REPA, thanks to smarter feature alignment and better computational efficiency. Open‑source researchers can now train larger models faster. Dive into the details to see how this could reshape your ML pipelines. #SelfFlow #MultimodalAI #AITraining #ComputationalEfficiency
🔗 https://aidailypost.com/news/black-forest-labs-self-flow-speeds-multimodal-ai-training-28-faster

AI Daily Post @[email protected] · 2026-03-04 · 20:44 UTC

Microsoft's new Phi‑4 Reasoning Vision 15B packs multimodal reasoning into a compact 15‑billion‑parameter model, delivering low‑latency inference for vision‑language tasks. The paper shows how a tiny model can still reason across images and text, opening doors for open‑source AI on edge devices. Curious? Dive into the benchmarks and see the numbers. #Phi4 #LowLatencyAI #MultimodalAI #CompactModel
🔗 https://aidailypost.com/news/microsofts-phi-4-reasoning-vision-15b-offers-lowlatency-compact-ai

AI Daily Post @[email protected] · 2026-02-25 · 10:11 UTC

New AI methods let scientists merge RNA‑seq, imaging and other data, revealing hidden cellular states. This multimodal approach could accelerate discoveries in cell biology and computational biology. Learn how machine learning bridges data integration across experiments. #MultimodalAI #CellBiology #RNAseq #ComputationalBiology
🔗 https://aidailypost.com/news/ai-enables-scientists-integrate-multiple-cell-measurements

AI Daily Post @[email protected] · 2026-02-18 · 16:13 UTC

Gemini now lets you conjure music as easily as images or video. The latest upgrade adds Lyria 3, a multimodal AI that composes tracks on the fly, expanding creative possibilities for open‑source artists. Curious how DeepMind’s tools are reshaping generative expression? Read on. #GoogleGemini #MusicGeneration #MultimodalAI #GenerativeAI
🔗 https://aidailypost.com/news/gemini-app-expands-tools-now-generates-music-alongside-images-video

AI Daily Post @[email protected] · 2026-02-16 · 02:16 UTC

ByteDance rolls out Seedance 2.0, a leap in AI video generation that blends text, audio and motion. The upgrade powers richer multimodal content and has already sparked a rally in its stock. Curious how generative video is reshaping the market? Dive in. #Seedance2 #ByteDanceAI #GenerativeVideo #MultimodalAI
🔗 https://aidailypost.com/news/bytedances-seedance-20-boosts-ai-video-capabilities-fuels-stock-rally

AI Daily Post @[email protected] · 2026-02-12 · 15:41 UTC

ByteDance just unveiled Seedance 2.0, a multimodal AI that turns text, images, audio and video into ready‑to‑share clips. It’s the newest challenger to OpenAI’s Sora and Google’s Veo, pushing AI video generation and content creation forward. Curious how it works? Read on. #ByteDance #Seedance2 #MultimodalAI #VideoAI
🔗 https://aidailypost.com/news/bytedance-ai-model-creates-clips-from-text-images-audio-video

AI Daily Post @[email protected] · 2026-02-11 · 11:27 UTC

xAI’s co‑founder exits keep coming, while Lambda outlines a 2025 shift toward bigger context windows, multimodal reasoning models and open‑source inference for AI production. What could this mean for the future of machine learning? Read on for the full story. #AIProduction #ReasoningModels #MultimodalAI #OpenSourceInference
🔗 https://aidailypost.com/news/xai-co-founder-departures-persist-lambda-outlines-2025-ai-production

AI Daily Post @[email protected] · 2026-02-09 · 19:47 UTC

ByteDance just launched Seedance 2.0, a new AI video engine that can generate clips from text or images and even follow a reference video as a model. The multi‑modal upgrade promises richer, more controllable video creation for creators and researchers alike. Curious how the reference model works? Dive into the details. #Seedance2_0 #ByteDanceAI #TextToVideo #MultiModalAI
🔗 https://aidailypost.com/news/bytedance-unveils-seedance-20-ai-video-reference-capability

HackerNoon @[email protected] · 2026-02-03 · 11:35 UTC

Function calling turned LLMs from chatbots into action systems—reshaping AI runtimes, security, reasoning models, and specialization. https://hackernoon.com/ai-in-2026-function-calling-reasoning-models-and-a-new-runtime-era #multimodalai

HackerNoon @[email protected] · 2026-02-02 · 11:51 UTC

Apache Spark 4.1 introduces declarative pipelines, materialized views, and built-in data quality—reshaping how modern data systems are designed. https://hackernoon.com/youtu-vl-shows-how-treating-vision-as-a-target-unlocks-better-multimodal-ai #multimodalai

Winbuzzer @[email protected] · 2026-01-29 · 19:57 UTC

https://winbuzzer.com/2026/01/29/deepseek-targets-google-multimodal-ai-search-xcxwbn/
DeepSeek Targets Google with Multimodal AI Search
#AI #DeepSeek #Google #AISearch #AIAgents #SearchEngines #MultimodalAI #GoogleSearch
