#multimodal — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #multimodal, aggregated by home.social.
-
ByteDance Open-Sources Lance, a 3B Multimodal Model for Images and Video
https://firethering.com/bytedance-open-source-lance-3b-multimodal-model/
#bytedance #tiktok #lance #opensource #ai #news #trending #multimodal #llm
-
@sinabhfuil Definitely, I think so - although a mere Jackeen, I think https://www.bikeshare.ie/ should be in each city in the country, as well as all major towns, and all rail line interchanges.
In this case yesterday, I was in Athlone, which is home to a TUS campus (fka Athlone IT).
There, I had to use a private bike share scheme, and a company I'm not fond of but which appears to have the monopoly on this location.
-
AI 2026: Key impact areas 🌍🚀
Impact today is defined on three fronts:
Multimodality: AI now understands video, audio, and technical drawings in real time, facilitating diagnostics and audits.
Autonomous Agents: Systems such as OpenClaw that carry out complex tasks (appointments, payments, code) without supervision.
Logical Reasoning: Models that verify their own steps, vital in science and finance.
AI no longer just suggests, now it acts! 🛠️✨
-
The assumption around multimodal AI has mostly been the same: if you want serious capability, you need serious hardware.
MiniCPM-V 4.6 is trying to challenge that idea. It’s a 1.3B parameter multimodal model built to run on phones across iOS, Android, and HarmonyOS, while still handling image understanding, video analysis, OCR, and multi-image reasoning workloads that normally push users toward much larger systems.
https://firethering.com/minicpm-v-4-6-on-device-multimodal-model/
#ai #technews #news #multimodal #opensource #trending
-
Thinking Machines Lab announced a research preview of "interaction models", trained from scratch for real-time multimodal collaboration: 200ms micro-turns with audio, video, text, and tools running concurrently. Their bet: today's chat UX fits "answering inference", not collaboration, so capable AI defaults to autonomous use and looks like labor substitution. Could we change the debate by changing the UI/UX?
https://benjaminhan.net/posts/20260512-interaction-models/?utm_source=mastodon&utm_medium=social
-
🌟✨ OMG, hold the press! The Gemini API File Search is now "multimodal"—whatever that means in techie-speak 🤯. Clearly, the #innovation world is SHOOK by the #groundbreaking ability to search files in more than one way. 🚀 Maybe soon they'll invent a search that can actually find something useful! 😂
https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/
#GeminiAPI #FileSearch #multimodal #technews #HackerNews #ngated
-
Gemini API File Search is now multimodal
#HackerNews #GeminiAPI #FileSearch #multimodal #AI #innovation #technology
-
🤖 Another day, another incomprehensible jargon-filled soup about how machines are getting better at understanding pictures and words. Who knew that blending buzzwords with indecipherable acronyms could make #AI sound like it just discovered fire? 🔥 Let's all pretend we're not terrified by the impending takeover of our #multimodal #overlords. 🚀
https://arxiv.org/abs/2604.26752
#Revolution #Jargon #Overload #TechTrends #HackerNews #ngated
-
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
https://arxiv.org/abs/2604.26752
#HackerNews #GLM5VTurbo #Multimodal #Agents #Foundation #Model #AI #Research
-
10 Current RAG Approaches: Which Are Actually Useful and When to Use Them?
Hi everyone. Given the updates to the LLM stack over the past year, I decided to put together a practical list of RAG approaches that are actually used in production, based on my own experience and on what I have studied in other cases.
https://habr.com/ru/articles/1029616/
#aiразработка #rag_ai #rag_pipeline #retrieval_augmented_generation #llm #llmмодели #vector_search #hybrid_search #graphrag #multimodal
-
https://www.europesays.com/es/512266/ Up to 40% of dementia cases could be prevented by acting on risk factors #abordaje #actuando #casos #clínica #cognitivo #consenso #demencia #deterioro #documento #ES #España #factores #hábitos #Health #incorporacion #intervencion #multimodal #nutrición #podrían #prevenirse #propone #retrasar #riesgo #Salud #SEN #Spain #vida
-
Today I was at a family celebration at a restaurant in Gutmadingen. On a Sunday. For lunch and coffee. Not reachable by public transport.
Cycling was unfortunately not an option either, because it is about 150 km, and a smart family celebration in the middle of that doesn't work. The weather forecast was rainy.
The cost calculator from @nes works out a price of €54. Smallest #eAuto, because I was travelling alone. Not bad.
However, I have a #Deutschlandticket
🤔💭
In the end I took the :diebahn: from #Freiburg to #Donaueschingen and picked up an #eCarsharing car there for the last few km.
Result: €15.53 in car-sharing costs + time to read on the 🚈
Savings: €38.47 💶🤑💰
#multimodal #intermodlitat #Carsharing
-
#Qwen36 35B-A3B, a new #opensource MoE model with 35 billion parameters, showcases exceptional #agentic #coding performance and strong #multimodal perception and #reasoning abilities. It outperforms its predecessor and rivals larger models, making it a versatile choice for various tasks. https://qwen.ai/blog?id=qwen3.6-35b-a3b #tech #media #news
-
How I Made Claude Multimodal by Connecting It to Qwen Omni
Claude is blind. Unfortunately, none of Anthropic's models works directly with video. Yes, you can slice a video into individual frames and feed them to it, but that is not the same thing. The context of motion is lost, and without it you are just breaking a pile of frames into components and trying to piece the context back together. For me as a visual artist this is a real pain, because I often want to send video references and ask for a breakdown of the camera movement, the character's movement, or even the design. And here is a concrete task: 29 generated video references of character animation sit in the project folder, and they need to be sorted into categories with each movement described. Doing that by hand is, of course, something I can't be bothered with. It is an hour to an hour and a half on a tedious task. Then I remembered Qwen Omni, which I already use to build a real-time digital character assistant. And I thought, why not make them friends?
-
How I Pick Moments for Shorts: Why LLM + Transcript Almost Always Produces Garbage
This is the third article about my "anime factory", a system that automatically turns long episodes into Shorts. If you want the full context, here are the previous parts:
https://habr.com/ru/articles/1021552/
#llm #shorts #python #cv #computer_vision #signal_processing #multimodal #transcript #youtube_shorts #ai