#omniparser — Public Fediverse posts on home.social

Alessio Pomaro @[email protected] · 2025-02-23 · 12:13 UTC

🧠 How do #AI Agents like #Operator perform actions in browsers and on any graphical interface?
👁️ This is an example of #OmniParser V2 running locally. The system processes what it "sees" on the screen and converts it into structured data that maps and classifies every element.
⚙️ This data becomes context for an #LLM, which can then perform operations on those elements.

#AI #GenAI #GenerativeAI #IntelligenzaArtificiale

#ai #operator #omniparser #llm #genai #generativeai
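
The post above describes screen content being converted into structured data that then serves as context for an LLM. A minimal Python sketch of that idea follows. Everything in it is an assumption made for illustration: `ScreenElement`, its fields, `elements_to_context`, and `build_prompt` are hypothetical names, not OmniParser's actual output schema or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScreenElement:
    """Hypothetical record for one parsed UI element (not OmniParser's real schema)."""
    element_id: int
    kind: str                                 # e.g. "icon" or "text"
    label: str                                # short description of the element
    bbox: Tuple[float, float, float, float]   # x_min, y_min, x_max, y_max, normalized to 0..1
    interactive: bool                         # whether the element can be clicked or typed into


def elements_to_context(elements: List[ScreenElement]) -> str:
    """Serialize parsed elements into a plain-text block an LLM can read."""
    lines = ["Screen elements:"]
    for el in elements:
        x0, y0, x1, y1 = el.bbox
        state = "interactive" if el.interactive else "static"
        lines.append(
            f'[{el.element_id}] {el.kind} ({state}) "{el.label}" '
            f"bbox=({x0:.2f}, {y0:.2f}, {x1:.2f}, {y1:.2f})"
        )
    return "\n".join(lines)


def build_prompt(task: str, elements: List[ScreenElement]) -> str:
    """Combine the serialized screen with a task so the model can pick an element id."""
    return (
        f"{elements_to_context(elements)}\n\n"
        f"Task: {task}\n"
        "Answer with the id of the element to act on."
    )
```

Referring to elements by id rather than by raw pixel coordinates keeps the model's answer short and easy to validate before any action is executed.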

Alessio Pomaro @[email protected] · 2025-02-17 · 07:11 UTC

🧠 #Microsoft has released #OmniParser V2: an open-source system able to perform actions in the user interface.
✨ Not just in the browser: it is a system that uses an #LLM in a Computer Use Agent.

🔗 The project: https://github.com/microsoft/OmniParser

___

✉️ If you want to stay up to date on these topics, subscribe to my newsletter: https://bit.ly/newsletter-alessiopomaro

#AI #GenAI #GenerativeAI #IntelligenzaArtificiale #LLM #AIAgent

#microsoft #omniparser #llm #ai #genai #generativeai

Carlos Mendible :verified: @[email protected] · 2025-02-15 · 19:43 UTC

OmniParser V2: Turning Any LLM into a Computer Use Agent https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/ #OmniParser #Microsoft #GenerativeAI #AI

#omniparser #microsoft #generativeai #ai

Erik Jonker @[email protected] · 2024-11-08 · 11:58 UTC

OmniParser: an interesting open-source project, essential for AI models that need to interpret screens and user interfaces.
https://microsoft.github.io/OmniParser/
#omniparser #ai #opensource #microsoft

#omniparser #ai #opensource #microsoft

michabbb @[email protected] · 2024-10-26 · 06:47 UTC

🔍 #Microsoft introduces #OmniParser, a new screen parsing module for #GUI interactions:
• Converts UI screenshots into structured elements for improved #AI agent navigation
• Works with #GPT4V to generate precise actions for interface regions
• Achieves top performance on #WindowsAgentArena benchmark

🛠️ Key Components:
• Specialized datasets for icon detection and description
• Fine-tuned detection model for identifying actionable regions
• Captioning model for extracting functional semantics

📊 Performance Highlights:
• Outperforms standard #GPT4V on #ScreenSpot benchmarks
• Compatible with #Phi35V and #Llama32V models
• Functions across PC and mobile platforms without HTML dependencies

🔗 Learn more: https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/

#microsoft #omniparser #gui #ai #gpt4v #windowsagentarena
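
The components listed in the post above (a detection model for actionable regions, then a captioning model for functional semantics) suggest a two-stage flow. The sketch below shows that flow in Python under loud assumptions: `parse_screen`, `click_point`, `fake_detect`, and `fake_caption` are hypothetical names, and the two stand-in functions return hard-coded values in place of real models. It is not OmniParser's implementation or API.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # x_min, y_min, x_max, y_max in pixels


def parse_screen(
    screenshot: object,
    detect: Callable[[object], List[Box]],
    caption: Callable[[object, Box], str],
) -> List[Dict]:
    """Stage 1 finds actionable regions; stage 2 describes what each region does."""
    elements = []
    for index, box in enumerate(detect(screenshot)):
        elements.append(
            {"id": index, "bbox": box, "description": caption(screenshot, box)}
        )
    return elements


def click_point(element: Dict) -> Tuple[int, int]:
    """Center of the bounding box, where an agent could click."""
    x0, y0, x1, y1 = element["bbox"]
    return ((x0 + x1) // 2, (y0 + y1) // 2)


# Stand-ins for the detection and captioning models (assumptions for illustration only).
def fake_detect(screenshot: object) -> List[Box]:
    return [(10, 20, 110, 40), (200, 300, 240, 340)]


def fake_caption(screenshot: object, box: Box) -> str:
    return "search field" if box[0] < 100 else "settings icon"


elements = parse_screen("screenshot.png", fake_detect, fake_caption)
target = next(el for el in elements if el["description"] == "settings icon")
print(target["id"], click_point(target))
```

Passing the two stages in as callables keeps the sketch independent of any specific model, so a real detector or captioner could be swapped in without changing the surrounding code.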