#universalembodiment — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #universalembodiment, aggregated by home.social.
-
Multi-agentic foundation models are important for #robotics and #automation in negotiated and adversarial settings such as #traffic and #warfare.
But how to implement them? I have previously drafted a data-centric architecture for decomposing agentic representations for #UniversalEmbodiment in a GitHub repository.
But LLMs have already internalized multi-agentic representations, so why can't we utilize them directly? In text, for example, you can easily ask an LLM to describe all the persons or agents present in a scene and their intents.
We can, and we certainly must! But those representations aren't grounded.
What we need to do is craft robotic foundation model training data that includes scenarios with multiple agents present.
First, start acausally from what ultimately happened: how was the scenario negotiated among the participants, who drove first, which attack and evasive patterns were used?
Knowing the outcome, we can then go back in time and ask the foundation model to identify all the participants in the sensor feed and complete their intentions with information from that ultimate outcome.
The foundation model can then draw on all the language-space knowledge it has about multi-agent environments, while anchoring it to the visual and control signals present in the training data.
This allows the model not only to answer what each participant intends to do, but also to anchor those intents to multi-modal sensory information and to project embodiment-related control intents onto every participant in the scenario, not only ego.
Ego becomes just a special case of robotic control: the model should learn to generalize and project control intents onto all agents present in the data.
Ultimately this lets the foundation model learn from the perceived and projected experiences of others: to imitate, or deliberately not imitate, what it has seen other agents do.
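The data-crafting loop above can be sketched in a few lines. This is a minimal, hypothetical sketch: `AgentTrack`, `Episode` and `annotate_intent` are illustrative names I'm introducing here (the source doesn't specify an interface), and `annotate_intent` stands in for the foundation-model call that infers an agent's intent from the known outcome.

```python
from dataclasses import dataclass

@dataclass
class AgentTrack:
    agent_id: str
    observations: list   # per-timestep sensory snippets for this agent
    actions: list        # observed controls (ego) or estimated motion (others)

@dataclass
class Episode:
    tracks: list         # one AgentTrack per participant, ego included
    outcome: str         # what ultimately happened, known only in hindsight

def hindsight_examples(episode, annotate_intent):
    """Relabel every agent in a finished episode with an intent inferred
    from the known outcome, so the model learns to project control intents
    onto non-ego agents too. `annotate_intent` is a stand-in for an
    LLM/VLM call that, given the outcome, describes what an agent was
    trying to do."""
    examples = []
    for track in episode.tracks:
        intent = annotate_intent(track.agent_id, episode.outcome)
        for obs, act in zip(track.observations, track.actions):
            examples.append({
                "agent_id": track.agent_id,   # ego is just one agent among many
                "observation": obs,
                "intent": intent,             # hindsight label from the outcome
                "target_action": act,         # control intent to predict
            })
    return examples
```

The point of the sketch is that every track gets the same treatment: ego and non-ego participants alike become supervised examples pairing sensory data, hindsight-labeled intent, and a control target.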
It's all about crafting data, not really about sophisticated model architectures.
#RoboticFoundationModels #FoundationModels #PhysicalAI #AI #AGI
-
It's not really about structured versus unstructured environments for #robots anymore. It's static versus agentic.
Robots in the real world will encounter other agents. Autonomous cars will need to negotiate with all kinds of other road users, including cats, which are everywhere in Spain at least. There was a video from East Asia where an old lady was drying her vegetables on the road: an autonomous car insisted on driving over them while the lady tried her best to defend them.
So, for any autonomous robot "in the real world", the true challenge is no longer that there are no standard grasping surfaces or that items aren't in predefined places. Those are solved problems.
The challenge is in agentic environments, where the system needs to understand the other living, or at least moving, entities and their objectives in order to navigate inherently social situations appropriately.
This isn't only about cats trying to trip humanoid robots on stairs. It also covers non-living things like fire. Humans model fire psychologically as an entity with an intent; hence we are evolutionarily adapted to keeping a fire burning, or limiting its destruction by putting it out.
Human psychology is very Aristotelian in the way it models heavy things as "wanting" to go down. Robotic psychology will need a similar understanding to negotiate with, guide and harness dynamic entities in the world effectively.
For these purposes we will need to replace static world models with agentic world models that properly accommodate non-ego agents and non-ego intents in the world. What's cool about that is that it also lets a model learn from third-party experience, which is always more abundant than ego experience. Monkey see, monkey do, or in some cases learn to absolutely not do.
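To make the static-vs-agentic distinction concrete, here is a minimal sketch of what an agentic world model's state and rollout could look like. All names here (`AgentState`, `WorldState`, `rollout`, `step_agent`) are hypothetical illustrations, not an interface from the source; `step_agent` stands in for a learned per-agent dynamics model.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    agent_id: str
    pose: tuple      # position summary
    intent: str      # hypothesized objective of this agent

@dataclass
class WorldState:
    static_map: object   # the classical "static" part of the world model
    agents: list         # every agent, ego included, carries an intent

def rollout(world, ego_action, step_agent, steps=3):
    """Advance an agentic world model: non-ego agents move according to
    their inferred intents instead of being treated as frozen scenery.
    `step_agent` is a hypothetical per-agent dynamics model with the
    signature (state, action_or_intent) -> state."""
    trajectory = [world]
    for _ in range(steps):
        agents = [step_agent(a, ego_action if a.agent_id == "ego" else a.intent)
                  for a in world.agents]
        world = WorldState(world.static_map, agents)
        trajectory.append(world)
    return trajectory
```

The design choice is that ego is just `agents[i]` with a known action, while every other agent advances under a predicted intent, which is exactly what lets the same machinery learn from third-party trajectories.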
Let's work together on this and surpass the human level in agentic, living environments as well!
-
Why do we need universal embodiment with in-context learning of the embodiment? Because the embodiment isn't fixed. Of course there are the common degradations and even partial mechanical failures, but also imagine:
A humanoid robot sits in a car's driver's seat and drives the car; motor planning and reasoning shouldn't happen at the level of turning the steering wheel this many degrees and so on, but at the level of the changed embodiment, which is now the car.
The same goes for using tools, adapting the embodiment ad hoc for the purpose, and letting the same model design, build, repair and customize embodiments. It is all synergistic, and while the current paradigm of embodied AI doesn't aim for this next step yet, we will need to take it at some point.
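The idea of "embodiment in context" can be sketched as an interface: the embodiment description travels with the observation instead of being fixed in the weights. This is purely illustrative; `model` and `embodiment_spec` are hypothetical names, and a real system would of course use learned representations rather than plain dicts.

```python
def act(model, embodiment_spec, observation, goal):
    """In-context embodiment adaptation: the current embodiment is part of
    the model's context, not baked into its weights, so the same model can
    plan for a humanoid body, a car being driven, or a tool-augmented arm.
    `model` is a hypothetical foundation model behind a callable interface."""
    context = {
        "embodiment": embodiment_spec,  # e.g. actuators, limits, or "now: the car"
        "observation": observation,
        "goal": goal,
    }
    # The model is expected to emit control intents at the abstraction
    # level of the current embodiment (steering/throttle for a car,
    # joint targets for a humanoid), not the wrapped low-level body.
    return model(context)
```

Swapping `embodiment_spec` is then all it takes to move the humanoid's planning from its own joints to the car it is driving.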
Conveniently, when we combine this with multi-agent and intent characterization, embodiment adaptation becomes much easier, and we also get truly social robots that are able to negotiate and communicate in real-world multi-agentic spaces.
-
ARC-AGI doesn't measure intelligence. Intelligence is competence in ridiculously transferable skills and knowledge, where the transfer is bi-directional between different tasks.
If ARC-AGI measured a skill that is ridiculously transferable, applicable across many diverse topics, LLMs would have picked up this skill by becoming competent across other kinds of generalist tasks. They didn't.
o3 achieved high scores on these tasks, probably mostly because it was trained on 75% of the public ARC-AGI benchmark set, allowing it to learn the special skills these tasks require.
Since ARC-AGI skills are clearly super special, in the sense of not being relevant for anything else, and merely human-imitative, they do not relate to intelligence at all. It is easy to come up with special tasks that invoke special skills which don't apply to any other task.
As a contrived example, take an arbitrary hash function with an arbitrary seed and use it to produce a sequence of numbers. The task is to guess the next number from the previous one. The skill to do this applies only to this hash function and this seed; it doesn't generalize or transfer to any other actually useful task.
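The contrived benchmark is easy to write down explicitly. Here is one concrete instantiation of the idea, with SHA-256 and the seed chosen arbitrarily for illustration:

```python
import hashlib

def hash_next(x, seed=0xC0FFEE):
    """A 'next number' rule driven by an arbitrary hash and an arbitrary
    seed. Predicting it is a skill with zero transfer: it applies only
    to this exact function and seed."""
    data = f"{seed}:{x}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def sequence(start, n):
    """Generate the 'benchmark': given element i, guess element i+1."""
    xs, x = [], start
    for _ in range(n):
        x = hash_next(x)
        xs.append(x)
    return xs

# No competence at any other task helps with this benchmark;
# only knowledge of this particular function and seed does.
```

Each element fully determines the next, so the task is perfectly well-posed, yet mastering it teaches nothing usable anywhere else.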
ARC-AGI is like that, except the hash function is human. The skill has very limited transfer, and that is exactly the feature that makes it "difficult" for AIs. If it were a skill that actually indicated intelligence, it would paradoxically have been learnable by becoming competent in other, unrelated tasks. And if it were a truly important skill among all the skills related to intelligence, it would have been among the first skills LLMs learned, since such important, core intelligence skills are present in almost all tasks.