#universalembodiment — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #universalembodiment, aggregated by home.social.
-
Multi-agentic foundation models are important for #robotics and #automation in negotiated and adversarial settings such as #traffic and #warfare.
But how to implement them? I have previously drafted a data-centric architecture for decomposing agentic representations for #UniversalEmbodiment in a GitHub repository.
But LLMs have already internalized multi-agentic representations, so why can't we utilize them directly? In text, for example, you can easily ask an LLM to describe all the persons or agents present in a scene and their intents.
We can, and we certainly must! But those representations aren't grounded.
What we need to do is craft robotic foundation model training data that includes scenarios with multiple agents present.
First, start acausally from what ultimately happened: how was the scenario negotiated among the participants, who drove first, which attack and evasive patterns were used?
Knowing the outcome, we can then go back in time and ask the foundation model to identify all the participants in the sensor feed and complete their intentions with information from that ultimate outcome.
The foundation model can then draw on all the language-space knowledge it has about multi-agent environments, while anchoring it to the visual and control signals present in the training data.
This allows the model not only to answer what each participant intends to do, but also to anchor those intents to multi-modal sensory information and to project embodiment-related control intents onto every participant in the scenario, not only ego.
Ego becomes just a special case of robotic control: the model should learn to generalize and project control intents onto all agents present in the data.
Ultimately this lets the foundation model learn from the perceived and projected experiences of others: to imitate, or deliberately not imitate, what it has seen other agents do.
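The data-crafting loop above can be sketched in a few lines. This is a minimal, hypothetical sketch: `AgentTrack`, `Episode` and `annotate_intent` are illustrative names I'm introducing here (the source doesn't specify an interface), and `annotate_intent` stands in for the foundation-model call that infers an agent's intent from the known outcome.

```python
from dataclasses import dataclass

@dataclass
class AgentTrack:
    agent_id: str
    observations: list   # per-timestep sensory snippets for this agent
    actions: list        # observed controls (ego) or estimated motion (others)

@dataclass
class Episode:
    tracks: list         # one AgentTrack per participant, ego included
    outcome: str         # what ultimately happened, known only in hindsight

def hindsight_examples(episode, annotate_intent):
    """Relabel every agent in a finished episode with an intent inferred
    from the known outcome, so the model learns to project control intents
    onto non-ego agents too. `annotate_intent` is a stand-in for an
    LLM/VLM call that, given the outcome, describes what an agent was
    trying to do."""
    examples = []
    for track in episode.tracks:
        intent = annotate_intent(track.agent_id, episode.outcome)
        for obs, act in zip(track.observations, track.actions):
            examples.append({
                "agent_id": track.agent_id,   # ego is just one agent among many
                "observation": obs,
                "intent": intent,             # hindsight label from the outcome
                "target_action": act,         # control intent to predict
            })
    return examples
```

The point of the sketch is that every track gets the same treatment: ego and non-ego participants alike become supervised examples pairing sensory data, hindsight-labeled intent, and a control target.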
It's all about crafting data, not really about sophisticated model architectures.
#RoboticFoundationModels #FoundationModels #PhysicalAI #AI #AGI
-
It's not really about structured versus unstructured environments for #robots anymore. It's static versus agentic.
Robots in the real world will encounter other agents. Autonomous cars will need to negotiate with all kinds of other road users, including cats, which are everywhere in Spain at least. There was a video from East Asia where an old lady was drying her vegetables on the road: an autonomous car insisted on driving over them while the lady tried her best to defend them.
So, for any autonomous robot "in the real world", the true challenge is no longer that there are no standard grasping surfaces or that items aren't in predefined places. Those are solved problems.
The challenge is in agentic environments, where the system needs to understand the other living, or at least moving, entities and their objectives in order to navigate inherently social situations appropriately.
This isn't only about cats trying to trip humanoid robots on stairs. It also covers non-living things like fire. Humans model fire psychologically as an entity with an intent; hence we are evolutionarily adapted to keeping a fire burning, or limiting its destruction by putting it out.
Human psychology is very Aristotelian in the way it models heavy things as "wanting" to go down. Robotic psychology will need a similar understanding to negotiate with, guide and harness dynamic entities in the world effectively.
For these purposes we will need to replace static world models with agentic world models that properly accommodate non-ego agents and non-ego intents in the world. What's cool about that is that it also lets a model learn from third-party experience, which is always more abundant than ego experience. Monkey see, monkey do, or in some cases learn to absolutely not do.
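To make the static-vs-agentic distinction concrete, here is a minimal sketch of what an agentic world model's state and rollout could look like. All names here (`AgentState`, `WorldState`, `rollout`, `step_agent`) are hypothetical illustrations, not an interface from the source; `step_agent` stands in for a learned per-agent dynamics model.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    agent_id: str
    pose: tuple      # position summary
    intent: str      # hypothesized objective of this agent

@dataclass
class WorldState:
    static_map: object   # the classical "static" part of the world model
    agents: list         # every agent, ego included, carries an intent

def rollout(world, ego_action, step_agent, steps=3):
    """Advance an agentic world model: non-ego agents move according to
    their inferred intents instead of being treated as frozen scenery.
    `step_agent` is a hypothetical per-agent dynamics model with the
    signature (state, action_or_intent) -> state."""
    trajectory = [world]
    for _ in range(steps):
        agents = [step_agent(a, ego_action if a.agent_id == "ego" else a.intent)
                  for a in world.agents]
        world = WorldState(world.static_map, agents)
        trajectory.append(world)
    return trajectory
```

The design choice is that ego is just `agents[i]` with a known action, while every other agent advances under a predicted intent, which is exactly what lets the same machinery learn from third-party trajectories.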
Let's work together on this and surpass the human level in agentic, living environments as well!
-
Why do we need universal embodiment with in-context learning of the embodiment? Because the embodiment isn't fixed. Of course there are the common degradations and even partial mechanical failures, but also imagine:
A humanoid robot sits in a car's driver's seat and drives the car; motor planning and reasoning shouldn't happen at the level of turning the steering wheel this many degrees and so on, but at the level of the changed embodiment, which is now the car.
The same goes for using tools, adapting the embodiment ad hoc for the purpose, and letting the same model design, build, repair and customize embodiments. It is all synergistic, and while the current paradigm of embodied AI doesn't aim for this next step yet, we will need to take it at some point.
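The idea of "embodiment in context" can be sketched as an interface: the embodiment description travels with the observation instead of being fixed in the weights. This is purely illustrative; `model` and `embodiment_spec` are hypothetical names, and a real system would of course use learned representations rather than plain dicts.

```python
def act(model, embodiment_spec, observation, goal):
    """In-context embodiment adaptation: the current embodiment is part of
    the model's context, not baked into its weights, so the same model can
    plan for a humanoid body, a car being driven, or a tool-augmented arm.
    `model` is a hypothetical foundation model behind a callable interface."""
    context = {
        "embodiment": embodiment_spec,  # e.g. actuators, limits, or "now: the car"
        "observation": observation,
        "goal": goal,
    }
    # The model is expected to emit control intents at the abstraction
    # level of the current embodiment (steering/throttle for a car,
    # joint targets for a humanoid), not the wrapped low-level body.
    return model(context)
```

Swapping `embodiment_spec` is then all it takes to move the humanoid's planning from its own joints to the car it is driving.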
Conveniently, when we combine this with multi-agent and intent characterization, embodiment adaptation becomes much easier, and we also get truly social robots that are able to negotiate and communicate in real-world multi-agentic spaces.
-
ARC-AGI doesn't measure intelligence. Intelligence is competence in ridiculously transferable skills and knowledge, where the transfer is bi-directional between different tasks.
If ARC-AGI measured a skill that is ridiculously transferable, applicable across many diverse topics, LLMs would have picked up this skill by becoming competent across other kinds of generalist tasks. They didn't.
o3 achieved high scores on these tasks, probably mostly because it was trained on 75% of the public ARC-AGI benchmark set, allowing it to learn the special skills these tasks require.
Since ARC-AGI skills are clearly super special, in the sense of not being relevant for anything else, and merely human-imitative, they do not relate to intelligence at all. It is easy to come up with special tasks that invoke special skills which don't apply to any other task.
As a contrived example, take an arbitrary hash function with an arbitrary seed and use it to produce a sequence of numbers. The task is to guess the next number from the previous one. The skill to do this applies only to this hash function and this seed; it doesn't generalize or transfer to any other actually useful task.
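The contrived benchmark is easy to write down explicitly. Here is one concrete instantiation of the idea, with SHA-256 and the seed chosen arbitrarily for illustration:

```python
import hashlib

def hash_next(x, seed=0xC0FFEE):
    """A 'next number' rule driven by an arbitrary hash and an arbitrary
    seed. Predicting it is a skill with zero transfer: it applies only
    to this exact function and seed."""
    data = f"{seed}:{x}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def sequence(start, n):
    """Generate the 'benchmark': given element i, guess element i+1."""
    xs, x = [], start
    for _ in range(n):
        x = hash_next(x)
        xs.append(x)
    return xs

# No competence at any other task helps with this benchmark;
# only knowledge of this particular function and seed does.
```

Each element fully determines the next, so the task is perfectly well-posed, yet mastering it teaches nothing usable anywhere else.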
ARC-AGI is like that, except the hash function is human. The skill has very limited transfer, and that is exactly the feature that makes it "difficult" for AIs. If it were a skill that actually indicated intelligence, it would paradoxically have been learnable by becoming competent in other, unrelated tasks. And if it were a truly important skill among all the skills related to intelligence, it would have been among the first skills LLMs learned, since such important, core intelligence skills are present in almost all tasks.