In Learning to Predict Without Looking Ahead: World Models Without Forward Prediction (blogpost here) they train an agent that randomly uses either the input signal (pixels) or its internal latent state to select the next action. They DO NOT explicitly provide a training signal saying that its internal state has to describe the world, but since the agent sometimes can rely only on this state, it naturally discovers that, to perform well, this state has to be a good representation of the world.
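The random switching can be sketched as a small helper that, at each step, decides whether the policy sees the real observation or only the latent state. This is a minimal sketch under assumed names (`select_input`, `p_obs`), not the paper's actual code:

```python
import random

def select_input(observation, internal_state, p_obs=0.5):
    """With probability p_obs the agent sees the real observation;
    otherwise it must act from its internal latent state alone.
    (Illustrative sketch; names and interface are assumptions.)"""
    if random.random() < p_obs:
        return observation
    return internal_state
```

Because the coin flip is outside the agent's control, the only way to keep performing well on the "blinded" steps is to maintain a latent state that tracks the world.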
What if, instead of randomly switching the signal, we LET THE MODEL DECIDE when it can rely on its internal state and when it has to look at the real input signal? This could have the added benefit of the model developing something like an uncertainty about the world.
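One way this decision could be implemented is as a learned scalar gate: a sigmoid computed from the internal state that blends the latent state with the fresh observation, so the gate value itself can be read as the model's confidence. This is only a sketch of the idea, with hypothetical parameters `w` and `b` standing in for learned weights:

```python
import numpy as np

def gated_input(observation, internal_state, w, b):
    """Blend observation and latent state with a learned confidence gate.
    g near 1: the model trusts its internal state; g near 0: it falls
    back on the real observation. (Hypothetical sketch, not from the paper.)"""
    g = 1.0 / (1.0 + np.exp(-(internal_state @ w + b)))  # sigmoid gate
    mixed = g * internal_state + (1.0 - g) * observation
    return mixed, g
```

Reading off `g` at each step would then give a per-step uncertainty signal for free, without any auxiliary prediction head.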
Why do we need this uncertainty?
Uncertainty has recently been used as an input signal for self-supervised methods. For example, in Planning to Explore via Self-Supervised World Models they pre-train the agent with an exploration phase: the uncertainty of its world model drives it toward unknown states, training on those states improves the world model, and the agent can then learn supervised tasks faster.
To measure the uncertainty they use an ensemble of world models and quantify it as the disagreement among them. It may be beneficial to instead use a single model that carries something resembling its own uncertainty about the world.
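The ensemble-disagreement idea boils down to comparing the next-state predictions of the ensemble members and using their spread as the uncertainty score. A minimal sketch (assumed function name and shapes, not the paper's implementation):

```python
import numpy as np

def ensemble_disagreement(predictions):
    """predictions: array-like of shape (n_models, state_dim), one
    predicted next state per ensemble member. Returns the mean
    per-dimension variance across members: 0 when all models agree,
    larger when they disagree. (Illustrative sketch.)"""
    preds = np.asarray(predictions, dtype=float)
    return preds.var(axis=0).mean()
```

A single gated model would replace this whole ensemble with one scalar read out of the agent itself, at a fraction of the compute.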
Code for Learning to Predict Without Looking Ahead: World Models Without Forward Prediction