World Models with known uncertainty
Combine Deep Equilibrium Models with Incremental RNNs
Use a pre-trained language model to somehow train a smaller compositional model
Train a discriminator to say if 2 images are the same and use it as the energy signal for a decoder on out-of-distribution images
Equilibrium Capsule Networks
Visio-Linguistic Model should be trained with conterfactual (?) / complementary pairs