Vision-Language Models for Spatial and Compositional Reasoning
Self-Supervised Learning in Vision
Cosmos by Carl Sagan!
CLIP