Physics and World Model

How Far is Video Generation from World Model: A Physical Law Perspective

Model	#Param
DiT-S	22.5M
DiT-B	89.5M
DiT-L	310.0M
DiT-XL	456.0M

Figure 3: The error in the velocity of balls between the ground truth state in the simulator and the values parsed from the generated video by the diffusion model, given the first 3 frames
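To make the quantity in Figure 3 concrete, here is a minimal sketch of how a velocity error could be computed once ball positions have been parsed from generated frames. The function names, the finite-difference velocity estimate, the mean-absolute-error metric, and the example numbers are all assumptions for illustration; these notes do not record the paper's exact parsing procedure or metric.

```python
def velocities(positions, dt):
    """Estimate per-step velocity by finite differences of 1D positions."""
    return [(b - a) / dt for a, b in zip(positions, positions[1:])]


def velocity_error(parsed_positions, true_velocities, dt):
    """Mean absolute gap between parsed and ground-truth velocities.

    parsed_positions: ball positions read off the generated frames (hypothetical input)
    true_velocities:  simulator velocities for the same frame intervals
    """
    parsed = velocities(parsed_positions, dt)
    if len(parsed) != len(true_velocities):
        raise ValueError("expected one ground-truth velocity per frame interval")
    return sum(abs(p - t) for p, t in zip(parsed, true_velocities)) / len(parsed)


# Hypothetical example: uniform motion at 2.0 units/s, sampled every 0.1 s.
dt = 0.1
true_positions = [2.0 * dt * i for i in range(6)]
drift = [0.0, 0.0, 0.0, 0.01, 0.03, 0.06]  # made-up drift in the generated video
generated_positions = [p + d for p, d in zip(true_positions, drift)]
true_v = [2.0] * 5

err = velocity_error(generated_positions, true_v, dt)
print(f"velocity error: {err:.3f}")
```

A uniform-motion rollout that stays on the ground-truth trajectory gives an error near zero, while a rollout that gradually speeds up or slows down gives a larger value.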

The authors also trained DiT-XL on the 3M uniform-motion dataset but observed no improvement in OOD generalization.

In general, simply training a model on video does not yield a world model that understands physics.

Physics-Related World Models

Physical Informed Driving World Model

The main framework is shown below:

x2 (1).png

PINN for vehicle dynamics modelling

Deep Dynamics: Vehicle Dynamics Modeling with a...