Physics and World Model


How Far is Video Generation from World Model: A Physical Law Perspective


[Video: PHYRE demo]

Model #Param
DiT-S 22.5M
DiT-B 89.5M
DiT-L 310.0M
DiT-XL 456.0M

Figure 3: Error in ball velocity between the ground-truth state in the simulator and the value parsed from the video generated by the diffusion model, conditioned on the first 3 frames

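These notes do not say exactly how velocities are parsed from the generated frames. Below is a minimal sketch of the metric, assuming the ball's (x, y) position has already been extracted from each frame at a fixed frame interval `dt` and that velocity is estimated by forward differences. `finite_diff_velocity` and `velocity_error` are hypothetical helpers, not the paper's code.

```python
import math

# Hypothetical sketch: velocity error between simulator ground truth and
# ball positions parsed from generated frames (forward differences assumed).

def finite_diff_velocity(positions, dt):
    """Estimate per-step velocity (vx, vy) from consecutive (x, y) positions."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]

def velocity_error(gt_velocities, parsed_positions, dt):
    """Mean Euclidean distance between ground-truth velocities and the
    velocities estimated from the parsed positions."""
    est = finite_diff_velocity(parsed_positions, dt)
    n = min(len(gt_velocities), len(est))
    if n == 0:
        raise ValueError("need at least two parsed positions and one ground-truth velocity")
    errs = [math.hypot(gx - ex, gy - ey)
            for (gx, gy), (ex, ey) in zip(gt_velocities[:n], est[:n])]
    return sum(errs) / n
```

For example, if the ball truly moves at (1, 0) per frame but the parsed positions are (0, 0), (1, 0), (2, 0), (4, 0) with `dt = 1.0`, the estimated velocities are (1, 0), (1, 0), (2, 0), so the mean error is 1/3.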

The authors also trained DiT-XL on the 3M-video uniform motion dataset but observed no improvement in OOD generalization.

In general, simply training a model on video does not yield a world model that understands physics.

Physics-related World Models


Physical Informed Driving World Model

The main framework is shown below:

[Figure: main framework of the Physical Informed Driving World Model]

PINN for vehicle dynamics modelling


Deep Dynamics: Vehicle Dynamics Modeling with a...
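To illustrate the physics-informed idea behind this heading, here is a minimal sketch of a PINN-style loss, assuming a kinematic bicycle model with state (x, y, yaw, v), controls (acceleration, steering angle) and wheelbase L. The loss adds a data term to the squared residual of the model's ODE. A true PINN obtains derivatives of the network output by automatic differentiation; this sketch swaps that for forward differences over a predicted trajectory. It is not the Deep Dynamics formulation, and `bicycle_rhs`, `physics_informed_loss` and the weight `lam` are hypothetical.

```python
import math

# Hypothetical sketch: PINN-style loss with a kinematic bicycle model
# (not the Deep Dynamics method; derivatives via forward differences).

def bicycle_rhs(state, control, wheelbase):
    """Kinematic bicycle model, rear-axle reference point.
    state = (x, y, yaw, v); control = (accel, steer)."""
    x, y, yaw, v = state
    accel, steer = control
    return (v * math.cos(yaw),
            v * math.sin(yaw),
            v / wheelbase * math.tan(steer),
            accel)

def physics_informed_loss(pred, measured, controls, dt, wheelbase, lam=1.0):
    """Data loss + lam * physics residual loss.
    pred, measured: states at times 0, dt, 2*dt, ...
    controls: control held over each interval (len(pred) - 1 entries)."""
    # Data term: squared error summed over state components, averaged over time steps.
    data = sum((p - m) ** 2
               for ps, ms in zip(pred, measured)
               for p, m in zip(ps, ms)) / len(pred)
    # Physics term: forward-difference derivative of the predicted trajectory
    # compared with the ODE right-hand side, averaged over intervals.
    phys = 0.0
    for k in range(len(pred) - 1):
        f = bicycle_rhs(pred[k], controls[k], wheelbase)
        for a, b, fi in zip(pred[k], pred[k + 1], f):
            phys += ((b - a) / dt - fi) ** 2
    phys /= max(len(pred) - 1, 1)
    return data + lam * phys
```

A trajectory that exactly follows the model (for instance, straight-line motion at constant speed) has a near-zero physics residual, so any remaining loss comes from the mismatch with measurements; `lam` trades off fitting the data against obeying the dynamics.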