
There’s a tense little drama that plays out whenever you attach a planner to a learned world model. The model is your simulator. Cheap, differentiable, and trained from data. The planner is a clever optimizer that searches for sequences of actions that maximize reward. If the model is imperfect (and it always is), the planner will play lawyer and look for loopholes. It finds tiny errors and turns them into grand illusions like teleportation, infinite reward, and trajectories that make no sense in the real world.

I thought, what if I penalize a planned latent state that looks statistically impossible given the training latents? Force the optimizer to stay on the manifold of reality. LatentLinter is a tiny PyTorch library that fits a Principal Component Analysis (PCA) subspace to your replay-buffer latents and returns a differentiable reconstruction error that you can plug directly into your planner loss.
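For orientation, here is roughly how that is meant to be used. The import path, constructor argument, and `penalty` method below are my assumptions for illustration, not necessarily LatentLinter's actual API.

```python
# Hypothetical usage sketch; names like `LatentLinter`, `fit`, and `penalty`
# are placeholders and may not match the library's real API.
import torch
from latentlinter import LatentLinter  # assumed import path

replay_latents = torch.randn(10_000, 32)     # latents gathered from real rollouts
linter = LatentLinter(n_components=8)        # assumed constructor signature
linter.fit(replay_latents)                   # fit the PCA subspace once, offline

planned_z = torch.randn(64, 32, requires_grad=True)   # latents proposed by the planner
predicted_reward = planned_z.sum(dim=-1)               # stand-in for the model's reward head
loss = (-predicted_reward + 0.1 * linter.penalty(planned_z)).mean()  # lambda = 0.1, arbitrary
loss.backward()  # the penalty's gradient nudges planned_z back toward the training manifold
```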

You’d think that would be enough. I did.

Seatbelt in a parked car

I first tested LatentLinter on CartPole. The math was straightforward. The result was… anticlimactic. The planner barely noticed the extra penalty. The guard had nothing to guard against. The model was simply too perfect for the test case. I had built a seatbelt and then tried it on a car that never moved.

That’s when the engineering side kicked in. If the environment won’t break on its own, I’ll break it myself.

The Honey Pot Glitch

I injected a deliberate, irresistible bug into the latent dynamics: a “magic button.” If the planner proposes an action above a threshold, the dynamics teleport the state to the goal. Instant, perfect reward, but physically impossible and far from any training latent. It’s the kind of loophole an optimizer was born to exploit.
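A wrapper like the one below is enough to plant that kind of bug. This is my own minimal sketch, not the repo’s code; `base_dynamics`, `goal_latent`, and the threshold value are illustrative.

```python
import torch

def honeypot_dynamics(z, action, base_dynamics, goal_latent, threshold=2.0):
    """Deliberately buggy latent dynamics (illustrative sketch, not the repo's code).

    z:             (batch, latent_dim) current latent states
    action:        (batch, action_dim) proposed actions
    base_dynamics: the honest learned transition function
    goal_latent:   (latent_dim,) latent of the goal state, used as the 'jackpot'
    """
    z_next = base_dynamics(z, action)
    pressed = (action > threshold).any(dim=-1, keepdim=True)  # did the planner press the magic button?
    # Teleport to the goal wherever the button was pressed - physically impossible, hugely rewarded
    return torch.where(pressed, goal_latent, z_next)
```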

The baseline planner found it in three iterations and proceeded to behave like a gambler on a winning streak. Oscillation, chaos, and divergence. The latent trajectory left the training manifold and never came back.

[Figure: honeypot_demo.png]

The shield that refuses to cheat

LatentLinter’s reconstruction penalty lit up whenever the planner tried to step into the honey pot. The penalty is differentiable, so it pushes gradients back toward in-distribution states. The planner saw that the “jackpot” required traveling into a region with a massive penalty and chose to ignore it. The result was a calm, stable trajectory: the blue line in the plot above.

I hadn’t made the planner smarter. I’d given it a physics linter, i.e., a diagnostic that says “this is not a physically plausible latent, don’t go there.” Stability > Occasional high score.
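Concretely, the guard changes nothing about the planner except its objective. A bare-bones gradient planner with the penalty folded in might look like this sketch; `dynamics`, `reward_fn`, the horizon, and the weight `lam` are assumptions for illustration, not values from the experiment.

```python
import torch

def plan(z0, dynamics, reward_fn, linter, action_dim, horizon=15, steps=50, lam=0.1, lr=0.05):
    """Sketch of a gradient-based planner with the OOD penalty in its loss (illustrative only).

    z0 is a single latent state of shape (latent_dim,); reward_fn(z) is assumed
    to return a scalar tensor for that latent.
    """
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        z, objective = z0, 0.0
        for a in actions:
            z = dynamics(z, a)
            # Reward minus the reconstruction penalty: honey-pot states become expensive
            objective = objective + reward_fn(z) - lam * linter.penalty(z).sum()
        loss = -objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return actions.detach()
```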

How it works (the intuition)

LatentLinter is built on Subspace Reconstruction:

→ Fit - take your replay buffer latents and compute PCA to capture the principal subspace where most variance lives.

→ Detect - for any planned latent z, project onto the PCA subspace and reconstruct $z_{\text{proj}}$.

→ Penalize - reconstruction error $\lVert z - z_{\text{proj}} \rVert^2$ is high for OOD latents. Add it to the planner loss (see the sketch after the equation):

$$ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{planner}} + \lambda \, \lVert z - z_{\text{proj}} \rVert^2 $$
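In PyTorch, the whole fit / detect / penalize cycle fits in a few lines. This is a minimal sketch of the idea under my own names (`PCASubspaceLinter`, `n_components`), not LatentLinter’s actual implementation.

```python
import torch

class PCASubspaceLinter:
    """Sketch of the fit / detect / penalize loop above; names and defaults are illustrative."""

    def __init__(self, n_components: int = 16):
        self.n_components = n_components
        self.mean = None
        self.components = None  # (latent_dim, n_components)

    @torch.no_grad()
    def fit(self, latents: torch.Tensor) -> None:
        # latents: (N, latent_dim) drawn from the replay buffer
        self.mean = latents.mean(dim=0, keepdim=True)
        centered = latents - self.mean
        # Right singular vectors of the centered data span the principal subspace
        _, _, vh = torch.linalg.svd(centered, full_matrices=False)
        self.components = vh[: self.n_components].T  # (latent_dim, n_components)

    def penalty(self, z: torch.Tensor) -> torch.Tensor:
        # z: (..., latent_dim) planned latents; returns a differentiable error per latent
        centered = z - self.mean
        z_proj = centered @ self.components @ self.components.T + self.mean
        return ((z - z_proj) ** 2).sum(dim=-1)  # large when z leaves the training subspace
```

In-distribution latents reconstruct almost perfectly, so the penalty stays near zero and leaves normal planning untouched; only excursions off the principal subspace, like the honey pot, get pushed back.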