What?

Augmenting the model in offline MBRL for zero-shot generalisation.

Why?

Everyone can do offline MBRL now, but can you do ZERO-SHOT transfer from offline MBRL?

How?


In short: augment the world model's transitions during training, and condition the policy on the augmentation as well.

The authors consider three different augmentations but eventually settle on a single one (DAS):

$$ \mathcal{T}_z: (s, a, r, s') \rightarrow (s, a, r, s + z \odot (s' - s)). $$
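As a quick sanity check of what this transform does, here is a minimal NumPy sketch of DAS applied to one transition; the sampling interval for $z$ (around 1) is my assumption, not a value from the paper.

import numpy as np

def das_augment(s, a, r, s_next, rng=np.random):
	s, s_next = np.asarray(s, dtype=float), np.asarray(s_next, dtype=float)
	z = rng.uniform(0.5, 1.5, size=s.shape)  # per-dimension scale around 1; the interval is my assumption
	return s, a, r, s + z * (s_next - s), z  # only the state *change* is rescaled; s, a, r stay as they are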

The authors provide a nice algorithm listing, which I convert here to my typical pseudosciencecode below.

def train(D, penalty, horizon, bsize, augmentation, epochs):
	models_ensemble = init_models()
	train_models(models_ensemble, D)  # fit the dynamics ensemble on the offline dataset
	policy = init_policy()
	buffer = []
	for epoch in range(epochs):
		start_states = D.sample(bsize)  # branch model rollouts from offline states
		rollout(policy, models_ensemble, buffer, start_states, augmentation, penalty, horizon)  # augmented, penalised rollouts
		train_policy(policy, D.union(buffer), augmentation)
	return policy
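The rollout step is where the augmentation meets the usual offline-MBRL machinery. Here is how I imagine it: a MOPO-style branched rollout in the learned ensemble, with the reward penalised by model disagreement and the DAS transform applied to each predicted transition. `predict_ensemble`, the exact penalty form, the sampling range of z, and whether z is resampled per step or per rollout are all my guesses, not the paper's code; `augmentation(s, s_pred, z)` here stands for the DAS update $s + z \odot (s_{pred} - s)$.

import numpy as np

def rollout(policy, models_ensemble, buffer, start_states, augmentation, penalty, horizon):
	s = start_states
	z = np.random.uniform(0.5, 1.5, size=s.shape)  # context for this rollout; range and schedule are assumptions
	for _ in range(horizon):
		a = policy(np.concatenate([s, z], axis=-1))  # the policy is conditioned on the context
		means, rewards = predict_ensemble(models_ensemble, s, a)  # one prediction per ensemble member
		r = rewards.mean(axis=0) - penalty * means.std(axis=0).max(axis=-1)  # penalise model disagreement
		s_next = augmentation(s, means.mean(axis=0), z)  # DAS applied to the predicted transition
		buffer.append((np.concatenate([s, z], axis=-1), a, r, np.concatenate([s_next, z], axis=-1)))
		s = s_next
	return buffer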

The policy now takes as input the state concatenated with the augmentation vector.

Now, at test time we do not have a ground-truth augmentation vector, so the authors estimate it with a regression on the current rollout data and feed the predicted context to the policy. In particular, they learn a forward model that predicts the state change $\hat{\delta}_t$, observe the actual change $\delta_t = s_{t+1} - s_t$ in the test environment, and take the elementwise ratio $\hat{z}_t = \delta_t / \hat{\delta}_t$ as the augmentation vector.
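A rough sketch of that test-time estimate; `dyn_model` stands for the learned forward model, and the guard against near-zero predictions is mine, not the paper's.

import numpy as np

def infer_context(dyn_model, s, a, s_next, eps=1e-6):
	delta = np.asarray(s_next, dtype=float) - np.asarray(s, dtype=float)  # observed change in the test env
	delta_hat = np.asarray(dyn_model(s, a), dtype=float)  # change predicted by the forward model fit on offline data
	z_hat = np.ones_like(delta)  # default to 1 (no scaling) where the prediction is ~0 -- my guard, not the paper's
	np.divide(delta, delta_hat, out=z_hat, where=np.abs(delta_hat) > eps)
	return z_hat

# the policy then acts exactly as during training, on the concatenated input:
# a_next = policy(np.concatenate([s_next, z_hat]))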

And?