I found some information about norm stats in the Pi-0 (openpi) codebase that I think is interesting: you need the right norm_stats for the model to move correctly. norm_stats is computed from the dataset the model was trained on. In our case, OpenVLA ships with norm_stats for many datasets (bridge_orig, etc.) but not for libero_object, libero_spatial, etc. This might be one of the reasons we cannot evaluate the base model on the LIBERO task suites.
One more thing: norm_stats can be calculated without fine-tuning or training the model, since it only requires a pass over the dataset. So if I can compute norm_stats for a new environment, we should be able to evaluate the base model there without any fine-tuning (it should at least improve the movements).
https://github.com/Physical-Intelligence/openpi/blob/main/docs/norm_stats.md
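To make that concrete, here is a minimal sketch of what computing norm_stats amounts to: a single pass over the dataset's action vectors, with no gradient updates. The helper load_libero_object_actions and the exact set of statistics (mean/std plus 1st/99th percentiles, which is what openpi-style normalization uses) are my assumptions, not verbatim openpi or OpenVLA code.

```python
# Sketch: normalization statistics computed directly from a dataset, no training.
# Assumptions: actions are stacked into a (num_steps, action_dim) array, and the
# stats of interest are mean/std and the 1st/99th percentiles per dimension.
import numpy as np

def compute_norm_stats(actions: np.ndarray) -> dict:
    """Return per-dimension statistics over all action vectors in the dataset."""
    return {
        "mean": actions.mean(axis=0).tolist(),
        "std": actions.std(axis=0).tolist(),
        "q01": np.quantile(actions, 0.01, axis=0).tolist(),
        "q99": np.quantile(actions, 0.99, axis=0).tolist(),
    }

# Example usage (load_libero_object_actions is a hypothetical loader for the demos):
# actions = np.concatenate([ep["actions"] for ep in load_libero_object_actions()])
# norm_stats = {"libero_object": {"action": compute_norm_stats(actions)}}
```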
Below I have attached videos/rollouts, each with a caption.
OpenVLA fine-tuned on the LIBERO-Object dataset, with model.norm_stats set to libero_object.
OpenVLA base model (not fine-tuned) after manually setting model.norm_stats = "libero_object", which I extracted from openvla/openvla-7b-libero-object. This is a rare success; most of the time it moves toward the specified object but fails to grasp it.
OpenVLA base model without setting norm_stats (note: the base model does not have norm_stats for libero_object).
OpenVLA base model after setting model.norm_stats = "bridge_orig". I assume bridge_orig is one of the datasets used to train the base model.
As we can see, the movement improves significantly, and there was one success without any fine-tuning. (Somehow the evaluator did not register this as a success, so the reported success rate for the run is 0%; I found the video above by manually checking the rollouts.)
This is the norm_stats I extracted from the fine-tuned model. According to ChatGPT and the openpi codebase, it is possible to calculate norm_stats without fine-tuning.
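For reference, the grafting step for the "base model + libero_object norm_stats" rollout above looked roughly like the sketch below. I'm assuming the OpenVLA HF port exposes norm_stats on both the config and the model as a dict keyed by dataset name; double-check the attribute names against the actual checkpoint.

```python
# Sketch: copy the libero_object norm_stats from the fine-tuned checkpoint onto
# the base model so it can unnormalize actions with that key at rollout time.
from transformers import AutoConfig, AutoModelForVision2Seq

# Pull only the config of the fine-tuned checkpoint; no need to load 7B weights
# just to read its normalization statistics (assumes norm_stats lives in config.json).
ft_cfg = AutoConfig.from_pretrained(
    "openvla/openvla-7b-libero-object", trust_remote_code=True
)

base = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", trust_remote_code=True
)

# norm_stats is keyed by dataset name; graft the libero_object entry onto the
# base model so that unnorm_key="libero_object" resolves during evaluation.
base.norm_stats["libero_object"] = ft_cfg.norm_stats["libero_object"]
```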