SimplerEnv:
['google_robot_pick_coke_can', 'google_robot_pick_horizontal_coke_can', 'google_robot_pick_vertical_coke_can', 'google_robot_pick_standing_coke_can', 'google_robot_pick_object', 'google_robot_move_near_v0', 'google_robot_move_near_v1', 'google_robot_move_near', 'google_robot_open_drawer', 'google_robot_open_top_drawer', 'google_robot_open_middle_drawer', 'google_robot_open_bottom_drawer', 'google_robot_close_drawer', 'google_robot_close_top_drawer', 'google_robot_close_middle_drawer', 'google_robot_close_bottom_drawer', 'google_robot_place_in_closed_drawer', 'google_robot_place_in_closed_top_drawer', 'google_robot_place_in_closed_middle_drawer', 'google_robot_place_in_closed_bottom_drawer', 'google_robot_place_apple_in_closed_top_drawer', 'widowx_spoon_on_towel', 'widowx_carrot_on_plate', 'widowx_stack_cube', 'widowx_put_eggplant_in_basket']
Can we mitigate error accumulation in long-horizon planning and delicate robot manipulation (e.g., video-based planning) by modeling temporal and multi-modal correlations with a compositional energy landscape?
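One way to make the question concrete uses notation introduced here for illustration only. Let $\tau = (x_0, \dots, x_T)$ be a trajectory (e.g. predicted video frames or robot states) and $g$ a goal or instruction. Each transition gets a local energy, and the whole trajectory gets a global energy weighted by $\lambda$. The composed energy and the distribution it induces would be:

```latex
E(\tau \mid g) = \sum_{t=0}^{T-1} E_{\mathrm{local}}(x_t, x_{t+1}) + \lambda \, E_{\mathrm{global}}(\tau, g),
\qquad
p(\tau \mid g) \propto \exp\!\bigl(-E(\tau \mid g)\bigr)
  = \Bigl[\prod_{t=0}^{T-1} e^{-E_{\mathrm{local}}(x_t, x_{t+1})}\Bigr] \cdot e^{-\lambda E_{\mathrm{global}}(\tau, g)}
```

Because the exponential of a sum is a product, each term acts as an expert that can veto the whole trajectory. The hypothesis behind the question is that scoring all steps jointly penalizes drift wherever it appears. Conditioning each step only on the previous (possibly wrong) prediction would instead let that drift compound.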
Train different energy functions to model the local and global planning objectives and correlations, including but not limited to:
The energy landscapes can be trained jointly or separately.
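A toy sketch of the "trained separately, composed afterwards" path. It assumes each landscape is a quadratic (Gaussian) energy fitted by maximum likelihood on its own data. Summing the fitted energies multiplies the densities, so the composed mode is the precision-weighted combination of the individual means, and each term dominates where it is confident. Joint training would instead fit all terms against one objective on the composed energy, which this sketch does not show. All function names are hypothetical.

```python
import numpy as np

# Toy assumption: each "landscape" is E_k(x) = 0.5 * (x - mu_k)^T P_k (x - mu_k),
# a Gaussian energy whose maximum-likelihood fit is the sample mean (mu_k)
# and the inverse sample covariance (P_k).

def fit_gaussian_energy(samples: np.ndarray):
    """Fit one energy term independently on its own data."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    precision = np.linalg.inv(cov)
    return mu, precision

def energy(x: np.ndarray, mu: np.ndarray, precision: np.ndarray) -> float:
    d = x - mu
    return 0.5 * float(d @ precision @ d)

def composed_energy(x: np.ndarray, terms) -> float:
    """Compose separately trained terms by summing their energies."""
    return sum(energy(x, mu, p) for mu, p in terms)

def composed_mode(terms) -> np.ndarray:
    """Minimizer of the summed quadratic energies: (sum P_k)^-1 sum P_k mu_k."""
    total_precision = sum(p for _, p in terms)
    rhs = sum(p @ mu for mu, p in terms)
    return np.linalg.solve(total_precision, rhs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Expert A is confident about the y-coordinate, expert B about x.
    data_a = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.1], size=(500, 2))
    data_b = rng.normal(loc=[2.0, 1.0], scale=[0.1, 1.0], size=(500, 2))
    terms = [fit_gaussian_energy(data_a), fit_gaussian_energy(data_b)]
    print(composed_mode(terms))  # roughly [2, 0]
```

With learned neural energies the mode has no closed form and would be found by sampling or optimization instead.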
The trained compositional landscape can serve as an oracle to guide trajectory sampling and recovery, search, or RL methods for long-horizon planning.
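A minimal sketch of the oracle idea. It assumes hand-written quadratic energies in place of trained networks: a smoothness term plays the "local" energy and endpoint matching plays the "global" energy, both with hand-derived gradients. `langevin_refine` runs unadjusted Langevin dynamics on the composed energy, which covers sampling and recovery (e.g. repairing a drifting plan). `best_of_n` reranks candidate plans, which covers search. All names and hyperparameters are hypothetical.

```python
import numpy as np

# Hypothetical hand-written energies standing in for trained ones.

def local_energy(traj: np.ndarray) -> float:
    """Local term: sum of squared step lengths (penalizes jerky transitions)."""
    steps = traj[1:] - traj[:-1]
    return float(np.sum(steps ** 2))

def local_grad(traj: np.ndarray) -> np.ndarray:
    steps = traj[1:] - traj[:-1]
    grad = np.zeros_like(traj)
    grad[1:] += 2.0 * steps   # d/dx_{t+1} of ||x_{t+1} - x_t||^2
    grad[:-1] -= 2.0 * steps  # d/dx_t of ||x_{t+1} - x_t||^2
    return grad

def global_energy(traj, start, goal) -> float:
    """Global term: endpoints should match the start state and the goal."""
    return float(np.sum((traj[0] - start) ** 2) + np.sum((traj[-1] - goal) ** 2))

def global_grad(traj, start, goal) -> np.ndarray:
    grad = np.zeros_like(traj)
    grad[0] = 2.0 * (traj[0] - start)
    grad[-1] = 2.0 * (traj[-1] - goal)
    return grad

def composed_energy(traj, start, goal, weight=10.0) -> float:
    return local_energy(traj) + weight * global_energy(traj, start, goal)

def langevin_refine(traj, start, goal, weight=10.0, step=0.02,
                    n_steps=3000, noise_scale=0.0, seed=0):
    """Unadjusted Langevin updates on the composed energy.

    noise_scale=0 reduces to plain gradient descent (recovery toward the mode);
    noise_scale=1 samples approximately from p(traj) proportional to exp(-E(traj)).
    """
    rng = np.random.default_rng(seed)
    x = np.array(traj, dtype=float)
    for _ in range(n_steps):
        grad = local_grad(x) + weight * global_grad(x, start, goal)
        noise = rng.standard_normal(x.shape)
        x = x - step * grad + noise_scale * np.sqrt(2.0 * step) * noise
    return x

def best_of_n(candidates, energy_fn) -> int:
    """Search-style use of the oracle: index of the lowest-energy candidate."""
    return int(np.argmin([energy_fn(c) for c in candidates]))
```

For example, a noisy 10-waypoint plan from some upstream generator can be passed through `langevin_refine` and then compared against the original with `best_of_n`. For RL, the negative composed energy could plausibly serve as a shaped reward, but that use is not shown here.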
