BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward