Wei Xiong@UIUC

Hanze Dong@Salesforce

Rui Yang@HKUST

Date: Mar 23, 2024

To Readers:

TL;DR

This post is the recipe for the GitHub repo used to train reward models for RLHF.

1. Introduction

Reinforcement learning from human feedback (RLHF) is a leading technique for aligning a model's generation distribution with human preferences, and it has achieved tremendous success in OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini.

The most standard workflow, presented in the InstructGPT paper, consists of three steps: