baby-rlhf

Simple conceptual implementation of reinforcement learning from human preferences.
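
To make the idea concrete, here is a minimal sketch of the preference-learning step that such an implementation typically includes. It is not taken from this repository: the function names, the linear reward model, and the Bradley-Terry preference likelihood are all assumptions chosen for illustration.

```python
import math


def preference_probability(reward_a, reward_b):
    """Bradley-Terry probability that item A is preferred over item B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def fit_linear_reward(pairs, n_features, lr=0.1, epochs=200):
    """Fit a linear reward model from pairwise preferences (hypothetical helper).

    pairs: list of (preferred_features, rejected_features) tuples,
           each a list of floats of length n_features.
    Returns the learned weight vector.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            r_p = sum(wi * xi for wi, xi in zip(w, preferred))
            r_r = sum(wi * xi for wi, xi in zip(w, rejected))
            p = preference_probability(r_p, r_r)
            # Gradient step on the negative log-likelihood -log(p):
            # d(-log p)/dw = -(1 - p) * (preferred - rejected)
            for i in range(n_features):
                w[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return w


# Toy data (made up): the labeler always prefers items with feature 0 set.
toy_pairs = [([1.0, 0.0], [0.0, 1.0])]
weights = fit_linear_reward(toy_pairs, n_features=2)
```

In a full pipeline, the learned reward would then serve as the training signal for a reinforcement learning policy; that stage is omitted here.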
