Simulators for grinds in Fallen London
Simple conceptual implementation of reinforcement learning from human preferences.
Simulate distributed transformer training runtimes
No description provided.
Learn a Bayesian posterior distribution over reward functions from different kinds of human feedback