Utkarsh's Notes

On Verifiable Rewards

Disclaimer: (This is still a spitball-in-progress)

I've noticed that when people want to illustrate determination, discipline, and the myriad other traits that represent the pinnacle of human achievement, the examples they reach for almost all come from very sandboxed environments. From playing an instrument to competing in sports, these are all activities with discrete, clear rewards. It is (relatively) easier to get feedback on your actions and on how to get better at your goal.

This isn't to knock these fields as somehow easier, but I think the comparison is inappropriate when we try to port it over to our own lives. The actions we take rarely, if ever, have such clear reward signals. What major you study, what friends you have, what work you do: these rarely map to discrete outcomes. Their rewards often can't even be compared against each other, owing to the enormous complexity of the choices themselves. But we still try our best to replicate what works in the sandboxed environments, hoping to see the same outcomes.

(Something something how this isn't the best idea, it's prone to failure, etc. etc.)

But this still doesn't address the fundamental question: how do we decide what the correct action is when the reward signal is so unclear and sparse?

(I'm still thinking about what one should do in these cases; maybe I need to read more RL literature. If you have any ideas, feel free to DM me, and I shall credit you in the essay!)