Amnonian

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by

Newest

Reward is not the optimization target

Amnonian3y3-2

I'm feeling confused.

It might just be my inexperience with reinforcement learning, but while I agree with what you say, I can't square it with my intuition of what a ML model does.

If our model uses some variant of gradient ascent, it will end up in high reward function values. (Not necessarily in any global/local maxima, but the attempt is to get it to some such maxima.) In that sense the model does optimize for reward.

Is that a special attribute of gradient ascent, that we shouldn't expect other models to have? Does that mean that gradient ascent models are more dangerous? Are you just noting that the model won't necessarily find the global maxima, and only reach some local maxima?

Reply