User Comment Replies — AI Alignment Forum

Linda Linsefors (2y)

Similar, but not exactly. I mean that you take some known distribution (the training distribution) as a starting point. But when sampling actions, you do so from a shifted or truncated distribution, to favour higher-reward policies.

In the decision transformer I linked, the AI is playing a variety of different games, where the programmers might not know what a good future reward value would be. So they let the system predict the future reward, but with the distribution shifted towards higher rewards.

I discussed this a bit more after posting the above comment, and there is something I want to add about the comparison. With quantilizers, if you know the probability of DOOM under the base distribution, you get an upper bound on the probability of DOOM for the quantilizer. This is not the case for the type of probability shift used in the linked decision transformer.

DOOM = unforeseen catastrophic outcome. It would not be labelled as very bad by the AI's reward function, but is in reality VERY BAD.
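To make the comparison concrete, here is a minimal sketch on a toy discrete action space. Everything in it is an assumption for illustration: the three actions, their base probabilities and rewards, the `doom` mask, and the function names `quantilize` and `exp_tilt`. In particular, the exponential tilt is only a stand-in for "a distribution shifted towards higher rewards"; it is not claimed to be the exact shift used by the linked decision transformer. A q-quantilizer keeps the top q fraction of base probability mass by reward and renormalises, so no action's probability grows by more than a factor of 1/q, which gives P_quantilizer(DOOM) <= P_base(DOOM) / q. The tilt has no such cap on how much it can upweight a single action.

```python
import numpy as np

def quantilize(base_p, reward, q):
    """Keep the top-q base probability mass (ranked by reward) and renormalise."""
    order = np.argsort(-reward)                # highest reward first
    p_sorted = base_p[order]
    mass_before = np.cumsum(p_sorted) - p_sorted
    # Mass kept from each action: never more than its base probability.
    kept = np.clip(q - mass_before, 0.0, p_sorted)
    out = np.zeros_like(base_p)
    out[order] = kept / q
    return out

def exp_tilt(base_p, reward, kappa):
    """Hypothetical reward-weighted shift: p(a) proportional to base_p(a) * exp(kappa * reward(a))."""
    w = base_p * np.exp(kappa * (reward - reward.max()))  # subtract max for stability
    return w / w.sum()

# Toy example (all numbers are made up for illustration).
base_p = np.array([0.899, 0.1, 0.001])    # base / training distribution
reward = np.array([0.0, 1.0, 10.0])       # proxy reward; the DOOM action looks great
doom = np.array([False, False, True])     # which actions are actually catastrophic
q = 0.1

p_base_doom = base_p[doom].sum()
p_quant_doom = quantilize(base_p, reward, q)[doom].sum()
p_tilt_doom = exp_tilt(base_p, reward, kappa=2.0)[doom].sum()

print("base P(DOOM):       ", p_base_doom)
print("quantilizer bound:  ", p_base_doom / q)
print("quantilizer P(DOOM):", p_quant_doom)
print("tilted P(DOOM):     ", p_tilt_doom)
```

In this toy setup the quantilizer stays within the bound, while the tilted distribution puts far more than P_base(DOOM) / q on the DOOM action, because the proxy reward happens to rate that action highly.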


All of Paul Bricman's Comments + Replies