Recently, we've been considering agents that, instead of just being given the problem setup, have to learn it from experience. For example, instead of being told that it faces a perfect predictor, an agent has to play until it realises this. When an agent encounters a problem like iterated Parfit's Hitchhiker, a strange effect can occur during exploration if the predictor is perfect.
We will turn Parfit's Hitchhiker into an iterated problem by replacing being left in the desert with a large utility penalty, rather than death. Suppose an agent has decided that, the next time it is in town, it won't pay. As long as it is committed to this, it will never actually end up in town, and so it will never actually fulfil this commitment. It will always be predicted to defect and so will never get the chance. We will call this stuck exploration, because the exploration never resolves.
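A minimal sketch of this loop in Python, under stated assumptions: the payoffs `DESERT_PENALTY` and `PAYMENT_COST` are hypothetical values chosen for illustration, `CommittedExplorer` and `play` are hypothetical names, and the perfect predictor is modelled simply by querying the agent's own `would_pay` decision before deciding whether to give it a lift.

```python
# Hypothetical payoffs: being left in the desert is a large penalty,
# paying in town costs a little, and not paying costs nothing.
DESERT_PENALTY = -1000
PAYMENT_COST = -10


class CommittedExplorer:
    """Agent committed to not paying the next time it is in town."""

    def __init__(self) -> None:
        self.pending_defection = True  # the commitment to explore by defecting

    def would_pay(self) -> bool:
        # The agent's in-town decision; a perfect predictor can query it.
        return not self.pending_defection

    def act_in_town(self) -> bool:
        pays = self.would_pay()
        self.pending_defection = False  # commitment discharged only once in town
        return pays


def play(agent: CommittedExplorer, rounds: int) -> dict:
    log = {"desert": 0, "paid": 0, "defected": 0, "utility": 0}
    for _ in range(rounds):
        # Perfect predictor: only gives a lift if the agent would pay in town.
        if not agent.would_pay():
            log["desert"] += 1
            log["utility"] += DESERT_PENALTY
            continue
        if agent.act_in_town():
            log["paid"] += 1
            log["utility"] += PAYMENT_COST
        else:
            log["defected"] += 1  # never reached: only payers get into town
    return log


agent = CommittedExplorer()
print(play(agent, rounds=50))
# → {'desert': 50, 'paid': 0, 'defected': 0, 'utility': -50000}
print(agent.pending_defection)
# → True
```

The commitment is still pending after every round, which is the stuck state described above: the agent is penalised each time, yet it never reaches the situation that would discharge its commitment.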
One trivial way around this is never to commit to exploring on the next visit to town, but instead to control exploration with a pseudo-random variable based on the time. However, avoiding stuck exploration doesn't guarantee that the agent will end up understanding the situation. The agent will notice that it always pays in town, i.e. that it never takes the explore option when there. An EDT agent would be able to figure out that it is likely being predicted, while a dualistic agent like AIXI wouldn't be able to make this connection directly. Of course, AIXI would notice that something weird was happening, and this would distort the environment model it selects in some way. Perhaps it would learn a spurious link, but we won't try to delve into this at this stage.
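The time-indexed alternative can be sketched the same way. Here `explores_at` is a hypothetical helper that seeds Python's `random.Random` with the time step, so the decision to explore depends only on `t` and simply lapses once that round has passed; `EXPLORE_RATE` and the payoffs are illustrative assumptions, not part of the original setup.

```python
import random

# Hypothetical payoffs and exploration rate, chosen only for illustration.
DESERT_PENALTY = -1000
PAYMENT_COST = -10
EXPLORE_RATE = 0.2


def explores_at(t: int) -> bool:
    """Pseudo-random explore flag that depends only on the time step t."""
    return random.Random(t).random() < EXPLORE_RATE


def would_pay(t: int) -> bool:
    # Exploring means not paying if the agent finds itself in town at time t.
    return not explores_at(t)


def play(rounds: int) -> dict:
    log = {"desert": 0, "paid": 0, "explored_in_town": 0, "utility": 0}
    for t in range(rounds):
        # The perfect predictor evaluates the same time-indexed decision.
        if not would_pay(t):
            # Left in the desert; the explore flag for step t just lapses.
            log["desert"] += 1
            log["utility"] += DESERT_PENALTY
            continue
        # In town, the agent runs its decision procedure for real.
        if would_pay(t):
            log["paid"] += 1
            log["utility"] += PAYMENT_COST
        else:
            log["explored_in_town"] += 1  # never reached with a perfect predictor
    return log


result = play(rounds=200)
print(result)
print("explored in town:", result["explored_in_town"])
```

Unlike the committed agent, this one keeps reaching town on non-explore rounds, so exploration is no longer stuck. But every visit to town is a round on which it pays, so from its own point of view it never takes the explore option there, which is exactly the observation the agent then has to explain.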
This post was a result of discussions with Davide Zagami and supported by the EA Hotel and AI Safety Research Program. Thanks to Pablo Moreno and Luke Miles for feedback.