Comments

jacek30

Thanks for the comment! Note that we use the state-action visitation distribution, so the trajectories we consider contain actions as well. This makes it possible to invert the mapping from policy to visitation distribution (as long as all states are visited). Using only state trajectories, it would indeed be impossible to recover the policy.
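
To make the inversion concrete, here is a minimal sketch for the tabular case. It assumes the visitation distribution is given as a matrix d(s, a) with states as rows and actions as columns, and uses the standard identity pi(a|s) = d(s, a) / sum over a' of d(s, a'). The function name and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def policy_from_visitation(d_sa):
    """Recover pi(a|s) from a tabular state-action visitation distribution.

    d_sa[s, a] is the visitation weight of state s and action a.
    Each row is normalised by the state marginal d(s) = sum_a d(s, a).
    """
    d_sa = np.asarray(d_sa, dtype=float)
    d_s = d_sa.sum(axis=1, keepdims=True)  # state marginal d(s)
    if np.any(d_s <= 0):
        # An unvisited state leaves pi(.|s) undetermined, so refuse to guess.
        raise ValueError("every state must have nonzero visitation")
    return d_sa / d_s

# Hypothetical 2-state, 2-action example: build d(s, a) from a known policy
# and a known state marginal, then check that the policy comes back.
pi = np.array([[0.9, 0.1],
               [0.3, 0.7]])
d_s = np.array([0.25, 0.75])
d_sa = d_s[:, None] * pi
recovered = policy_from_visitation(d_sa)
```

The state marginal alone (`d_s` above) is the same for any policy that happens to induce those state frequencies, which is why state-only trajectories are not enough.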