Thanks for the comment! Note that we use the state-action visitation distribution, so the trajectories we consider contain actions as well. This makes it possible to invert η (as long as all states are visited). Using state-only trajectories, it would indeed be impossible to recover the policy.
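To make the inversion concrete, here is a minimal sketch (with an illustrative, made-up η over 2 states and 2 actions): the policy is recovered by normalizing each state's row of η by its marginal state visitation, which is well-defined exactly when every state is visited.

```python
import numpy as np

# Hypothetical state-action visitation distribution eta[s, a]
# for 2 states and 2 actions (values are illustrative only).
eta = np.array([[0.3, 0.1],
                [0.2, 0.4]])

# State visitation is the marginal of eta over actions.
state_visitation = eta.sum(axis=1, keepdims=True)

# Recover the policy pi(a|s) = eta(s, a) / sum_a' eta(s, a').
# This division is only well-defined when every state is visited,
# i.e. no marginal is zero.
policy = eta / state_visitation

print(policy)
```

Each row of `policy` sums to 1, so it is a valid conditional distribution over actions for each state.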