Forget OOD for a minute; ERM can't even learn to avoid spurious correlations that have counterexamples in the training data. Datasets like Waterbirds (used in that previously linked paper) are good toy datasets for figuring this out. I think we need to solve this problem first before trying to figure out OOD generalization.
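A rough sketch of why this failure is easy to miss: on Waterbirds-style data the usual check is per-group accuracy, where a group is a (label, spurious attribute) pair, rather than average accuracy. The function names, the toy labels, and the "background-only" classifier below are all made up for illustration; none of it comes from the linked paper or any real model.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, spurious):
    """Return accuracy for each (label, spurious attribute) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, s in zip(y_true, y_pred, spurious):
        total[(y, s)] += 1
        correct[(y, s)] += int(y == p)
    return {group: correct[group] / total[group] for group in total}


def worst_group_accuracy(y_true, y_pred, spurious):
    """Return the lowest per-group accuracy."""
    return min(group_accuracies(y_true, y_pred, spurious).values())


# Toy data (assumed): label 1 = waterbird, spurious 1 = water background.
# The last two examples are the counterexamples to the correlation.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
spurious = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

# Hypothetical classifier that looks only at the background.
y_pred = list(spurious)

average_accuracy = sum(int(y == p) for y, p in zip(y_true, y_pred)) / len(y_true)
print("average accuracy:", average_accuracy)
print("worst-group accuracy:", worst_group_accuracy(y_true, y_pred, spurious))
```

In this toy setup the background-only predictor gets most examples right on average while getting every counterexample wrong, which is the gap the comment is pointing at.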