Okay, that makes sense (and seems compelling, though not decisive, to me). I'm happy to leave it here - thanks for the answers!
It seems like you should either run separate models for D and D*, or jointly train the model on both D and D*; you definitely shouldn't train on D and then run on D* (and you don't need to!).
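A rough sketch of those three regimes, just to make the contrast concrete (synthetic data and scikit-learn; the names and label rules are illustrative, not anything from the discussion):

```python
# Illustrative only: three ways to handle a training distribution D and a
# deployment distribution D*.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_D = rng.normal(0.0, 1.0, (500, 2))
y_D = (x_D.sum(axis=1) > 0).astype(int)                  # toy labels on D
x_Dstar = rng.normal(3.0, 1.0, (500, 2))
y_Dstar = (x_Dstar[:, 0] > x_Dstar[:, 1]).astype(int)    # toy labels on D*

# Option 1: separate models, each only ever sees its own distribution.
model_D = LogisticRegression().fit(x_D, y_D)
model_Dstar = LogisticRegression().fit(x_Dstar, y_Dstar)

# Option 2: jointly train one model on both D and D*.
joint = LogisticRegression().fit(np.vstack([x_D, x_Dstar]),
                                 np.concatenate([y_D, y_Dstar]))

# The regime to avoid: train on D only, then run on D* (pure OOD reliance).
ood = LogisticRegression().fit(x_D, y_D)
print("OOD accuracy on D*:", ood.score(x_Dstar, y_Dstar))
```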
Sorry, yes, you're completely right. (I previously didn't like that there's a model trained on D which only gets used for finding z*, but realized it's not a big deal.)
The goal is to be as good as an unaligned ML system though, not just to be better than humans. And the ML system updates on D, so we need to update on D too.
I agree - I mean for the...
Thanks. This is helpful. I agree that LTP with the distilled core assumption buys us a lot, both theoretically and probably in practice too.
> The distilled core assumption seems right to me because the neural network weights are already a distilled representation of D, and we only need to compete with that representation... My main reservation is that this seems really hard... If we require competitiveness then it seems like z has to look quite a lot like the weights of a neural network
Great, agreed with all of this.
> In writing the original post I w...
I'm trying to get a better handle on what the benefits coming from LTP are. Here's my current picture - are there points here where I've misunderstood?
_________
The core problem: We have a training distribution (x, y) ~ D and a deployment distribution (x*, y*) ~ D*, where D != D*. We would rather not rely on ML OOD generalization from D to D*. Instead, we would rather have a human label D*, train an ML model on those labels, and only rely on IID generalization. Suppose D is too large for a human to process. If the human knows how to label D* without learning...
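To make the contrast concrete, here's a minimal sketch of the two strategies, assuming a toy synthetic setup (numpy + scikit-learn); `human_label` is a hypothetical stand-in for a human who knows how to label D* directly, not anything from the original post:

```python
# Illustrative only: relying on OOD generalization from D to D* vs. training on
# human labels of D* and relying only on IID generalization.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_D(n):        # training distribution D
    x = rng.normal(0.0, 1.0, (n, 2))
    return x, (x.sum(axis=1) > 0).astype(int)

def sample_D_star(n):   # deployment distribution D* (shifted, so D != D*)
    x = rng.normal(3.0, 1.0, (n, 2))
    return x, (x[:, 0] > x[:, 1]).astype(int)

def human_label(x):     # hypothetical human labeller for D* inputs
    return (x[:, 0] > x[:, 1]).astype(int)

x_D, y_D = sample_D(1000)
x_star, y_star = sample_D_star(1000)

# What we'd rather avoid: train on D and rely on OOD generalization to D*.
ood_model = LogisticRegression().fit(x_D, y_D)
print("OOD accuracy on D*:", ood_model.score(x_star, y_star))

# What we'd prefer: a human labels (a sample of) D*, we train on those labels,
# and at deployment we rely only on IID generalization within D*.
x_labelled, _ = sample_D_star(1000)
iid_model = LogisticRegression().fit(x_labelled, human_label(x_labelled))
print("IID accuracy on D*:", iid_model.score(x_star, y_star))
```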
The type 1 vs. type 2 feedback distinction here seems really central. I'm interested in whether this seems like a fair characterization to both of you.
Type 1: Feedback which we use for training (via gradient descent)
Type 2: Feedback which we use to decide whether to deploy the trained agent (see the sketch after this list).
(There's a bit of gray between Type 1 and 2, since choosing whether to deploy is another form of selection, but I'm assuming we're okay stating that gradient descent and model selection operate in qualitatively distinct regimes.)
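A minimal sketch of the distinction, with made-up data, agent, and threshold: type 1 feedback enters the loss that gradient descent optimizes, while type 2 feedback never produces a gradient and only gates whether the trained agent gets deployed.

```python
# Illustrative only (plain numpy; everything here is hypothetical).
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)                                   # toy linear "agent"

# Type 1: feedback used for training, consumed by gradient descent.
x_train = rng.normal(size=(200, 3))
type1_feedback = x_train @ true_w                 # training targets
for _ in range(200):
    grad = 2 * x_train.T @ (x_train @ w - type1_feedback) / len(x_train)
    w -= 0.05 * grad                              # gradient step on squared error

# Type 2: feedback used only to decide whether to deploy the trained agent;
# it never touches the weights.
x_eval = rng.normal(size=(50, 3))
type2_feedback = x_eval @ true_w
eval_loss = np.mean((x_eval @ w - type2_feedback) ** 2)
deploy = eval_loss < 0.1                          # made-up deployment threshold
print(f"eval loss {eval_loss:.4f}, deploy: {deploy}")
```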
The key disagreement is whether we expect type ...