How can we combine behavioural experiments with mechanistic interpretability to infer an agent’s subjective causal model? The next post will say more about this.
There is no next post. Can I read about it somewhere anyway?
I think it's inappropriate to call evolution a "hill-climbing process" in this context, since those words seem optimized to sneak in parallels to SGD. Separately, I think that evolution is a bad analogy for AGI training.