How can we combine behavioural experiments with mechanistic interpretability to infer an agent’s subjective causal model? The next post will say more about this.
There is no next post. Can I read about it somewhere anyway?
It's hard to guess, but it happened when the only one known to us general intelligence was created by a hill-climbing process.
There is no next post. Can I read about it somewhere anyway?