Tapatakt

Posts

1Tapatakt's Shortform

1y

0

Comments

Reward Hacking from a Causal Perspective

Tapatakt 1y 1 0

“How can we combine behavioural experiments with mechanistic interpretability to infer an agent’s subjective causal model? The next post will say more about this.”

There is no next post. Can I read about it somewhere anyway?

AGI Ruin: A List of Lethalities

Tapatakt 1y -1 -3

It's hard to guess, but it happened when the only general intelligence known to us was created by a hill-climbing process.
