Let me see if I am on the right page here.

Suppose I have some world state S, a transition function T : S → S, actions Action : S → S, and a many-to-one Camera : S → CameraState. Since Camera is (very) many-to-one, seeing a particular camera image with happy people does not imply a happy world state, because many other situations involving nanobots or camera manipulation could have created that image.
This is important because I only have a human evaluation function H : S → Boolean, not one defined on CameraState directly. When I look at the image with the fake h... (read more)
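To make this setup concrete, here is a minimal toy sketch in Python; the state encoding and the particular states are illustrative assumptions, not anything from the original discussion.

```python
# Toy world states: (people_happy, camera_tampered). Purely illustrative.
S = tuple[bool, bool]
CameraState = str              # stand-in for an image

def T(s: S) -> S:
    return s                   # transition function stubbed out; Action would look the same

def camera(s: S) -> CameraState:
    happy, tampered = s
    # Camera is many-to-one: tampering produces the same image as real happiness.
    return "happy-looking image" if (happy or tampered) else "sad-looking image"

def H(s: S) -> bool:
    happy, _ = s
    return happy               # the human evaluates the world state, not the image

# Same image, opposite evaluations: the image alone cannot distinguish them.
assert camera((True, False)) == camera((False, True))
assert H((True, False)) != H((False, True))
```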
Everything seems right except I didn't follow the definition of the regularizer. What is L2?
This is what we want to do, and intuitively you ought to be able to back out info about the hidden state, but it's not clear how to do so.
All of our strategies involve introducing some extra structure, the human's model, with state space S_H, where the map Camera_H : S_H → CameraState also throws out a lot of information.
The setup you describe is very similar to the way it is presented in Ontological crises.
ETA: also we imagine H : S_H → Boolean, i.e. the underlying state space may also be different. I'm not sure any of the state mismatches matters much unless you start considering approaches to the problem that actually exploit the structure of the hidden space used within M, though.
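Concretely, the extra structure might look something like the toy sketch below; the particular choice of S_H and the maps are illustrative assumptions, not the actual human model from the report.

```python
# The human's state space S_H need not match the world/model state space S:
# here the human tracks (people_happy, people_healthy). Purely illustrative.
SH = tuple[bool, bool]
CameraState = str

def camera_H(s_h: SH) -> CameraState:
    happy, _ = s_h
    # Camera_H : S_H -> CameraState also throws out information: the image
    # shows whether people look happy, but not whether they are healthy.
    return "happy-looking image" if happy else "sad-looking image"

def H(s_h: SH) -> bool:
    happy, healthy = s_h
    # H : S_H -> Boolean is defined on the human's model states, not on S.
    return happy and healthy
```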