Dave Jacob - AI Alignment Forum

I guess it is just my lack of understanding (? ? ?), but - as far as I think I understand it - my own submission is actually hardly different (at least in terms of how it goes around the counter-examples we knew so far) from the Train a reporter that is useful to an auxiliary AI-proposal.

My idea was to simply make the reporter useful for (or rather: a necessarily clearly and honestly communicating part of) our original smart-vault-AI (instead of any auxiliary-AI), by enforcing a structure of the overal smart-vault-AI where its predictor can only communicate what to do to its "acting-on-the-world"-part by using this reporter.

Additionally, I would have enforced that there is not just one such reporter but a randomized row of them, so as to make sure that by having several different of them basically play "chinese-whispers", they have a harder time of converging on the usage of some kind of hidden code within their human-style communication.

I assume the issue with my proposal is that the only thing I explained about why those reporters would communicate in an understandable-for-humans-way in the first place was that this would simply be enforced by only using reporters whose output consists of human concepts + in between each training-step of the chinese-whisper-game, they would also be filtered out if they stopped using human concepts as their output.

My counter-example also seems similar to me than those mentioned under Train a reporter that is useful to an auxiliary AI-proposal.:

As mentioned above, the AI might simply use our language in another way than it is actually intended to be used, by hiding codes within it etc.

I am just posting this to get some feedback on where I went wrong - or why my proposal is simply not useful, apparently.

(Link to my original submission:) https://docs.google.com/document/d/1oDpzZgUNM_NXYWY9I9zFNJg110ZPytFfN59dKFomAAQ/edit?usp=sharing

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments