Human alignment: chrislakin.com/bounty
For the deontic feasibility hypothesis, do you have any expectation about whether formalizing the moral desiderata (specifically: «boundaries») will, or should, ultimately be done by 1) humans or 2) automated AI alignment assistants? @davidad
I don't think "covert" is a coherent thing that a content recommender, say, could optimize against. Everything could appeal to the biases and emotions of the wrong person; anything can be rude, triggering, or bias-inducing to the right person. In which case, how do you classify what is covert and what isn't in a way that isn't entirely subjective and also isn't beholden to (arbitrary) social norms?
I still think it's possible to define manipulation ~objectively, though: in terms of infiltration across human Markov blankets.
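To gesture at what I mean, here's a minimal sketch under the standard Markov-blanket factorization (my gloss, not a worked-out proposal): partition variables into external states \eta, blanket states b (the human's sensory and active states), and internal states \mu. The blanket condition is that b screens off \mu from \eta:

% Markov blanket condition: internal states are independent of
% external states given the blanket
p(\mu \mid b, \eta) = p(\mu \mid b)

% Candidate "infiltration" measure: an external agent's action a
% influences the human's internal states in a way not mediated by
% the blanket, i.e. nonzero conditional mutual information
I(a \,;\, \mu \mid b) > 0

On this reading, manipulation would be influence on someone's internal states that bypasses their sensorium, rather than anything indexed to social norms.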