Don't Share Information Exfohazardous on Others' AI-Risk Models

Hence, the policy should have an escape clause: You should feel free to talk about the potential exfohazard if your knowledge of it isn't exclusively caused by other alignment researchers telling you of it. That is, if you already knew of the potential exfohazard, or if your own research later led you to discover it.

In an ideal world, it's good to relax this clause in some way, from a binary to a spectrum. For example: if someone tells me of a hazard that I'm confident I would've discovered one my own one week later, then they only get to dictate me not-sharing-it for a week. "Knowing" isn't a strict binary; anyone can rederive anything with enough time (maybe) — it's just a question of how long it would've taken me to find it if they didn't tell me. This can even include someone bringing my attention to something I already knew, but to which I wouldn't as quickly have thought to pay attention if they didn't bring attention to it.

In the non-ideal world we inhabit, however, it's unclear how fraught it is to use such considerations.

^{^}

Obviously there's the issue of your directly using the exfohazard to e. g. accelerate your own AI research.

Or the knowledge of it semi-consciously influencing you to follow some research direction that leads to your re-deriving it, which ends up making you think that your knowledge of it is now independent of the other researcher having shared it with you; while actually, it's not independent. So if you share it, thinking the escape clause applies, they will have made a mistake (from their perspective) by telling you.

Still, mostly safe.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

25

Don't Share Information Exfohazardous on Others' AI-Risk Models

25