
Shiroe · 2y

You mention that you're surprised to have not seen "more vigorous commercialization of language models" recently beyond mere "novelty". Can you say more about what particular applications you had in mind? Also, do you consider AI companionship as useful or merely novel?

I expect that the first killer app to go mainstream will mark the PONR (point of no return), i.e. the final test of whether the market prefers capabilities or safety.

Shiroe · 2y

Reading AI safety articles like this one, I always find myself nodding along in agreement. The conclusions simply follow from the premises, and the premises are so reasonable. Yet by the end, I always feel futility and frustration. Anyone who wanted to argue that AI safety is a hopeless program wouldn't need to look any further than the AI safety literature itself! I'm not just referring to "death with dignity". What fills me with dread and despair is paragraphs like this:

However, optimists often take a very empiricist frame, so they are likely to be interested in what kind of ML experiments or observations about ML models might change my mind, as opposed to what kinds of arguments might change my mind. I agree it would be extremely valuable to understand what we could concretely observe that would constitute major evidence against this view. But unfortunately, it’s difficult to describe simple and realistic near-term empirical experiments that would change my beliefs very much, because models today don’t have the creativity and situational awareness to play the training game. [original emphasis]

Here is the real chasm between the AI safety movement and the ML industry/academia. One field is entirely driven by experimental results; the other is dominated so totally by theory that its own practitioners deny there can be any meaningful empirical aspect to it—at least, not until the moment when it's too late to make any difference.

Years ago, I read an article about an RL agent wireheading itself via memory corruption, thereby ignoring its intended task. Either that article exists and I can't find it now, or I'm misremembering. Either way, it's exactly the sort of research that the AI safety community should be conducting and publishing right now (i.e. propaganda with epistemic benefits). With things like GPT-3 around nowadays, I bet one could even devise experiments where artificial agents learn to actually deceive humans (via Mechanical Turk, perhaps?). Imagine how much attention such an experiment could generate once journalists picked it up!

EDIT: This post is very close to what I have in mind.