GPTs are being trained to predict text, not imitate humans. This task is actually harder than being human in many ways. To perfectly predict a text generator's output, you need to be smarter than the generator itself, and some text is the result of complex processes (e.g. scientific results, news) that even humans couldn't predict.
GPTs are solving a fundamentally different and often harder problem than just "be human-like". This means we shouldn't expect them to think like humans.
On one side of this debate are Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice, in the absence of yet-to-be-invented breakthrough technical alignment ideas.
On the other side of this debate is almost everyone who works on or studies LLMs. Some of them are very concerned about egregious scheming, others much less so, and as a group they’re equally or more concerned about lots of other potential AI problems—AI-assisted bioterrorism, AI-assisted dictatorships, etc. And if they’re concerned about egregious misalignment and scheming, they’ll often say that it would come about through being in too much of a rush, or careless programmers, or bad actors, etc., as opposed to the simpler...
For those who disagree-voted: I want to understand why you disagree. Presumably it's with the parenthetical. Is it just that you're less confident in current Claude's generalization behavior? Or that you actively expect it to be malign? Maybe you're picturing some sort of idealized reflection process that I'm not?
Call for alpha testers for an AI control/security tool. A ton of alignment researchers YOLO their Claude usage right now. We run Claude on our computers without real protection (perhaps beyond auto mode), but there isn't an easy way to comply with known best practices. I wrote claude-guard, a wrapper to make best practices easy: just install it and your future Claude sessions are protected.
Smart misaligned AI will target alignment researchers in particular for research sabotage, for example by:
I feel very confused and uncertain, so keep your expectations low for the quality of this comment.
In the framing of the post, I think much (most?) of the disagreement is downstream of whether we'll even choose to pursue the kind of ASI for which the theoretical arguments dominate the prosaic LLM-style safety arguments. LLMs or other non-limits-of-intelligence technologies with better safety properties could very plausibly scale far enough to satisfy the wants of people developing AI and/or end competitive pressures to build more ASI-like things.