Lessons from 20+ years of software security experience, perhaps relevant to AGI alignment:
1. Security doesn't happen by accident
2. Blacklists are useless but make them anyway
3. You get what you pay for (incentives matter)
4. Assurance requires formal proofs, which are provably impossible
5. A breach IS an existential risk
I feel very confused and uncertain so keep your expectations low for the quality of this comment.
How many years will pass before transformative AI is built? Three people who have thought about this question a lot are Ajeya Cotra from Open Philanthropy, Daniel Kokotajlo from OpenAI and Ege Erdil from Epoch. Despite each spending at least hundreds of hours investigating this question, they still disagree substantially about the relevant timescales. For instance, here are their median timelines for one operationalization of transformative AI:
| Forecaster | Median estimate for when 99% of currently fully remote jobs will be automatable |
|---|---|
| Daniel | 4 years |
| Ajeya | 13 years |
| Ege | 40 years |
You can see the strength of their disagreements in the graphs below, where they give very different probability distributions over two questions relating to AGI development (note that these graphs are very rough and are only intended to capture high-level differences, and especially aren't very...
Yep! Thanks.
Note that the scenario I gave wasn't actually a prediction, or at least, it wasn't my median world. I said elsewhere in the thread that my median was 2027 for AGI, and implied that my median for ASI was more like 2027/28:
...To be clear, my view is that we'll achieve AGI around 2027, ASI within a year of that, and then some sort of crazy robot-powered self-replicating economy within, say, three years of that. So 1000x energy consumption around then or shortly thereafter (depends on the doubling time of the crazy superintelligence-designed-and-managed rob
On one side of this debate are Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice, in the absence of yet-to-be-invented breakthrough technical alignment ideas.
On the other side of this debate is almost everyone who works on or studies LLMs. Some of them are very concerned about egregious scheming, others much less so, and as a group they’re equally or more concerned about lots of other potential AI problems—AI-assisted bioterrorism, AI-assisted dictatorships, etc. And if they’re concerned about egregious misalignment and scheming, they’ll probably say that it would come about through race dynamics, careless programmers, bad actors, etc., as opposed to the simpler Yudkowsky & Soares story of “we get...
RE 1, sure, “LLM will invent non-LLM ASI” is possible in principle, and would be a special case of “LLMs do not scale to ASI”. I do mention that (in the “Yudkowsky & Soares’s position [caricatured]” section).
RE 2, he wrote that “current AIs seem pretty misaligned”, not that current AIs are egregiously misaligned, scheming, and ruthless. I obviously do not think we should extrapolate from empirical observation of today’s LLMs to future ASI, but if I DID so extrapolate, I think my attitude would be vaguely like “eh, maybe future ASI will be egregious mis...
Is the widespread false belief that LLMs are AGI-complete going to kill us all?
(Prompted by this post by @Steven Byrnes, with which I basically agree.)
Suppose we live in a world in which LLMs don't scale to AGI.[1] If so, I think it's fairly plausible that LLM-alignment-optimists are correct (on the narrow question of LLM alignment). That is: all the technical arguments and empirical evidence in favor of LLMs being relatively easy to align/control, even in the limit of LLM capabilities, are valid, and the doomsaying about LLMs is wrong.
After all, most of t...
ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation Challenge, a contest to improve upon our estimation algorithms for random MLPs. The warm-up round begins this week, and later rounds will have a total prize pool of at least $100,000.
We are very grateful to Sharada Mohanty, Sneha Nanavati, Dipam Chakraborty and everyone else at AIcrowd for working with us to host this contest, as well as to Paul Rosu for testing the contest and to Harshita Khera for operational support.
Our challenge follows the same setup as our recent paper on wide random MLPs: we consider MLPs
where the activation function
To begin with, we are fixing the width
Okay another thing (let me know if there's a better spot for reports like this?), the "symmetrize" functionality seems unavailable on the grader. For that matter, the grader doesn't have stock numpy and some other deps that are included in the starterkit env. See https://www.aicrowd.com/challenges/arc-white-box-estimation-challenge-2026/submissions/310565 re symmetrize not being there.