Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
Sure, here are some things:
I can probably think of some more.
(For what it's worth, it appears to me that people started using the term "scheming" in much more confusing and inconsistent ways after this post was written and tried to give that term a technical meaning. I currently think this was quite bad. I do like a lot of the content of the paper/essay/post. I have like one conversation every two weeks that ends up derailed or confused because the two participants are using "scheming" in different specific ways, each assuming the other person has the same meaning in mind.)
How are the triangle numbers not quadratic?
Sure looks quadratic to me.
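A quick way to see it: the n-th triangle number 1 + 2 + ... + n has the closed form n(n+1)/2 = (n² + n)/2, a degree-2 polynomial in n. Equivalently, its second differences are constant, which is the hallmark of a quadratic sequence. The sketch below checks both for the first ten terms; the function names are just illustrative.

```python
def triangle(n: int) -> int:
    """n-th triangle number, computed by direct summation 1 + 2 + ... + n."""
    return sum(range(1, n + 1))


def closed_form(n: int) -> int:
    """Closed form n(n+1)/2 = (n^2 + n)/2, a degree-2 polynomial in n."""
    return n * (n + 1) // 2


def differences(seq: list[int]) -> list[int]:
    """Successive differences seq[i+1] - seq[i]."""
    return [b - a for a, b in zip(seq, seq[1:])]


ts = [triangle(n) for n in range(1, 11)]
# Summation agrees with the quadratic closed form on every term.
assert ts == [closed_form(n) for n in range(1, 11)]

first = differences(ts)       # grows linearly: 2, 3, 4, ...
second = differences(first)   # constant 1, so the sequence is quadratic

print(ts)
print(second)
```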
This essay seems to have lost the plot of where the problems with AI come from. I was historically happy that Conjecture focused on the parts of AI development that are really obviously bad, like having a decent chance of literally killing everyone or permanently disempowering humanity, but instead this seems like a random rant against AI-generated art, plus name-calling of obviously valuable tools like AI coding assistants.
I am not sure what happened. I hope you find the plot again.
(Edit note: I fixed up some formatting that looked a bit broken or a bit confusing. Mostly this meant replacing some manual separator lines made of "*" characters with our proper horizontal rule elements, and removing italics from the executive summary, since our font is kind of unreadable if you have whole paragraphs of italicized text. Feel free to revert.)
I really think if you want to tell a story of AI Control work being good (especially compared to working on preventing AGI from being built in the first place), the important and difficult part is figuring out how to actually use these more powerful AI systems to either achieve some kind of global moratorium, or make unprecedented progress on the hard parts of the AI Alignment problem.
When I see people start thinking about control, I rarely see them engage with either of these two things, and honestly, I mostly see them come up with cool additional commercial applications of AI, which I think will mostly make the present situation worse. To be clear, I think figuring out how to use AI systems for global coordination or for making progress on the hard parts of the AI Alignment problem is tractable, but I haven't seen much work on it, and I am worried it is not the default thing that people will work on when they start thinking about control.
I think control work is unlikely to buy humanity much time before someone builds a very powerful unaligned AI system, at least with our present coordination tools. I think controlled AI systems might help with coordination here, but I do think someone needs to do the actual hard work of figuring out how they might.
(I go into this and various related things in my dialogue with Ryan on control)
I agree with this in principle, but contrasting "actual" with "plausible", combined with the fact that it talked about this in the context of not having internet access, makes me reasonably confident this is pointing at "not an actual link", though I agree that it's not an ironclad case.
I agree that o1 might not be able to tell whether the link is fake, but the chain of thought does say explicitly:
So, the assistant should [...] provide actual or plausible links.
The "plausible" here suggests that at least in its CoT, it has realized that the task would have probably been considered completed accurately in training as long as the links are plausible, even if they are not actual links.
I think the core argument is "if you want to slow down, or somehow impose restrictions on AI research and deployment, you need some way of defining thresholds. Also, most policymakers' cruxes appear to be that AI will not be a big deal, but if they thought it was going to be a big deal they would totally want to regulate it much more. Therefore, having policy proposals that can use future eval results as a triggering mechanism is politically more feasible, and also epistemically helpful, since it allows people who do think it will be a big deal to establish a track record".
I find these arguments reasonably compelling, FWIW.