Obviously I think it's worth being careful, but I think in general it's actually relatively hard to accidentally advance capabilities too much by working specifically on alignment. Some reasons:
I think the alignment community reasoning correctly is essential for solving alignment. Especially because we will have very limited empirical evidence before AGI, and that evidence will not be obviously applicable without some associated abstract argument, any trustworthy alignment solution has to route through the community reasoning sanely.
Also to be clear I think the "advancing capabilities is actually good because it gives us more information on what AGI will look like" take is very bad and I am not defending it. The arguments I made above don't apply, because they basically hinge on work on alignment not actually advancing capabilities.
From a broad policy perspective, it can be tricky to know what to communicate. I think it helps if we think a bit more about the effects of our communication and a bit less about correctly conveying our level of credence in particular claims. Let me explain.
If we communicate the simple idea that AGI is near, we push people to work on safety projects that would be worth working on even if AGI is not near, at some cost in reputation, mental health, and personal wealth.
If we communicate the simple idea that AGI is not near, people will feel less need to work on safety soon. This would free them to take opportunities that would be good to pursue before they actually need to focus on AI safety.
We can only really communicate one thing at a time to people. Also, we should worry more about the tail risks of false positives (thinking we can build AGI safely when we cannot) than of false negatives (thinking we can't build AGI safely when we can). Taking these two facts together, I think the policy implication is clear: unless there is extremely strong evidence that AGI is not near, we must act and communicate as if AGI is near.
I reached this via Joachim pointing it out as an example of someone urging epistemic defection around AI alignment, and I have to agree with him there. I think the higher difficulty posed by communicating "we think there's a substantial probability that AGI happens in the next 10 years" vs "AGI is near" is worth it even from a PR perspective, because pretending you know the day and the hour smells like bullshit to the most important people who need convincing that AI alignment is nontrivial.
IME a lot of people's stated reasons for thinking AGI is near involve mistaken reasoning and those mistakes can be discussed without revealing capabilities ideas: https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce
An alternative framing that might be useful: what do you see as the main bottleneck to people having better predictions of timelines?
And do you in fact think that having such a list is that bottleneck?
I occasionally have some thoughts about why AGI might not be as near as a lot of people seem to think, but I'm confused about how/whether to talk about them in public.
The biggest reason for not talking about them is that one person's "here is a list of capabilities that I think an AGI would need to have, that I don't see there being progress on" is another person's "here's a roadmap of AGI capabilities that we should do focused research on". Any articulation of missing capabilities that is clear enough to be convincing also seems clear enough to get people thinking about how to achieve those capabilities.
At the same time, the community thinking that AGI is closer than it really is (if that's indeed the case) has numerous costs, including at least:
Having a better model of what exactly is missing could conceivably also make it easier to predict when AGI will actually be near. But I'm not sure to what extent this is actually the case, since the development of core AGI competencies seems to be more a question of insight than grind[1], and insight seems very hard to predict.
A benefit from this that does seem more plausible would be if the analysis of capabilities gave us information that we could use to figure out what a good future landscape would look like. For example, suppose that we aren't likely to get AGI soon, that the capabilities we currently have will create a society that looks more like the one described in Comprehensive AI Services, and that such services could safely be used to detect signs of actually dangerous AGIs. If this were the case, then it would be important to know that we may want to accelerate the deployment of technologies that are taking the world in a CAIS-like direction, and possibly e.g. promote rather than oppose things like open source LLMs.
One argument would be that if AGI really isn't near, then that's going to be obvious pretty soon, and my arguments in particular are unlikely to be all that unique - someone else would probably make them soon anyway. But I think this argument cuts both ways - if someone else is likely to make the same arguments soon anyway, then there's also limited benefit in writing them up. (Of course, if it saves people from significant mental anguish, even just making those arguments slightly earlier seems good, so overall this argument seems weakly in favor of writing up the arguments.)
From Armstrong & Sotala (2012):