All of jylin04's Comments + Replies

Thanks Rohin! Agree with and appreciate the summary as I mentioned before. 

 I don’t agree with motivation 1 as much: if I wanted to improve AI timeline forecasts, there are a lot of other aspects I would investigate first. (Specifically, I’d improve estimates of inputs into <@this report@>(@Draft report on AI timelines@).) Part of this is that I am less uncertain than the author about the cruxes that transparency could help with, and so see less value in investigating them further.

I'm curious: does this mean that you're on board with the as...

Rohin Shah
Yes, with the exception that I don't know if compute will be the bottleneck (that is my best guess; I think Ajeya's report makes a good case for it; but I could see it being other factors as well). I think the case for it is basically "we see a bunch of very predictable performance lines; seems like they'll continue to go up". But more importantly, I don't know of any compelling counterpoints; the usual argument seems to be "but we don't see any causal reasoning / abstraction / <insert property here> yet", which I think is perfectly compatible with the scaling hypothesis (see e.g. this comment).

I see, that makes sense, and I think it does work as an intuition pump for what the "ML paradigm" is trying to do (though, as you sort of mentioned, I don't expect that we can just do the motivation / cognition decomposition).

It definitely depends on how powerful you're expecting the AI system to be. It seems like if you want to make the argument that AI will go well by default, you need the research accelerator to be quite powerful (or you have to combine it with some argument like "AI alignment will be easy to solve").

I don't think papers, books, etc. are a "relatively well-defined training set". They're a good source of knowledge, but if you imitate papers and books, you get a research accelerator that is limited by the capabilities of human scientists (well, actually much more limited, since it can't run experiments). They might be a good source of pretraining data, but there would still be a lot of work to do to get a very powerful research accelerator.

Fwiw, I'm not convinced that we avoid catastrophic deception either, but my thoughts here are pretty nebulous, and I think that "we don't know of a path to catastrophic deception" is a defensible position.


Thanks a lot for all the effort you put into this post! I don't agree with anything, but reading and commenting it was very stimulating, and probably useful for my own research.

 Likewise, thanks for taking the time to write such a long comment! And hoping that's a typo in the second sentence :)

I'm quite curious about why you wrote this post. If it's for convincing researchers in AI Safety that transparency is useful and important for AI Alignment, my impression is that many researchers do agree, and those who don't tend to have thought about it for qu

...
Adam Shimi
You're welcome. And yes, that was a typo, which I corrected. ^^

My take is that a lot of people around here agree that transparency is at least useful, and maybe necessary. And the main reason why people are not working on it is a mix of personal fit and the fact that, without research in AI Alignment proper, transparency doesn't seem that useful (if we don't know what to look for).

Well, transparency is doing some work, but it's totally unable to prove anything. That's a big part of the approach I'm proposing. That being said, I agree that this doesn't look like scaling the current way.

You're right that I was thinking of a more online system that could update its weights during deployment. Yet even with frozen weights, I definitely expect the model to make plans involving features that were never represented as such. For example, it might not have a bio-weapon feature, but it might have the relevant subfeatures to build one, composed by quite local rules that don't look like a plan to build a bio-weapon.

That seems reasonable.