I work on applied mathematics and AI at the Johns Hopkins University Applied Physics Laboratory (APL). I am also currently pursuing a PhD in Information Systems at the University of Maryland, Baltimore County (UMBC). My PhD research focuses on decision and risk analysis under extreme uncertainty, with a particular focus on potential existential risks from very advanced AI.
I think part of what I was reacting to is a kind of half-formed argument that goes something like:
Again, this isn't actually an argument I would make. It's just me trying to articulate my initial negative reactions to the post.
Meta-comment:
I noticed that I found it very difficult to read through this post, even though I felt the content was important, because of the (deliberately) condescending style. I also noticed that I'm finding it difficult to take the ideas as seriously as I think I should, again due to the style. I did manage to read through it in the end, because I do think it's important, and I think I am mostly able to avoid letting the style influence my judgments. But I find it fascinating to watch my own reaction to the post, and I'm wondering if others have any (constructive) insights on this.
In general I I've noticed that I have a very hard time reading things that are written in a polemical, condescending, insulting, or ridiculing manner. This is particularly true of course if the target is a group / person / idea that I happen to like. But even if it's written by someone on "my side" I find I have a hard time getting myself to read it. There have been several times when I've been told I should really go read a certain book, blog, article, etc., and that it has important content I should know about, but I couldn't get myself to read the whole thing due to the polemical or insulting way in which it was written.
Similarly, as I noted above, I've noticed that I often have a hard time taking ideas as seriously as I probably should if they're written in a polemical / condescending / insulting / ridiculing style. I think maybe I tend to down-weight the credibility of anybody who writes like that, and by extension maybe I subconsciously down-weight the content? Maybe I'm subconsciously associating condescension (at least towards ideas / people I think of as worth taking seriously) with bias? Not sure.
I've heard from other people that they especially like polemical / condescending articles, and I imagine that it is effective / persuasive for a lot of readers. For all I know this is far and away the most effective way of writing this kind of thing. And even if not, Eliezer is perfectly within his rights to use whatever style he wants. Eliezer explicitly acknowledges the condescending-sounding tone of the article, but felt it was worth writing it that way anyway, and that's fine.
So to be clear: This is not at all a criticism of the way this post was written. I am simply curious about my own reaction to it, and I'm interested to hear what others think about that.
A few questions:
Thanks Daniel for that strong vote of confidence!
The full graph is in fact expandable / collapsible, and it does have the ability to display the relevant paragraphs when you hover over a node (although the descriptions are not all filled in yet). It also allows people to enter in their own numbers and spit out updated calculations, exactly as you described. We actually built a nice dashboard for that - we haven't shown it yet in this sequence because this sequence is mostly focused on phase 1 and that's for phase 2.
Analytica does have a web version, but it's a bit clunky and buggy so we haven't used it so far. However, I was just informed that they are coming out with a major update soon that will include a significantly better web version, so hopefully we can do all this then.
I certainly don't think we'd say no to additional funding or interns! We could certainly use them - there are quite a few areas that we have not looked into sufficiently because all of our team members were focused on other parts of the model. And we haven't gotten yet to much of the quantitative part (phase 2 as you called it), or the formal elicitation part.
I'd like to hear more thoughts, from Rohin or anybody else, about how the scaling hypothesis might affect safety work.
Thanks Adam for setting this up! I have no idea if my experience is representative, but that was definitely one of the highest-quality discussion sessions I've had at events of this type.
I don't think this is quite an example of a treacherous turn, but this still looks relevant:
Lewis et al., Deal or no deal? end-to-end learning for negotiation dialogues (2017):
Analysing the performance of our agents, we find evidence of sophisticated negotiation strategies. For example, we find instances of the model feigning interest in a valueless issue, so that it can later ‘compromise’ by conceding it. Deceit is a complex skill that requires hypothesising the other agent’s beliefs, and is learnt relatively late in child development (Talwar and Lee, 2002). Our agents have learnt to deceive without any explicit human design, simply by trying to achieve their goals.
(I found this reference cited in Kenton et al., Alignment of Language Agents (2021).)
You should make this a top level post so it gets visibility. I think it's important for people to know the caveats attached to your results and the limits on its implications in real-world dynamics.