I think part of what I was reacting to is a kind of half-formed argument that goes something like:
Meta-comment:
I noticed that I found it very difficult to read through this post, even though I felt the content was important, because of the (deliberately) condescending style. I also noticed that I'm finding it difficult to take the ideas as seriously as I think I should, again due to the style. I did manage to read through it in the end, because I do think it's important, and I think I am mostly able to avoid letting the style influence my judgments. But I find it fascinating to watch my own reaction to the post, and I'm wondering if others have any (co...
When I try to mentally simulate negative reader-reactions to the dialogue, I usually get a complicated feeling that's some combination of:
I had a pretty strong negative reaction to it. I got the feeling that the post derives much of its rhetorical force from setting up an intentionally stupid character who can be condescended to, and that this is used to sneak in a conclusion that would seem much weaker without that device.
Things I instinctively observed, or that my model believes I picked up while reading, that seem relevant (not attempting to justify them at this time):
I find it concerning that you felt the need to write "This is not at all a criticism of the way this post was written. I am simply curious about my own reaction to it" (and still got downvoted?).
For my part, I both believe that this post contains valuable content and good arguments, and that it was annoying / rude / bothersome in certain sections.
I've gotten one private message expressing more or less the same thing about this post, so I don't think this is a super unusual reaction.
Thanks Daniel for that strong vote of confidence!
The full graph is in fact expandable / collapsible, and it does have the ability to display the relevant paragraphs when you hover over a node (although the descriptions are not all filled in yet). It also allows people to enter their own numbers and get updated calculations, exactly as you described. We actually built a nice dashboard for that - we haven't shown it yet in this sequence because the sequence is mostly focused on phase 1, and the dashboard belongs to phase 2.
Analytica does have a web version, but it...
I'd like to hear more thoughts, from Rohin or anybody else, about how the scaling hypothesis might affect safety work.
Thanks Adam for setting this up! I have no idea if my experience is representative, but that was definitely one of the highest-quality discussion sessions I've had at events of this type.
I don't think this is quite an example of a treacherous turn, but this still looks relevant:
Lewis et al., Deal or No Deal? End-to-End Learning for Negotiation Dialogues (2017):
...Analysing the performance of our agents, we find evidence of sophisticated negotiation strategies. For example, we find instances of the model feigning interest in a valueless issue, so that it can later ‘compromise’ by conceding it. Deceit is a complex skill that requires hypothesising the other agent’s beliefs, and is learnt relatively late in child development (Talwar and Lee, 200
That's later in the linked wiki page: https://timelines.issarice.com/wiki/Timeline_of_AI_safety#Full_timeline
Excellent, thanks! Now I just need a similar timeline for near-term safety engineering / assured autonomy as they relate to AI, and then a good part of a paper I'm working on will have just written itself.
Also - particular papers that you think are important, especially if you think they might be harder to find in a quick literature search. I'm part of an AI Ethics team at work, and I would like to find out about these as well.
This was actually part of a conversation I was having with this colleague regarding whether or not evolution can be viewed as an optimization process. Here are some follow-up comments to what she wrote above related to the evolution angle:
We could define the natural selection system as:
All configurations = all arrangements of matter on a planet (both arrangements that are living and those that are non-living)
Basin of attraction = all arrangements of matter on a planet that meet the definition of a living thing
Target configuration set = all arrangements of...
I shared this essay with a colleague where I work (Johns Hopkins University Applied Physics Lab). Here are her comments, which she asked me to share:
This essay proposes a very interesting definition of optimization as the manifestation of a particular behavior of a closed, physical system. I haven’t finished thinking this over, but I suspect it will be (as is suggested in the essay) a useful construct. The reasoning leading to the definition is clearly laid out (thank you!), with examples that are very useful in understanding the concept. The downsi...
You should make this a top-level post so it gets visibility. I think it's important for people to know the caveats attached to your results and the limits on their implications for real-world dynamics.