In this post I argued that an AI-induced point of no return would probably happen before world GDP starts to noticeably accelerate. You gave me some good pushback about the historical precedent I cited, but what is your overall view? If you can spare the time, what is your credence in each of the following PONR-before-GDP-acceleration scenarios, and why?
1. Fast takeoff
2. The sorts of skills needed to succeed in politics or war are easier to develop in AI than the sorts needed to accelerate the entire world economy, and/or have less deployment lag. (Maybe it takes years to build the relevant products and industries to accelerate the economy, but only months to wage a successful propaganda campaign to get people to stop listening to the AI safety community)
3. We get an "expensive AI takeoff" in which AI capabilities improve enough to cross some threshold of dangerousness, but this improvement happens in a very compute-intensive way that makes it uneconomical to automate a significant part of the economy until the threshold has been crossed.
4. Vulnerable world: Thanks to AI and other advances, a large number of human actors get the ability to make WMD's.
5. Persuasion/p...
I don't know if we ever cleared up ambiguity about the concept of PONR. It seems like it depends critically on who is returning, i.e. what is the counterfactual we are considering when asking if we "could" return. If we don't do any magical intervention, then it seems like the PONR could be well before AI since the conclusion was always inevitable. If we do a maximally magical intervention, of creating unprecedented political will, then I think it's most likely that we'd see 100%+ annual growth (even of say energy capture) before PONR. I don't think there are reasonable definitions of PONR where it's very likely to occur before significant economic acceleration.
I don't think I consider most of the scenarios listed necessarily PONR-before-GDP-acceleration scenarios, though many of them could permit PONR-before-GDP if AI was broadly deployed before it started adding significant economic value.
All of these probabilities are obviously pretty unreliable and made up on the spot:
1. Fast takeoff
Defined as 1-year doubling starts before 4-year doubling finishes, maybe 25%?
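That doubling-times definition of fast takeoff can be made operational against an annual GWP series. A toy checker (the series and function names are mine, purely illustrative):

```python
def doubling_intervals(series, factor=2.0):
    """Yield (start, end) index pairs where the series first reaches
    factor * its value at start (earliest end per start)."""
    for start in range(len(series)):
        for end in range(start + 1, len(series)):
            if series[end] >= factor * series[start]:
                yield (start, end)
                break

def fast_takeoff(gwp):
    """Fast takeoff per the definition above: some 1-year doubling
    begins before the first 4-year doubling completes."""
    pairs = list(doubling_intervals(gwp))
    four_year_ends = [e for (s, e) in pairs if e - s <= 4]
    one_year_starts = [s for (s, e) in pairs if e - s <= 1]
    if not four_year_ends:
        return False
    return any(s < min(four_year_ends) for s in one_year_starts)

print(fast_takeoff([100, 130, 270, 600]))       # an explosive series
print(fast_takeoff([100, 120, 144, 173, 210]))  # steady ~20%/year growth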
...2. The sorts of skills needed to succeed in politics or war are easier to develop in AI than the sorts needed to accelerate
What are the most important ideas floating around in alignment research that don't yet have a public write-up? (Or, even better, that have a public write-up but could do with a good one?)
I have a big gap between "stuff I've written up" and "stuff that I'd like to write up." Some particular ideas that come to mind: how epistemic competitiveness seems really important for alignment; how I think about questions like "aligned with whom" and why I think it's good to try to decouple alignment techniques from decisions about values / preference aggregation (this position is surprisingly controversial); updated views on the basic dichotomy in Two Kinds of Generalization and the current best hopes for avoiding the bad kind.
I think that there's a cluster of really important questions about what we can verify, how "alien" the knowledge of ML systems will be, and how realistic it's going to be to take a kind of ad hoc approach to alignment. In my experience people with a more experimental bent tend to be more optimistic about those questions, and tend to have a bunch of intuitions about them that do kind of hang together (and are often approximately shared across people). This comes with some more color on the current alignment plan / what's likely to happen in practice as people try to solve the problem on their feet. I don't think that's really been written up well but it s...
Unfortunately (fortunately?) I don't feel like I have access to any secret truths. Most idiosyncratic things I believe are pretty tentative, and I hang out with a lot of folks who are pretty open to the kinds of weird ideas that might have ended up feeling like Paul-specific secret truths if I hung with a more normal crowd.
It feels like my biggest disagreement with people around me is something like: to what extent is it likely to be possible to develop an algorithm that really looks on paper like it should just work for aligning powerful ML systems. I'm at like 50-50 and I think that the consensus estimate of people in my community is more like "Uh, sure doesn't sound like that's going to happen, but we're still excited for you to try."
Do you know what sorts of people you're looking to hire? How much do you expect ARC to grow over the coming years, and what will the employees be doing? I can imagine it being a fairly small group of like 3 researchers and a few understudies, I can also imagine it growing to 30 people like MIRI. Which one of these is it closer to?
I'd like to hire a few people (maybe 2 researchers median?) in 2021. I think my default "things are going pretty well" story involves doubling something like every 1-2 years for a while. Where that caps out / slows down a lot depends on how the field shakes out and how broad our activities are. I would be surprised if I wanted to stop growing at <10 people just based on the stuff I really know I want to do.
The very first hires will probably be people who want to work on the kind of theory I do, since right now that's what I'm feeling most excited about and really want to set up a team working on. I don't really know where that will end up going.
Once that's going, I'm not sure whether the next step will be growing it further or branching out into other things; it will probably depend on how the theory work goes. I could also imagine doing enough theory on my own to change my view about how promising it is and make initial hires in another area instead.
You've written multiple outer alignment failure stories. However, you've also commented that these aren't your best predictions. If you condition on humanity going extinct because of AI, why did it happen?
I think my best guess is kind of like this story, but:
Pre-hindsight: 100 years from now, it is clear that your research has been net bad for the long-term future. What happened?
Some plausible and non-exhaustive options, in roughly descending order of plausibility:
As an aside, I think that the possibility of "work doesn't matter" is typically way more important than "work was net bad," at least once you are making a serious effort to do something good rather than bad for the world (I agree that for the "average" project in the world the negative impacts are actually pretty large relative to the positive impacts).
EAs/rationalists often focus on the chance of a big downside clawing back value. I think that makes sense to think seriously about, and sometimes it's a big deal, but most of the time the quantitative estimates just don't seem to add up at all to me and I think people are making a huge quantitative error. I'm not sure exactly where we disagree, I think a lot of it is just that I'm way more skeptical about the ability to incidentally change the world a huge amount---I think that changing the world a lot usually just takes quite a bit of effort.
I guess in some sense I agree that the downside is big for normal butterfly-effect-y reasons (probably 50% of well-intentioned actions make the world worse ex post), so it's also possible that I'm just answering this question in a slightly different way.
My big caveat is that I think the numbers ...
"Even if actively trying to push the field forward full-time I'd be a small part of that effort"
I think conditioning on something like 'we're broadly correct about AI safety' implies 'we're right about some important things about how AI development will go that the rest of the ML community is surprisingly wrong about'. In that world we're maybe able to contribute as much as a much larger fraction of the field, due to being correct about some things that everyone else is wrong about.
I think your overall point still stands, but it does seem like you sometimes overestimate how obvious things are to the rest of the ML community
What's the most important thing that AI alignment researchers have learned in the past 10 years? Also, that question but excluding things you came up with.
"Thing" is tricky. Maybe something like the set of intuitions and arguments we have around learned optimizers, i.e. the basic argument that ML will likely produce a system that is "trying" to do something, and that it can end up performing well on the training distribution regardless of what it is "trying" to do (and this is easier the more capable and knowledgeable it is). I don't think we really know much about what's going on here, but I do think it's an important failure to be aware of and at least folks are looking for it now. So I do think that if it happens we're likely to notice it earlier than we would if taking a purely experimentally-driven approach and it's possible that at the extreme you would just totally miss the phenomenon. (This may not be fair to put in the last 10 years, but thinking about it sure seemed like a mess >10 years ago.)
(I may be overlooking something such that I really regret that answer in 5 minutes but so it goes.)
I wonder how valuable you find some of the more math/theory focused research directions in AI safety. I.e., how much less impactful do you find them, compared to your favorite directions? In particular,
I'd also be interested in suggestions for other impactful research directions/areas that are more theoretical and less ML-focused (expanding on adamShimi's question, I wonder which part of mathematics and statistics you expect to be particularly useful).
I'm generally bad at communicating about this kind of thing, and it seems like a kind of sensitive topic to share half-baked thoughts on. In this AMA all of my thoughts are half-baked, and in some cases here I'm commenting on work that I'm not that familiar with. All that said I'm still going to answer but please read with a grain of salt and don't take it too seriously.
Vanessa Kosoy's learning-theoretic agenda, e.g., the recent sequence on infra-Bayesianism, or her work on traps in RL. Michael Cohen's research, e.g. the paper on imitation learning, seems to go in a similar direction.
I like working on well-posed problems, and proving theorems about well-posed problems is particularly great.
I don't currently expect to be able to apply those kinds of algorithms directly to alignment for various reasons (e.g. no source of adequate reward function that doesn't go through epistemic competitiveness which would also solve other aspects of the problem, not practical to get exact imitation), so I'm mostly optimistic about learning something in the course of solving those problems that turns out to be helpful. I think that's plausible because these formal problems do engage some of the dif...
Not really.
I expect that many humans will continue to participate in a process of collectively clarifying what we want and how to govern the universe. I wouldn't be surprised if that involves a lot of life-kind-of-like-normal that gradually improves in a cautious way we endorse rather than some kind of table-flip (e.g. I would honestly not be surprised if post-singularity we still end up raising another generation because there's no other form of "delegation" that we feel more confident about). And of course in such a world I expect to just continue to spend a lot of time thinking, again probably under conditions that are designed to be gradually improving rather than abruptly changing. The main weird thing is that this process will now be almost completely decoupled from productive economic activity.
I think it's hard to talk about "your life" and identity is likely to be fuzzy over the long term. I don't think that most of the richness and value in the world will come from creatures who feel like "us" (and I think our selfish desires are mostly relatively satiable). That said, I do also expect that basically all of the existing humans will have a future that they feel excited abou...
I don't have an easy way of slicing my work up / think that it depends on how you slice it. Broadly I think the two candidates are (i) making RL from human feedback more practical and getting people excited about it at OpenAI, (ii) the theoretical sequence from approval-directed agents and informed oversight to iterated amplification to getting a clear picture of the limits of iterated amplification and setting out on my current research project. Some steps of that were really hard for me at the time though basically all of them now feel obvious.
My favorite blog post was probably approval-directed agents, though this is very much based on judging by the standards of how-confused-Paul-started-out. I think that it set me on a way better direction for thinking about AI safety (and I think it also helped a lot of people in a similar way). Ultimately it's clear that I didn't really understand where the difficulties were, and I've learned a lot in the last 6 years, but I'm still proud of it.
How many ideas of the same size as "maybe a piecewise linear non-linearity would work better than a sigmoid for not having vanishing gradients" are we away from knowing how to build human-level AI technology?
I think it's >50% chance that ideas like ReLUs or soft attention are best thought of as multiplicative improvements on top of hardware progress (as are many other ideas like auxiliary objectives, objectives that better capture relevant tasks, infrastructure for training more efficiently, dense datasets, etc.), because the basic approach of "optimize for a task that requires cognitive competence" will eventually yield human-level competence. In that sense I think the answer is probably 0.
Maybe my median number of OOMs left before human-level intelligence, including both hardware and software progress, is 10 (pretty made-up). Of that I'd guess around half will come from hardware, so call it 5 OOMs of software progress. Don't know how big that is relative to ReLUs, maybe 5-10x? (But hard to define the counterfactual w.r.t. activation functions.)
(I think that may imply much shorter timelines than my normal view. That's mostly from thoughtlessness in this answer which was quickly composed and didn't take into account many sources of evidence, some is from legit correlations not taken into account here, some is maybe legitimate signal from an alternative estimation approach, not sure.)
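As a sanity check on those made-up numbers (this arithmetic is mine, not from the answer): 5 OOMs of software progress at 5-10x per ReLU-sized idea works out to only a handful of such ideas.

```python
import math

# All inputs are the made-up medians from the answer above.
total_ooms = 10                 # median OOMs of effective compute left
software_ooms = total_ooms / 2  # roughly half attributed to software

# A ReLU-sized idea worth a 5-10x multiplier is log10(5) to log10(10),
# i.e. about 0.7-1.0 OOMs per idea.
ideas_low = software_ooms / math.log10(10)  # if each idea is worth 10x
ideas_high = software_ooms / math.log10(5)  # if each idea is worth 5x
print(f"roughly {ideas_low:.0f}-{ideas_high:.0f} ReLU-sized ideas left")
```

With these inputs it prints roughly 5-7 ideas, consistent with the spirit of the "probably 0 fundamentally new ideas, several multiplicative ones" framing.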
How many ideas of the same size as "maybe we could use inverse reinforcement learning to learn human values" are we away from knowing how to knowably and reliably build human-level AI technology that wouldn't cause something comparably bad as human extinction?
A lot of this is going to come down to estimates of the denominator.
(I mostly just think that you might as well just ask people "Is this good?" rather than trying to use a more sophisticated form of IRL---in particular I don't think that realistic versions of IRL will successfully address the cases where people err in answering the "is it good?" question, that directly asking is more straightforward in many important ways, and that we should mostly just try to directly empower people to give better answers to such questions.)
Anyway, with that caveat and kind of using the version of your idea that I feel most enthusiastic about (and construing it quite broadly), I have a significant probability on 0, maybe a median somewhere in 10-20, significant probability at very high levels.
What was your biggest update about the world from living through the coronavirus pandemic?
Follow-up: does it change any of your feelings about how civilization will handle AGI?
I found our COVID response pretty "par for the course" in terms of how well we handle novel challenges. That was a significant negative update for me because I had a moderate probability on us collectively pulling out some more exceptional adaptiveness/competence when an issue was imposing massive economic costs and had a bunch of people's attention on it. I now have somewhat more probability on AI dooms that play out slowly where everyone is watching and yelling loudly about it but it's just really tough to do something that really improves the situation (and correspondingly more total probability on doom). I haven't really sat down and processed this update or reflected on exactly how big it should be.
Do you have any advice for junior alignment researchers? In particular, what do you think are the skills and traits that make someone an excellent alignment researcher? And what do you think someone can do early in a research career to be more likely to become an excellent alignment researcher?
Some things that seem good:
I personally feel like I got a lot of benefit out of doing some research in adjacent areas, but I'd guess that mostly it's better to focus on what you actually want to achieve and just be a ...
What are the highest priority things (by your lights) in Alignment that nobody is currently seriously working on?
It's not clear how to slice the space up into pieces so that you can talk about "is someone working on this piece?" (and the answer depends a lot on that slicing). Here are two areas in robustness that feel kind of empty for my preferred way of slicing up the problem (though for a different slicing they could be reasonably crowded). These are also necessarily areas where I'm not doing any work so I'm really out on a limb here.
I think there should be more theoretical work on neural net verification / relaxing adversarial training. I should probably update from this to think that it's more of a dead end (and indeed practical verification work does seem to have run into a lot of trouble), but to me it looks like there's got to be more you can say at least to show that various possible approaches are dead ends. I think a big problem is that you really need to keep the application in mind in order to actually know the rules of the game. (That is, we have a predicate A, say implemented as a neural network, and we want to learn a function f such that for all x we have A(x, f(x)), but the problem is only supposed to be possible because in some sense the predicate A is "easy" to satisfy...
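The setup described (learn f such that A(x, f(x)) holds for all x, with an adversary searching for violations) can be sketched on a toy problem. Everything below is illustrative: the predicate, the candidate function, and the random-search adversary are all invented, and random search is the weakest possible stand-in for actual verification.

```python
import random

def A(x, y):
    """Toy predicate: holds iff y approximates sqrt(x) to within 0.1."""
    return abs(y - x ** 0.5) <= 0.1

def f(x):
    """Candidate function we want to certify: Newton's method for sqrt."""
    y = x / 2 if x > 1 else 1.0
    for _ in range(20):
        y = 0.5 * (y + x / y)
    return y

def find_counterexample(trials=10_000, lo=0.0, hi=100.0):
    """Adversary: random search for an x violating A(x, f(x)).
    Returning None is only evidence, not a proof -- which is exactly
    the gap that verification / relaxed adversarial training targets."""
    rng = random.Random(0)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        if not A(x, f(x)):
            return x
    return None

print(find_counterexample())
```

The interesting theoretical question is what structure A needs to have ("easy to satisfy" in some sense) for anything stronger than this kind of search to be possible.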
Copying my question from your post about your new research center (because I'm really interested in the answer): which part (if any) of theoretical computer science do you expect to be particularly useful for alignment?
Going to start now. I vaguely hope to write something for all of the questions that have been asked so far but we'll see (80 questions is quite a few).
What is your theory of change for the Alignment Research Center? That is, what are the concrete pathways by which you expect the work done there to systematically lead to a better future?
For the initial projects, the plan is to find algorithmic ideas (or ideally a whole algorithm) that work well in practice, can be adopted by labs today, and would put us in a way better position with respect to future alignment challenges. If we succeed in that project, then I'm reasonably optimistic about being able to demonstrate the value of our ideas and get them adopted in practice (by a combination of describing them publicly, talking with people at labs, advising people who are trying to pressure labs to take alignment seriously about what their asks should be, and consulting for labs to help implement ideas). Even if adoption or demonstrating desirability turns out to be hard, I think that the alignment community would be in a much better place if we had a proposal that we all felt good about that we were advocating for (since we'd then have a better shot at doing so, and labs that were serious about alignment would be able to figure out what to do).
Beyond that, I'm also excited about offering concrete and well-justified advice (either about what algorithms to use or about alignment-relevant deployment decisions) that can help labs who care about alignment, or can be taken as a clear indicator of best practices and so adopted by labs who want to present as socially-responsible (whether to please employees, funders, civil society, or competitors).
But I'm mostly thinking about the impact of initial activities, and for that I feel like the theory of change is relatively concrete/straightforward.
If you could magically move most of the US rationality and x-risk and EA community to a city in the US that isn't the Bay, and you had to pick somewhere, where would you move them to?
If I'm allowed to think about it first then I'd do that. If I'm not, then I'd regret never having thought about it, probably Seattle would be my best guess.
And on an absolute level, is the world much more or less prepared for AGI than it was 15 years ago?
Follow-up: How much did the broader x-risk community change it at all?
How many hours per week should the average AI alignment researcher spend on improving their rationality? How should they spend those hours?
I probably wouldn't set aside hours for improving rationality (/ am not exactly sure what it would entail). Seems generally good to go out of your way to do things right, to reflect on lessons learned from the things you did, to be willing to do (and slightly overinvest in) things that are currently hard in order to get better, and so on. Maybe I'd say that like 5-10% of time should be explicitly set aside for activities that just don't really move you forward (like post-mortems or reflecting on how things are going in a way that's clearly not going to pay itself off for this project) and a further 10-20% on doing things in ways that aren't the very optimal way right now but useful for getting better at doing them in the future (e.g. using unfamiliar tools, getting more advice from people than would make sense if the world ended next week, being more methodical about how you approach problems).
I guess the other aspect of this is separating some kind of general improvement from more domain specific improvement (i.e. are the numbers above about improving rationality or just getting better at doing stuff?). I think stuff that feels vaguely like "rationality" in the sense of being abou...
I'm not interested in the strongest argument from your perspective (i.e. the steelman), but I am interested how much you think you can pass the ITT for Eliezer's perspective on the alignment problem — what shape the problem is, why it's hard, and how to make progress. Can you give a sense of the parts of his ITT you think you've got?
I think I could do pretty well (it's plausible to me that I'm the favorite in any head-to-head match with someone who isn't a current MIRI employee? probably not but I'm at least close). There are definitely some places I still get surprised and don't expect to do that well, e.g. I was recently surprised by one of Eliezer's positions regarding the relative difficulty of some kinds of reasoning tasks for near-future language models (and I expect there are similar surprises in domains that are less close to near-term predictions). I don't really know how to split it into parts for the purpose of saying what I've got or not.
Did you get much from reading the sequences? What was one of the things you found most interesting or valuable personally in them?
I enjoyed Leave a Line of Retreat. It's a very concrete and simple procedure that I actually still use pretty often and I've benefited a lot just from knowing about. Other than that I think I found a bunch of the posts interesting and entertaining. (Looking back now the post is a bit bombastic, I suspect all the sequences are, but I don't really mind.)
1. What credence would you assign to "+12 OOMs of compute would be enough for us to achieve AGI / TAI / AI-induced Point of No Return within five years or so." (This is basically the same, though not identical, with this poll question)
2. Can you say a bit about where your number comes from? E.g. maybe 25% chance of scaling laws not continuing such that OmegaStar, Amp(GPT-7), etc. don't work, 25% chance that they happen but don't count as AGI / TAI / AI-PONR, for total of about 60%? The more you say the better, this is my biggest crux! Thanks!
I'd say 70% for TAI in 5 years if you gave +12 OOM.
I think the single biggest uncertainty is about whether we will be able to adapt sufficiently quickly to the new larger compute budgets (i.e. how much do we need to change algorithms to scale reasonably? It's a very unusual situation, it's hard to scale up fast, and it depends on exactly how far that goes). Maybe I think that there's a 90% chance that TAI is in some sense possible (maybe: if you'd gotten to that much compute while remaining as well-adapted as we are now to our current levels of compute) and conditioned on that an 80% chance that we'll actually do it vs running into problems?
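Multiplying the two stated conditionals reproduces the headline figure (just checking the arithmetic):

```latex
P(\text{TAI} \mid +12\text{ OOM}) \approx
  \underbrace{0.9}_{\text{possible}} \times
  \underbrace{0.8}_{\text{actually done}} = 0.72 \approx 70\%
```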
(Didn't think about it too much, don't hold me to it too much. Also I'm not exactly sure what your counterfactual is and didn't read the original post in detail; I was just assuming that all existing and future hardware got 12 OOM faster. If I gave numbers somewhere else that imply much less than that probability with +12 OOM, then you should be skeptical of both.)
Natural language has both noise (that you can never model) and signal (that you could model if you were just smart enough). GPT-3 is in the regime where it's mostly signal (as evidenced by the fact that the loss keeps going down smoothly rather than approaching an asymptote). But it will soon get to the regime where there is a lot of noise, and by the time the model is 9 OOMs bigger I would guess (based on theory) that it will be overwhelmingly noise and training will be very expensive.
So it may or may not work in the sense of meeting some absolute performance threshold, but it will certainly be a very bad way to get there and we'll do something smarter instead.
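The signal/noise picture above matches the standard decomposition of language-model loss into an irreducible (noise) term plus a power-law (signal) term that shrinks with scale. A sketch, with all constants invented for illustration (the shape is loosely Kaplan-et-al-style, not a real fit):

```python
L_INF, N0, ALPHA = 1.7, 8.8e13, 0.076  # invented constants

def loss(n_params):
    """Irreducible entropy plus a power-law reducible term."""
    return L_INF + (N0 / n_params) ** ALPHA

base = 1.75e11  # a GPT-3-scale parameter count
for ooms in (0, 3, 6, 9):
    n = base * 10 ** ooms
    print(f"+{ooms} OOMs: loss={loss(n):.2f} "
          f"(reducible part {loss(n) - L_INF:.2f})")
```

By +9 OOMs the reducible (signal) term is small relative to the irreducible (noise) term, which is the regime the answer describes as overwhelmingly noise and very expensive to train in.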
You seem in the unusual position of having done excellent conceptual alignment work (eg with IDA), and excellent applied alignment work at OpenAI, which I'd expect to be pretty different skillsets. How did you end up doing both? And how useful have you found ML experience for doing good conceptual work, and vice versa?
Aw thanks :) I mostly trained as a theorist through undergrad, then when I started grad school I spent some time learning about ML and decided to do applied work at OpenAI. I feel like the methodologies are quite different but the underlying skills aren't that different. Maybe the biggest deltas are that ML involves much more management of attention and jumping between things in order to be effective in practice, while theory is a bit more loaded on focusing on one line of reasoning for a long time and having some clever idea. But while those are important skills I don't think they are the main things that you improve at by working in either area and aren't really core.
I feel like in general there is a lot of transfer between doing well in different research areas, though unsurprisingly it's less than 100% and I think I would be better at either domain if I'd just focused on it more. The main exception is that I feel like I'm a lot better at grounding out theory that is about ML, since I've had more experience and have more of a sense for what kinds of assumptions are reasonable in practice. And on the flip side I do think theory is similar to a lot of algorithm design/analysis questions that come up in ML (frankly it doesn't seem like a central skill but I think there are big logistical benefits from being able to do the whole pipeline as one person).
Favorite: Irit Dinur's PCP for constraint satisfaction. What a proof system.
If you want to be more pure, and consider the mathematical objects that are found rather than built, maybe the monster group? (I'm a layperson so I can't appreciate the full extent of what's going on, and like most people I only really know about it second-hand, but its existence seems like a crazy and beautiful fact about the world.)
Least favorite: I don't know, maybe Chaitin's constant?
Should marginal CHAI PhD graduates who are dispositionally indifferent between the two options try to become a professor or do research outside of universities?
Not sure. If you don't want to train students, seems to me like you should be outside of a university. If you do want to train students it's less clear and maybe depends on what you want to do (and given that students vary in what they are looking for, this is probably locally self-correcting if too many people go one way or the other). I'd certainly lean away from university for the kinds of work that I want to do, or for the kinds of things that involve aligning large ML systems (which benefit from some connection to customers and resources).
What are the main ways you've become stronger and smarter over the past 5 years? This isn't a question about new object-level beliefs so much as ways-of-thinking or approaches to the world that have changed for you.
I'm pretty comfortable working with strong axioms. But in terms of "would actually blow my mind if it turned out not to be consistent," I guess alpha-inaccessible cardinals for any concrete alpha? Beyond that I don't really know enough set theory to have my mind blown.
Why did nobody in the world run challenge trials for the covid vaccine and save us a year of economic damage?
Wild speculation, not an expert. I'd love to hear from anyone who actually knows what's going on.
I think it's overoptimistic to expect that human challenge trials would have saved a year, though it does seem like they could plausibly have saved weeks or months if done in the most effective form. (And in combination with other human trials and moderate additional spending I'd definitely believe 6-12 months of acceleration was possible.)
In terms of why so few human experiments have happened in general, I think it's largely because of strong norms designed to protect experiment participants (and taken quite seriously by doctors I've talked to), together with limited upside for the experimenters, an overriding desire for vaccine manufacturers to avoid association with a trial that ends up looking bad (this doesn't apply to other kinds of trial but the upside is often lower and there's no real stakeholder), a lack of understanding for a long time of how big a problem this would be, the difficulty of quickly shifting time/attention from other problems to this one, and the general difficulty of running experiments.
What research in the past 5 years has felt like the most significant progress on the alignment problem? Has any of it made you more or less optimistic about how easy the alignment problem will be?
What is the main mistake you've made in your research, that you were wrong about?
Positive framing: what's been the biggest learning moment in the course of your work?
Basically every time I've shied away from a solution because it feels like cheating, or like it doesn't count / address the real spirit of the problem, I've regretted it. Often it turns out it really doesn't count, but knowing exactly why (and working on the problem with no holds barred) has been really important for me.
The most important case was dismissing imitation learning back in 2012-2014, together with basically giving up outright on all ML approaches, which I only recognized as a problem when I was writing up why those approaches were doomed more carefully and why imitation learning was a non-solution.
You gave a great talk on the AI Alignment Landscape 2 years ago. What would you change if giving the same talk today?
Curated. I don't think we've curated an AMA before, and I'm not sure if I have a principled opinion on doing that, but this post seems chock full of small useful insights, and fragments of ideas that seem like they might otherwise take a while to get written up more comprehensively, which I think is good.
Should more AI alignment research be communicated in book form? Relatedly, what medium of research communication is most under-utilized by the AI alignment community?
I think it would be good to get more arguments and ideas pinned down, explained carefully, collected in one place. I think books may be a reasonable format for that, though man they take a long time to write.
I don't know what medium is most under-utilized.
What mechanisms could effective altruists adopt to improve the way AI alignment research is funded?
Long run I'd prefer something like altruistic equity / certificates of impact. But frankly I don't think we have hard enough funding coordination problems that it's going to be worth figuring that kind of thing out.
(And like every other community we are free-riders---I think that most of the value of experimenting with such systems would accrue to other people who can copy you if successful, and we are just too focused on helping with AI alignment to contribute to that kind of altruistic public good. If only someone would be willing to purchase the impact certificate from us if it worked out...)
What works of fiction / literature have had the strongest impact on you? Or perhaps, that are responsible for the biggest difference in your vector relative to everyone else's vector?
(e.g. lots of people were substantially impacted by the Lord of the Rings, but perhaps something else had a big impact on you that led you in a different direction from all those people)
(that said, LotR is a fine answer)
There has been surprisingly little written on concrete threat models for how AI leads to existential catastrophes (though you've done some great work rectifying this!). Why is this? And what are the most compelling threat models that don't have good public write-ups? In particular, are there under-appreciated threat models that would lead to very different research priorities within Alignment?
What's the optimal ratio of researchers to support staff in an AI alignment research organization?
Which rationalist virtue do you identify with the strongest currently? Which one would you like to get stronger at?
I mostly found myself agreeing with Robin, in that e.g. I believe previous technical change is mostly a good reference class, and that Eliezer's AI-specific arguments are mostly kind of weak. (I liked the image, I think from that debate, of a blacksmith emerging into the town square with his mighty industry and making all bow before him.)
That said, I think Robin's quantitative estimates/forecasts are pretty off and usually not very justified, and I think he puts too much stock in an outside-view extrapolation from past transitions rather than looking at the inside view for existing technologies (the extrapolation seems helpful in the absence of anything else, but it's just not that much evidence given the shortness and noisiness of the time series and the shakiness of the underlying regularities). I don't remember exactly what kinds of estimates he gives in that debate.
(This is more obvious for his timeline estimates, which I think have an almost comically flimsy justification given how seriously he takes them.)
Overall I think that it would be more interesting to have a Carl vs Robin FOOM debate; I expect the outcome would be Robin saying "do you really call that a FOOM?" and Carl saying "well it is pretty fast and would have crazy disruptive geopolitical consequences and generally doesn't fit that well with your implied forecasts about the world even if not contradicting that many of the things you actually commit to" and we could all kind of agree and leave it at that modulo a smaller amount of quantitative uncertainty.
Other than by doing your own research, from where or whom do you tend to get valuable research insights?
I'd be interested in your thoughts on human motivation in HCH and amplification schemes.
Do you see motivational issues as insignificant / a manageable obstacle / a hard part of the problem...?
Specifically, it concerns me that every H will have preferences valued more highly than [completing whatever task we assign], so would be expected to optimise its output for its own values rather than the assigned task, where these objectives diverged. In general, output needn't relate to the question/task.
[I don't think you've addressed this at all recently - I've on...
How will we know when it's not worth getting more people to work on reducing existential risk from AI?
I'll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1).
If you want to ask something just post a top-level comment; I'll spend at least a day answering questions.
You can find some background about me here.