This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.
In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.
This strikes some people as absurd or at best misleading. I disagree.
The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...
Summary
We now resume your regularly scheduled LessWrong tradition of decision theory posting.
Just the first and last post will be on Alignment Forum, and the whole thing will be linked together.
Epistemic Status: This is mostly just recapping old posts so far. If you're a decision-theory veteran, new stuff only starts arising in the "Computational Intractability" section and further down.
You may have heard of a thing called Updateless Decision Theory. It's been discussed for over a decade by now, but progress on rendering it down into something that could maybe someday be run on a computer has been very slow. Last winter, I decided to try helping out Abram and Martin with their efforts at actually formalizing it into some sort of algorithm that nice things could be...
Ok, I misunderstood. (See also my post on the relation between local and global optimality, and another post on coordinating local decisions using MCMC)
I don't expect this post to contain anything novel. But from talking to others it seems like some of what I have to say in this post is not widely known, so it seemed worth writing.
In this post I'm defining superposition as: A representation with more features than neurons, achieved by encoding the features as almost orthogonal vectors in neuron space.
One reason to expect superposition in neural nets (NNs) is that, for large $n$, $\mathbb{R}^n$ has many more than $n$ almost orthogonal directions. On the surface, this seems obviously useful for the NN to exploit. However, superposition is not magic. You don't actually get to put in more information; the gain you get from having more feature directions has to be paid for some other way.
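To make the "almost orthogonal" claim concrete, here is a minimal sketch (not from the original post) that samples more random unit vectors than there are dimensions and measures the worst pairwise overlap. The function names, the use of random Gaussian directions, and the sizes 200 and 300 are all illustrative assumptions; a trained network's feature directions need not look like random ones.

```python
import math
import random


def random_unit_vector(n, rng):
    """Sample a direction uniformly on the unit sphere in R^n (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_interference(n, m, seed=0):
    """Largest |dot product| among m random unit vectors in R^n.

    Hypothetical helper for illustration: "interference" here just means
    the overlap between two distinct feature directions.
    """
    rng = random.Random(seed)
    vecs = [random_unit_vector(n, rng) for _ in range(m)]
    worst = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            dot = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            worst = max(worst, abs(dot))
    return worst


if __name__ == "__main__":
    n, m = 200, 300  # more "features" (m) than "neurons" (n)
    print(f"n={n}, m={m}, worst overlap = {max_abs_interference(n, m):.3f}")
    print(f"typical scale 1/sqrt(n) = {1 / math.sqrt(n):.3f}")
```

The worst overlap stays well below 1 even though there are more vectors than dimensions, but it is not zero. That nonzero overlap is the cost the paragraph above refers to.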
All the math in this post is...
The math in the post is super hand-wavey, so I don't expect the result to be exactly correct. However, in your example, l up to 100 should be ok, since there is no superposition. 2.7 is almost 2 orders of magnitude off, which is not great.
Looking into what is going on: I'm basing my results on the Johnson–Lindenstrauss lemma, which gives an upper bound on the interference. In the post I'm assuming that the actual interference is of the same order of magnitude as this upper bound. This assumption clearly fails in your example since the interference betw...
Recently someone either suggested to me (or maybe told me they or someone else were going to do this?) that we should train AI on legal texts, to teach it human values. Ignoring the technical problem of how to do this, I'm pretty sure legal texts are not the right training data. But at the time, I could not clearly put into words why. Today's SMBC explains this for me:
Saturday Morning Breakfast Cereal - Law (smbc-comics.com)
Law is not a good representation or explanation of most of what we care about, because it's not trying to be. Law is mainly focused on the c...
(Note: I wrote this with editing help from Rob and Eliezer. Eliezer's responsible for a few of the paragraphs.)
A common confusion I see in the tiny fragment of the world that knows about logical decision theory (FDT/UDT/etc.) is that people think LDT agents are genial and friendly to each other.[1]
One recent example is Will Eden’s tweet about how maybe a molecular paperclip/squiggle maximizer would leave humanity a few stars/galaxies/whatever on game-theoretic grounds. (And that's just one example; I hear this suggestion bandied around pretty often.)
I'm pretty confident that this view is wrong (alas), and based on a misunderstanding of LDT. I shall now attempt to clear up that confusion.
To begin, a parable: the entity Omicron (Omega's little sister) fills box A with $1M and box B with...
If you can put uploaded human-level agents with evolved-organism preferences in your simulations, you can just win outright (e.g. by having them spend subjective millennia doing FAI research for you). If you can’t, that will be a very obvious difference between your simulations and the real world.
I disagree. If your simulation is perfectly realistic, the simulated humans might screw up at alignment and create an unfriendly superintelligence, for much the same reason real humans might.
Also, if the space of goals that evolution + culture can...
Yeah, I didn't do a very good job in this respect. I am not intending to talk about a transformer by itself. I am intending to talk about transformers with the sorts of bells and whistles that they are currently being wrapped with. So not just transformers, but also not some totally speculative wrapper.