I finally got around to reading this sequence, and I really like the ideas behind these methods. This feels like someone actually trying to figure out exactly how fragile human values are. It's especially exciting because it seems like it hooks right into an existing, normal field of academia (thus making it easier to leverage their resources toward alignment).
I do have one major issue with how the takeaway is communicated, starting with the term "catastrophic". I would only use that word when the outcome of the optimization is really bad, much worse than ...
I'll also note that I think what you're calling "Vingean agency" is a notable sub-type of optimization process, and you've analyzed it well here. But to me it's definitely not the definition of optimization or agency. For example, in the post you say
> We perceive agency when something is better at doing something than us; we endorse some aspect of its reasoning or activity.
This doesn't feel true to me (in the carve-nature-at-its-joints sense). I think children are strongly agentic, even though I do everything more competently than they do.
I have some comments on the arbitrariness of the "baseline" measure in Yudkowsky's measure of optimization.
Sometimes, I am surprised in the moment about how something looks, and I quickly update to believing there's an optimization process behind it. For example, if I climb a hill expecting to see a natural forest, and then instead see a grid of suburban houses or an industrial logging site, I'll immediately realize that there's no way this is random and instead there's an optimization process that I wasn't previously modelling. In cases like this, I think...
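To make the arbitrariness concrete: as I understand Yudkowsky's "Measuring Optimization Power" (my notation, not a quote), the optimization power of observing an outcome $s^*$ is

$$\mathrm{OP}(s^*) = -\log_2 \mu\big(\{s \in S : s \succeq s^*\}\big),$$

where $\succeq$ is the preference ordering and $\mu$ is exactly the baseline measure in question (e.g. uniform over states, or whatever an unoptimized process would produce). The number of bits you assign depends entirely on which $\mu$ you pick, which is the arbitrariness I mean.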
I feel like there's a key concept that you're aiming for that isn't quite spelled out in the math.
I remember reading somewhere that there's a typically unmentioned distinction between "Bayes' theorem" and "Bayesian inference". Bayes' theorem is the statement that $P(A \mid B) = P(B \mid A)P(A)/P(B)$, which is true from the axioms of probability theory for any $A$ and $B$ whatsoever. Notably, it has nothing to do with time, and it's still true even after you learn $B$. On the other hand, Bayesian inference is the premise that your beliefs should change in accordance...
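To spell out the contrast in symbols (my own gloss): Bayes' theorem is the synchronic identity

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},$$

while Bayesian inference (conditionalization) is the diachronic rule

$$P_{\text{new}}(A) = P_{\text{old}}(A \mid B) \quad \text{upon learning } B.$$

The first follows from the probability axioms alone; the second is an additional normative premise about how credences should evolve over time.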
You might be interested in some of my open drafts about optimization:
One distinction that I pretty strongly hold as carving nature at its joints is (what I call) optimization vs. agents. Optimization has no concept of a utility function; it's just about the state going up an ordering. Agents are the things that have a utility function, which they need for picking actions with probabilistic outcomes.
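Here's a toy sketch of the distinction (my own illustrative code, not from any post; all names are made up). An "optimizer" only needs a comparison on states, while an "agent" needs cardinal utilities to take expectations over uncertain outcomes.

```python
import random

# Optimization: no utility function needed, just an ordering on states.
# The process pushes the state "up" the ordering.
def optimize(state, neighbors, better_than, steps=100):
    """Hill-climb: move to any neighboring state higher in the ordering."""
    for _ in range(steps):
        candidates = [n for n in neighbors(state) if better_than(n, state)]
        if not candidates:
            break  # local optimum under the ordering
        state = random.choice(candidates)
    return state

# Agency: a utility function is required, because actions have
# probabilistic outcomes and expectations need cardinal values.
def act(actions, outcome_dist, utility):
    """Pick the action maximizing expected utility over outcome lotteries."""
    def expected_utility(a):
        return sum(p * utility(o) for o, p in outcome_dist(a))
    return max(actions, key=expected_utility)
```

An ordering alone can't tell you whether a 50/50 gamble between the best and worst outcomes beats a middling sure thing; that's exactly where the cardinal structure of a utility function earns its keep.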
I feel very on-board with this research aesthetic.
Here are just some nit-picks/notational confusions I had while reading:
- The sequence $s, f(s), f^2(s), \ldots$, i.e., $(f^n(s))_{n \in \mathbb{N}}$, is the computation seeded at $s$ (or a "trajectory" in dynamical systems terminology).
...
- A property $P$ is achieved by a computation $s$ if there exists some number of steps $n$ such that ...
It took me a second to figure out what $s$ referred to, partly because the first s was not rendered in LaTeX, partly because it was n...
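For what it's worth, here's how I'd gloss those two definitions in code (my own sketch with made-up names; I'm assuming the post's computation steps by some transition function, which I call f here):

```python
from itertools import islice

def trajectory(f, s):
    """The computation seeded at s: yields s, f(s), f(f(s)), ... forever."""
    while True:
        yield s
        s = f(s)

def achieves(P, f, s, max_steps):
    """Whether property P is achieved by the computation seeded at s,
    i.e. P holds after some number of steps n (checked here only up to
    max_steps, since the real definition quantifies over all n)."""
    return any(P(state) for state in islice(trajectory(f, s), max_steps))
```

For example, `achieves(lambda x: x == 0, lambda x: x // 2, 100, max_steps=10)` checks whether repeated integer halving from 100 reaches 0 within ten steps (it does).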
I would especially love it if it popped out a .tex file that I could edit, since I'm very likely to be using different language on LW than I would in a fancy academic paper.
Some small corrections/additions to my section ("Altair agent foundations"). I'm currently calling it "Dovetail research". That's not publicly written anywhere yet, but if it were listed as that here, it might help people who are searching for it later this year.
I wouldn't put number 9. It's not intended to "solve" most of these problems, but is intended to help make progress on understanding the nature of the problems through...