hmm, I wouldn't think of industrialism and human empowerment as trying to grab the whole future, just part of it, in line with the relatively short-term (human, not cosmic, timescale) needs of the self and extended community; industrialism seems to lead to capitalist organization, which leads to decentralization superseding nations and such (as Land argues).
I think communism isn't generally about having one and one's friends in charge, it is about having human laborers in charge. One could argue that it tended towards nationalism (e.g. USSR), but I'm not conv...
re meta ethical alternatives:
"as important as ever": no, because our potential influence is lower, and the influence isn't on things shaped like our values; there has to be a translation, and the translation is different from the original.
CEV: while it addresses "extrapolation" it seems broadly based on assuming the extrapolation is ontologically easy, and that "our CEV" is an unproblematic object we can talk about (even though it's not mathematically formalized, any formalization would be subject to doubt, and even if formalized, we need logical uncertainty over it, and logical induction ...
Yes I still endorse the post. Some other posts:
Two alternatives to logical counterfactuals (note: I think policy dependent source code works less well than I thought it did at the time of writing)
A critical agential account... (general framework, somewhat underspecified or problematic in places but leads to more specific things like the linear logic post; has similarities to constructor theory)
U axiomatizes a consistent guessing oracle producing a model of T. There is no consistent guessing oracle applied to U.
In the previous post I showed that a consistent guessing oracle can produce a model of T. What I show in this post is that the theory of this oracle can be embedded in propositional logic so as to enable provability preserving translations.
Ok, I misunderstood. (See also my post on the relation between local and global optimality, and another post on coordinating local decisions using MCMC)
UDT1.0, since it’s just considering modifying its own move, corresponds to a player that’s acting as if it’s independent of what everyone else is deciding, instead of teaming up with its alternate selves to play the globally optimal policy.
I thought UDT by definition pre-computes the globally optimal policy? At least, that's the impression I get from reading Wei Dai's original posts.
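The distinction can be made concrete with a toy coordination game (a hypothetical sketch of my own, not anything from Wei Dai's posts): a player optimizing only its own move while holding its alternate self's move fixed can get stuck in the worse equilibrium, whereas jointly optimizing the shared policy does not.

```python
# Symmetric coordination game between an agent and its "alternate self"
# (both copies run the same decision procedure).
PAYOFF = {("A", "A"): 2, ("B", "B"): 1, ("A", "B"): 0, ("B", "A"): 0}

def unilateral_best_response(other_move):
    """UDT1.0-style: optimize my own move, treating the other copy's move as fixed."""
    return max(["A", "B"], key=lambda m: PAYOFF[(m, other_move)])

def joint_best_policy():
    """Globally optimal: pick one shared policy for all copies at once."""
    return max(["A", "B"], key=lambda m: PAYOFF[(m, m)])

# Starting from "both play B", a unilateral deviation looks bad...
print(unilateral_best_response("B"))  # 'B' (switching alone to A scores 0)
# ...but re-optimizing the shared policy finds the better outcome.
print(joint_best_policy())            # 'A' (both playing A scores 2)
```

The gap between the two answers is exactly the gap between "modifying its own move" and "teaming up with its alternate selves."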
There are evolutionary priors for what to be afraid of but some of it is learned. I've heard children don't start out fearing snakes but will easily learn to if they see other people afraid of them, whereas the same is not true for flowers (sorry, can't find a ref, but this article discusses the general topic). Fear of heights might be innate but toddlers seem pretty bad at not falling down stairs. Mountain climbers have to be using mainly mechanical reasoning to figure out which heights are actually dangerous. It seems not hard to learn the way in which h...
From a within-lifetime perspective, getting bored is instrumentally useful for doing "exploration" that results in finding useful things to do, which can be economically useful, be effective signalling of capacity, build social connection, etc. Curiosity is partially innate but it's also probably partially learned. I guess that's not super different from pain avoidance. But anyway, I don't worry about an AI that fails to get bored, but is otherwise basically similar to humans, taking over, because not getting bored would result in being ineffective at accomplishing open-ended things.
I think use of AI tools could have similar results to human cognitive enhancement, which I expect to basically be helpful. They'll have more problems with things that are enhanced by stuff like "bigger brain size" rather than "faster thought" and "reducing entropic error rates / wisdom of the crowds" because they're trained on humans. One can in general expect more success on this sort of thing by having an idea of what problem is even being solved. There's a lot of stuff that happens in philosophy departments that isn't best explained by "solving the prob...
Something approximating utility function optimization over partial world configurations. What scope of world configuration space is optimized by effective systems depends on the scope of the task. For something like space exploration, the scope of the task is such that accomplishing it requires making trade-offs over a large subset of the world, and efficient ways of making these trade-offs are parametrized by a utility function over this subset.
What time-scale and spatial scope the "pick thoughts in your head" optimization is over depends on what scope is...
Note that beyond not-being-mentioned, such arguments are also anthropically filtered against: in worlds where such arguments have been out there for longer, we died a lot quicker, so we’re not there to observe those arguments having been made.
This anthropic analysis doesn't take into account past observers (see this post).
What if I’m mainly interested in how philosophical reasoning ideally ought to work?
My view would suggest: develop a philosophical view of normativity and apply that view to the practice of philosophy itself. For example, if it is in general unethical to lie, then it is also unethical to lie about philosophy. Philosophical practice being normative would lead to some outcomes being favored over others. (It seems like a problem if you need philosophy to have a theory of normativity and a theory of normativity to do meta-philosophy and meta-philosophy to do...
Philosophy is a social/intellectual process taking place in the world. If you understand the world, you understand how philosophy proceeds.
Sometimes you don't need multiple levels of meta. There's stuff, and there's stuff about stuff, which could be called "mental" or "intensional". Then there's stuff about stuff about stuff (philosophy of mind etc). But stuff about stuff about stuff is a subset of stuff about stuff. Mental content has material correlates (writing, brain states, etc). I don't think you need a special category for stuff about stuff about st...
Philosophy is a social/intellectual process taking place in the world. If you understand the world, you understand how philosophy proceeds.
What if I'm mainly interested in how philosophical reasoning ideally ought to work? (Similar to how decision theory studies how decision making normatively should work, not how it actually works in people.) Of course if we have little idea how real-world philosophical reasoning works, understanding that first would probably help a lot, but that's not the ultimate goal, at least not for me, for both intellectual and A...
Do you think of counterfactuals as a speedup on evolution? Could this be operationalized by designing AIs that quantilize on some animal population, therefore not being far from the population distribution, but still surviving/reproducing better than average?
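The quantilization idea in the question could be sketched roughly like this (a toy model; the population, fitness function, and cutoff are all made up for illustration):

```python
import random

def quantilize(population, fitness, q, rng=random):
    """Sample uniformly from the top q fraction of a base population by fitness.

    This stays close to the base distribution (it never leaves the population's
    support) while doing better than average -- a toy version of surviving and
    reproducing better than average without being far from the population.
    q=1.0 recovers the base distribution.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:cutoff])

# Hypothetical example: integers stand in for animal policies; fitness is identity.
population = list(range(100))
pick = quantilize(population, fitness=lambda x: x, q=0.1)
assert pick >= 90  # always within the top 10% of the base population
```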
Note the preceding
Let's first, within a critical agential ontology, disprove some very basic forms of determinism.
I'm assuming use of a metaphysics in which you, the agent, can make choices. Without this metaphysics there isn't an obvious motivation for a theory of decisions. As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.
Maybe this metaphysics leads to contradictions. In the rest of the post I argue that it doesn't contradict belief in physical causality including as applied to the self.
AFAIK the best known way of reconciling physical causality with "free will" like choice is constructor theory, which someone pointed out was similar to my critical agential approach.
...AI improving itself is most likely to look like AI systems doing R&D in the same way that humans do. “AI smart enough to improve itself” is not a crucial threshold, AI systems will get gradually better at improving themselves. Eliezer appears to expect AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research), but I think this is mostly unjustified. If Eliezer doesn’t believe this, then his arguments about the alignment problem that hum
My sense is that we are on broadly the same page here. I agree that "AI improving AI over time" will look very different from "humans improving humans over time" or even "biology improving humans over time." But I think that it will look a lot like "humans improving AI over time," and that's what I'd use to estimate timescales (months or years, most likely years) for further AI improvements.
“myopia” (not sure who correctly named this as a corrigibility principle),
I think this is from Paul Christiano, e.g. this discussion.
I assumed EER did account for that based on:
All portable air conditioners' energy efficiency is measured using an EER score. The EER rating is the ratio of the useful cooling effect (measured in BTU) to electrical power (in W). It's for this reason that it is hard to give a generalized answer to this question, but typically, portable air conditioners are less efficient than permanent window units due to their size.
Regarding the back-and-forth on air conditioners, I tried Google searching to find a precedent for this sort of analysis; the first Google result for "air conditioner single vs. dual hose" was this blog post, which acknowledges the inefficiency johnswentworth points out, overall recommends dual-hose air conditioners, but still recommends single-hose air conditioners under some conditions, and claims the efficiency difference is only about 12%.
Highlights:
...In general, a single-hose portable air conditioner is best suited for smaller rooms. The reason being
Btw, there is some amount of philosophical convergence between this and some recent work I did on critical agential physics; both are trying to understand physics as laws that partially (not fully) predict sense-data starting from the perspective of a particular agent.
It seems like "infra-Bayesianism" may be broadly compatible with frequentism; extending Popper's falsifiability condition to falsify probabilistic (as opposed to deterministic) laws yields frequentist null hypothesis significance testing, e.g. Neyman-Pearson; similarly, frequentism also attem...
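The Popper-to-NHST move can be illustrated with a standard exact binomial test (a generic sketch, not anything specific to infra-Bayesianism): a probabilistic law like "this coin is fair" isn't falsified by any single outcome, but it is rejected at a significance level by sufficiently extreme data.

```python
from math import comb

def binomial_two_sided_p(heads, n, p=0.5):
    """Exact two-sided p-value for n coin flips under the law 'P(heads) = p'."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)
    observed = pmf(heads)
    # Sum the probability of every outcome at least as extreme as the observed one.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed + 1e-12)

# The law "the coin is fair" survives 52/100 heads but not 80/100 heads:
print(binomial_two_sided_p(80, 100) < 0.05)  # True: fairness rejected
print(binomial_two_sided_p(52, 100) < 0.05)  # False: fairness not rejected
```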
Thanks for reading all the posts!
I'm not sure where you got the idea that this was to solve the spurious counterfactuals problem; that was in the appendix because I anticipated that a MIRI-adjacent person would want to know how it solves that problem.
The core problem it's solving is providing a well-defined mathematical framework in which (a) there are, in some sense, choices, and (b) it is believed that these choices correspond to the results of a particular Turing machine. It goes back to the free will vs. determinism paradox, and shows that there's a fo...
It seems like agents in a deterministic universe can falsify theories in at least some sense. Like, they take two different weights, drop them, and see that they land at the same time, falsifying the claim that heavier objects fall faster.
The main problem is that it isn't meaningful for their theories to make counterfactual predictions about a single situation; they can create multiple situations (across time and space) and assume symmetry and get falsification that way, but it requires extra assumptions. Basically you can't say different theories really disagree...
I previously wrote a post about reconciling free will with determinism. The metaphysics implicit in Pearlian causality is free will (in Drescher's words: "Pearl's formalism models free will rather than mechanical choice."). The challenge is reconciling this metaphysics with the belief that one is physically embodied. That is what the post attempts to do; these perspectives aren't inherently irreconcilable, we just have to be really careful about e.g. distinguishing "my action" vs. "the action of the computer embodying me" in the Bayes net and distingu...
How do you think this project relates to Ought? Seems like the projects share a basic objective (having AI predict human thoughts had in the course of solving a task). Ought has more detailed proposals for how the thoughts are being used to solve the task (in terms of e.g. factoring a problem into smaller problems, so that the internal thoughts are a load-bearing part of the computation rather than an annotation that is predicted but not checked for being relevant).
...So we are taking one of the outputs that current AIs seem to have learned best to design
This section seemed like an instance of you and Eliezer talking past each other in a way that wasn't locating a mathematical model containing the features you both believed were important (e.g. things could go "whoosh" while still being continuous):
[Christiano][13:46]
Even if we just assume that your AI needs to go off in the corner and not interact with humans, there’s still a question of why the self-contained AI civilization is making ~0 progress and then all of a sudden very rapid progress
[Yudkowsky][13:46]
unfortunately a lot of what you are saying, fro...
My claim is that the timescale of AI self-improvement, at the point it takes over from humans, is the same as the previous timescale of human-driven AI improvement. If it was a lot faster, you would have seen a takeover earlier instead.
This claim is true in your model. It also seems true to me about hominids, that is I think that cultural evolution took over roughly when its timescale was comparable to the timescale for biological improvements, though Eliezer disagrees
I thought Eliezer's comment "there is a sufficiently high level where things go who...
A bunch of this was frustrating to read because it seemed like Paul was yelling "we should model continuous changes!" and Eliezer was yelling "we should model discrete events!" and these were treated as counter-arguments to each other.
It seems obvious from having read about dynamical systems that continuous models still have discrete phase changes. E.g. consider boiling water. As you put in energy the temperature increases until it gets to the boiling point, at which point more energy put in doesn't increase the temperature further (for a while), it conv...
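The boiling-water example can be simulated with a toy model (the constants are textbook approximations, and the model is my own sketch): energy input increases perfectly continuously, yet the temperature curve has a sharp plateau at the phase change.

```python
def temperature(energy_j, mass_g=100.0):
    """Temperature of water as energy is added continuously.

    Toy model: specific heat of liquid water ~4.18 J/g/K, latent heat of
    vaporization ~2260 J/g at 100 C, specific heat of steam ~2.0 J/g/K.
    """
    heat_to_boil = mass_g * 4.18 * 100  # energy to heat liquid from 0 C to 100 C
    latent = mass_g * 2260              # energy absorbed by the phase change itself
    if energy_j < heat_to_boil:
        return energy_j / (mass_g * 4.18)  # heating liquid water
    if energy_j < heat_to_boil + latent:
        return 100.0                       # boiling: energy goes in, temperature stays flat
    return 100.0 + (energy_j - heat_to_boil - latent) / (mass_g * 2.0)  # heating steam

# A smooth ramp of energy input produces a discrete-looking plateau:
for e in [20_000, 41_800, 100_000, 300_000]:
    print(e, round(temperature(e), 1))
```

A fully continuous model, with a qualitative (phase) transition in its behavior: continuity and "discrete events" are not counter-arguments to each other.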
I don’t really feel like anything you are saying undermines my position here, or defends the part of Eliezer’s picture I’m objecting to.
(ETA: but I agree with you that it's the right kind of model to be talking about and is good to bring up explicitly in discussion. I think my failure to do so is mostly a failure of communication.)
I usually think about models that show the same kind of phase transition you discuss, though usually significantly more sophisticated models and moving from exponential to hyperbolic growth (you only get an exponential in your mo...
CDT and EDT have known problems on 5 and 10. TDT/UDT are insufficiently formalized, and seem like they might rely on known-to-be-unformalizable logical counterfactuals.
So 5 and 10 isn't trivial even without spurious counterfactuals.
What does this add over modal UDT?
Philosophically, it works as a
...Good point; I've changed the wording to make it clear that the rational-delimited open intervals are the basis, not all the locale elements. Luckily, points can be defined as sets of basis elements containing them, since all other properties follow. (Making the locale itself countable requires weakening the definition by making the sets over which unions are formed countable, e.g. by requiring them to be recursively enumerable.)
I've also been thinking about the application of agency abstractions to decision theory, from a somewhat different angle.
It seems like what you're doing is considering relations between high-level third-person abstractions and low-level third-person abstractions. In contrast, I'm primarily considering relations between high-level first-person abstractions and low-level first-person abstractions.
The VNM abstraction itself assumes that "you" are deciding between different options, each of which has different (stochastic) consequences; thus, it is inherently
...Looking back on this, it does seem quite similar to EDT. I'm actually, at this point, not clear on how EDT and TDT differ, except in that EDT has potential problems in cases where it's sure about its own action. I'll change the text so it notes the similarity to EDT.
On XOR blackmail, SIDT will indeed pay up.
The way you are using it doesn’t necessarily imply real control, it may be imaginary control.
I'm discussing a hypothetical agent who believes itself to have control. So its beliefs include "I have free will". Its belief isn't "I believe that I have free will".
It’s a “para-consistent material conditional” by which I mean the algorithm is limited in such a way as to prevent this explosion.
Yes, that makes sense.
However, were you flowing this all the way back in time?
Yes (see thread with Abram Demski).
What do you mean by dualistic?
Already fact
...Secondly, “free will” is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion.
Wikipedia says "Free will is the ability to choose between different possible courses of action unimpeded." SEP says "The term “free will” has emerged over the past two millennia as the canonical designator for a significant kind of control over one’s actions." So my usage seems pretty standard.
...For example, recently I’ve been arguing in favour of what counts as a valid counterfactual being at least partially a matter of soc
I think it's worth examining more closely what it means to be "not a pure optimizer". Formally, a VNM utility function is a rationalization of a coherent policy. Say that you have some idea about what your utility function is, U. Suppose you then decide to follow a policy that does not maximize U. Logically, it follows that U is not really your utility function; either your policy doesn't coherently maximize any utility function, or it maximizes some other utility function. (Because the utility function is, by definition, a rationalization of the poli
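The "rationalization" point can be illustrated with a brute-force check (a toy sketch with made-up choice data): a policy's binary choices either agree with some ranking of outcomes, or they maximize no utility function at all.

```python
from itertools import permutations

def rationalizable(choices, outcomes):
    """Check whether binary choices can be rationalized by some utility function,
    i.e. whether any strict ranking of outcomes agrees with every choice.

    choices: list of (picked, rejected) pairs.
    """
    for ranking in permutations(outcomes):
        rank = {o: i for i, o in enumerate(ranking)}  # lower index = better
        if all(rank[p] < rank[r] for p, r in choices):
            return True
    return False

# A coherent policy (a over b, b over c) is rationalized by u(a) > u(b) > u(c)...
assert rationalizable([("a", "b"), ("b", "c")], "abc")
# ...but a cyclic policy maximizes no utility function whatsoever:
assert not rationalizable([("a", "b"), ("b", "c"), ("c", "a")], "abc")
```

If you think your utility function is U but your policy fails this kind of check against U, the conclusion is that U was not really your utility function.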
...It seems the approaches we're using are similar, in that they both are starting from observation/action history with posited falsifiable laws, with the agent's source code not known a priori, and the agent considering different policies.
Learning "my source code is A" is quite similar to learning "Omega predicts my action is equal to A()", so these would lead to similar results.
Policy-dependent source code, then, corresponds to Omega making different predictions depending on the agent's intended policy, such that when comparing policies, the agent has to imagine Omega predicting differently (as it would imagine learning different source code under policy-dependent source code).
I agree this is a problem, but isn't this a problem for logical counterfactual approaches as well? Isn't it also weird for a known fixed optimizer source code to produce a different result on this decision where it's obvious that 'left' is the best decision?
If you assume that the agent chose 'right', it's more reasonable to think it's because it's not a pure optimizer than that a pure optimizer would have chosen 'right', in my view.
If you form the intent to, as a policy, go 'right' on the 100th turn, you should anticipate learning that your source code is not the code of a pure optimizer.
In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.
Conditioning on 'A(obs) = act' is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between "If Oswald didn't kill Kennedy, then someone else did" and "If Oswald didn't kill Kennedy, then someone else would have".
Indeed, troll bridge will present a problem for "playing chicken" approaches, whic
...
Relativity to Newtonian mechanics is a warp in a straightforward sense. If you believe the layout of a house consists of some rooms connected in a certain way, but there are actually more rooms connected in different ways, getting the maps to line up looks like a warp. Basically, the closer the mapping is to a true homomorphism (in the universal algebra sense), the less warping there is, otherwise there are deviations intuitively analogous to space warps.