Jessica Taylor

Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Blog: unstableontology.com

Twitter: https://twitter.com/jessi_cata


Comments


Relativity is a warp of Newtonian mechanics in a straightforward sense. If you believe the layout of a house consists of some rooms connected in a certain way, but there are actually more rooms connected in different ways, getting the maps to line up looks like a warp. Basically, the closer the mapping is to a true homomorphism (in the universal algebra sense), the less warping there is; otherwise there are deviations intuitively analogous to space warps.
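
For concreteness, here is the universal-algebra condition being gestured at (a formal gloss added for illustration, not part of the original comment): a map h from structure A (the old ontology) to structure B (the new one) over a shared signature is a homomorphism when it commutes with every operation, and the "warp" is how badly a candidate map fails this.

```latex
% Homomorphism condition for h : A -> B over a shared signature.
% The "warp" is the extent to which a candidate map violates it.
h\bigl(f^{A}(x_1,\dots,x_n)\bigr) \;=\; f^{B}\bigl(h(x_1),\dots,h(x_n)\bigr)
\qquad \text{for every operation symbol } f \text{ and all } x_1,\dots,x_n \in A.
```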

Hmm, I wouldn't think of industrialism and human empowerment as trying to grab the whole future, just part of it, in line with the relatively short-term (human, not cosmic, timescale) needs of the self and extended community; industrialism seems to lead to capitalist organization, which leads to decentralization superseding nations and such (as Land argues).

I think communism isn't generally about having oneself and one's friends in charge; it is about having human laborers in charge. One could argue that it tended towards nationalism (e.g. the USSR), but I'm not convinced that global communism (Trotskyism) would have worked out well either. Also, one could take an update from communism about agendas for global human control leading to national control (see also the tendency of AI safety to be taken over by AI national security, as with the Situational Awareness paper). (Again, I'm not ruling out that grabbing hold of the entire future could be a good idea at some point; I'm just not sold on current agendas, and I want to note there are downsides that push against Pascal's-mugging-type considerations.)

Thanks, going to link this!

re meta ethical alternatives:

  1. roughly my view
  2. slight change; it opens the question of why the deviations exist. Are the "right things to value" not efficient to value in a competitive setting? Mostly I'm trying to talk about the things to value that go along with intelligence, so valuing them wouldn't correspond to a competitive disadvantage in general. So it's still close enough to my view.
  3. roughly the Yudkowskian view, the main view under which the FAI project even makes sense. I think one can ask basic questions like which changes move towards more rationality on the margin, though such changes would tend to prioritize rationality over preventing value drift. I'm not sure how much there are general facts about how to avoid value drift (it seems like the relevant kind, i.e. value drift as part of becoming more rational/intelligent, only exists from irrational perspectives, in a way dependent on the mind architecture).
  4. minimal CEV-realist view. It really seems up to agents how much they care about their reflected preferences. Maybe changing preferences too often leads to money pumps, or something? (See the sketch after this list.)
  5. basically says "there are irrational and rational agents; rationality doesn't apply to irrational agents". This seems somewhat like how people treat animals (we don't generally consider uplifting normative with respect to animals).
  6. at this point you're at something like ecology / evolutionary game theory: it's a matter of which things tend to survive/reproduce, and there aren't general decision theories that succeed.
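
A minimal money-pump sketch for point 4 (purely illustrative; the goods, fee, and Agent class are made up here): an agent whose preferences keep shifting cyclically pays a small fee for each swap and ends up holding what it started with, strictly poorer.

```python
# Hypothetical illustration of a money pump from cyclically shifting preferences.
# The goods, the fee, and the Agent class are invented for this sketch.

FEE = 1.0  # cost charged for each trade

class Agent:
    def __init__(self, holding, money):
        self.holding = holding
        self.money = money

    def trade(self, preferred):
        """If the agent currently prefers `preferred` to what it holds,
        it pays the fee to swap. Returns True if a trade happened."""
        if preferred != self.holding:
            self.holding = preferred
            self.money -= FEE
            return True
        return False

agent = Agent(holding="A", money=10.0)
# Preferences drift in a cycle: B over A, then C over B, then A over C, ...
for preferred in ["B", "C", "A", "B", "C", "A"]:
    agent.trade(preferred)

print(agent.holding, agent.money)  # back to holding "A", but 6.0 poorer
```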

Re human ontological crises: basically agree; I think it's reasonably similar to what I wrote. Roughly, my reason for thinking it's hard to solve is that the ideal case would be something like a universal algebra homomorphism (where the new ontology actually agrees with the old one but is more detailed), yet historical cases like physics aren't homomorphic to previous ontologies in this way, so some warping is necessary. You could try putting a metric on the warping and minimizing it, but why would someone think the metric is any good? It seems more like a preference than a thing rationality applies to. If you think about it and come up with a solution, let me know, of course.
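
A toy sketch of the "put a metric on the warping and minimize it" move (everything here is a placeholder: the operation tables, the candidate maps, and especially the `deviation` function, whose arbitrariness is exactly the objection in the paragraph above):

```python
# Toy sketch: score how far a candidate translation h (old ontology -> new
# ontology) is from being a homomorphism, then pick the least-warped candidate.
# All inputs are placeholders; nothing here is a worked-out proposal.

def warp(h, old_ops, new_ops, deviation, samples):
    """Total deviation of h from the homomorphism condition
    h(f_old(*xs)) == f_new(*map(h, xs)), summed over sampled inputs."""
    total = 0.0
    for name, f_old in old_ops.items():
        f_new = new_ops[name]
        for xs in samples[name]:
            total += deviation(h(f_old(*xs)), f_new(*(h(x) for x in xs)))
    return total

def least_warped(candidates, old_ops, new_ops, deviation, samples):
    """Return the candidate map with the smallest warp metric. Which
    `deviation` to use is itself a preference, which is the point of contention."""
    return min(candidates,
               key=lambda h: warp(h, old_ops, new_ops, deviation, samples))
```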

With respect to grabbing hold of the whole future: you can try looking at historical cases of people trying to grab hold of the future and seeing how that went. It's a mixed bag with a mostly negative reputation, indicating there are downsides as well as upsides; it's not a "safe", conservative view. See also Against Responsibility. I feel like there's a risk of getting Pascal's mugged by "maybe grabbing hold of the future is good, you can't rule it out, so do it"; there are downsides to spending effort that way. For example, suppose some Communists thought capitalism would lead to the destruction of human value with high enough probability that instituting global communism was the conservative option; it doesn't seem like that worked out well (even though a lot of people around here would agree that capitalism tends to lead to human value destruction in the long run). Particular opportunities for grabbing hold of the future can be net negative and not worth worrying about even if one of them is a good idea in the long run (I'm not ruling that out; I would just have to be convinced of specific opportunities).

Overall I'd rather focus on first modeling the likely future and looking for plausible degrees of freedom. A general issue with Pascal's mugging is that it might make people overly attached to world models in which they have ~infinite impact (e.g. Christianity, Communism), which means paying too much attention to wrong world models and not updating to more plausible models in which existential-stakes decisions could be comprehended if they exist. And Obliqueness doesn't rule out existential stakes (since it's non-Diagonal).

As another point, Popperian science tends to advance by people making falsifiable claims; "you don't know if that's true" isn't really an objection in that context. The pragmatic claim I would make is: I have some Bayesian reason to believe agents do not in general factor into separate Orthogonal and Diagonal components; this claim is somewhat falsifiable (someone could figure out a theory of this invulnerable to optimization daemons etc.); I'm going to spend my attention on the branch where I'm right; and I'm not going to worry about Pascal's-mugging-type considerations for the case where I'm wrong (as I said, modeling the world first seems like a good general heuristic). People can falsify the claim eventually if it's false.

This whole discussion is not really a defense of Orthogonality, given that Yudkowsky presented orthogonality as a descriptive world model, not a normative claim; so sticking to the descriptive level in the original post seems valid. It would be a form of bad epistemology to reject a descriptive update (assuming the arguments are any good) because of pragmatic considerations.

"as important as ever": no, because our potential influence is lower, and the influence isn't on things shaped like our values, there has to be a translation, and the translation is different from the original.

CEV: while it addresses "extrapolation", it seems broadly based on assuming that the extrapolation is ontologically easy and that "our CEV" is an unproblematic object we can talk about (even though it's not mathematically formalized, any formalization would be subject to doubt, and even if it were formalized, we would need logical uncertainty over it, and logical induction has additional free parameters in the limit). I'm really trying to respond to orthogonality, not CEV, though.

From a practical perspective: notice that I am not behaving like Eliezer Yudkowsky. I am not saying the Orthogonality Thesis is true and important to ASI; I am instead saying intelligence/values are Oblique and probably nearly Diagonal (though it's unclear what I mean by "nearly"). I am not saying a project of aligning superintelligence with human values is a priority. I am not taking research approaches that assume a Diagonal/Orthogonal factorization. I left MIRI partially because I didn't like their security policies (and because I had longer AI timelines); I thought discussion of abstract research ideas was more important. I am not calling for a global AI shutdown so this project (which is in my view confused) can be completed. I am actually against AI regulation on the margin (I don't have a full argument for this; it's a political matter at this point).

I think practicality looks more like having near-term preferences related to modest intelligence increases (as with current humans vs. humans with neural nets: how do neural nets benefit or harm you, practically? how can you use them to think better and improve your life?), and not expecting your preferences to extend into the distant future across many ontology changes. So don't worry about grabbing hold of the whole future, etc.; think about how to reduce value drift while accepting intelligence increases on the margin. This is a bit like CEV, except CEV is in a thought experiment instead of reality.

The "Models of ASI should start with realism" bit IS about practicalities, namely, I think focusing on first forecasting absent a strategy of what to do about the future is practical with respect to any possible influence on the far future; practically, I think your attempted jump to practicality (which might be related to philosophical pragmatism) is impractical in this context.

It occurs to me that maybe you mean something like "Our current (non-extrapolated) values are our real values, and maybe it's impossible to build or become a superintelligence that shares our real values so we'll have to choose between alignment and superintelligence." Is this close to your position?

Close. Alignment of already-existing human values with superintelligence is impossible (I think) because of the arguments given. That doesn't mean humans have no preferences indirectly relating to superintelligence (especially, we have preferences about modest intelligence increases, and there's some iterative process).

Yes, I still endorse the post. Some other posts:

Two alternatives to logical counterfactuals (note: I think policy dependent source code works less well than I thought it did at the time of writing)

A critical agential account... (general framework, somewhat underspecified or problematic in places but leads to more specific things like the linear logic post; has similarities to constructor theory)

The axioms of U are recursively enumerable. You run all M(i,j) in parallel and output a new axiom whenever one halts. That's enough to computably check a proof if the proof specifies the indices of all axioms used in the recursive enumeration.
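
A rough sketch of the dovetailing enumeration being described (the machines M(i,j) and the axiom each halting run contributes are stand-ins here, not the actual definitions from the post):

```python
# Hypothetical sketch of recursively enumerating the axioms of U by dovetailing:
# run every machine for ever-larger step budgets and emit an axiom the first
# time a machine is seen to halt. A proof that cites axioms by their index in
# this enumeration can then be checked by running the enumeration long enough
# to reproduce those axioms.
from itertools import count

def enumerate_axioms(halts_within, axiom_for):
    """Yield (index, axiom) pairs.

    halts_within(m, n) -> bool: whether machine m halts within n steps (a
    bounded, computable check); axiom_for(m) -> the axiom contributed by
    machine m once it halts. Both are placeholders for the post's M(i, j)."""
    emitted = set()
    index = count()
    for budget in count(1):              # dovetail over step budgets...
        for m in range(budget):          # ...and over machine indices
            if m not in emitted and halts_within(m, budget):
                emitted.add(m)
                yield next(index), axiom_for(m)
```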

Thanks, didn't know about the low basis theorem.

U axiomatizes a consistent guessing oracle producing a model of T. There is no consistent guessing oracle applied to U.

In the previous post I showed that a consistent guessing oracle can produce a model of T. What I show in this post is that the theory of this oracle can be embedded in propositional logic so as to enable provability preserving translations.

LS (Löwenheim–Skolem) shows one type of infinitarian reference to be impossible, namely reference to uncountably infinite sets. I am interested in showing a different kind of infinitarian reference to be impossible. "Impossible" and "reference" are, of course, interpreted differently by different people.
