All of DragonGod's Comments + Replies

i.e. if each forecaster $i$ has a first-order belief $p_i$, and $q$ is your second-order belief about which forecaster is correct, then $\sum_i q_i p_i$ should be your first-order belief about the election.

I think there might be a typo here. Did you instead mean to write: "" for the second-order beliefs about the forecasters?
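Concretely, the aggregation rule the quoted passage describes is a mixture: weight each forecaster's first-order probability by your second-order credence that that forecaster is the correct one. A small worked example (numbers chosen here purely for illustration):

```latex
p_1 = 0.6,\quad p_2 = 0.9,\quad q = (0.5,\ 0.5)
\;\Longrightarrow\;
\sum_i q_i\, p_i \;=\; 0.5 \cdot 0.6 \,+\, 0.5 \cdot 0.9 \;=\; 0.75 .
```

So whatever the intended notation for the second-order beliefs, the resulting first-order belief about the election is just this weighted average.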

We aren’t offering these criteria as **necessary** for “knowledge”—we could imagine a breaker proposing a counterexample where all of these properties are satisfied but where intuitively M didn’t really know that A′ was a better answer. In that case the builder will try to make a convincing argument to that effect.

The bolded word should be "sufficient".

In fact, I'm pretty sure that's how humans work most of the time. We use the general-intelligence machinery to "steer" ourselves at a high level, and most of the time, we operate on autopilot.

Yeah, I agree with this. But I don't think the human system aggregates into any kind of coherent total optimiser. Humans don't have an objective function (not even approximately?).

A human is not well modelled as a wrapper mind; do you disagree?

Thane Ruthenis
Certainly agree. That said, I feel the need to lay out my broader model here. The way I see it, a "wrapper-mind" is a general-purpose problem-solving algorithm hooked up to a static value function. As such:

* Are humans proper wrapper-minds? No, certainly not.
* Do humans have the fundamental machinery to be wrapper-minds? Yes.
* Is any individual run of a human general-purpose problem-solving algorithm essentially equivalent to wrapper-mind-style reasoning? Yes.
* Can humans choose to act as wrapper-minds on longer time scales? Yes, approximately, subject to constraints like force of will.
* Do most humans, in practice, choose to act as wrapper-minds? No; we switch our targets all the time, and value drift is ubiquitous.
* Is it desirable for a human to act as a wrapper-mind? That's complicated.
  * On the one hand, yes, because consistent pursuit of instrumentally convergent goals would lead to you having more resources to spend on whatever values you have.
  * On the other hand, no, because we terminally value this sort of value-drift and self-inconsistency; it's part of "being human".
* In sum, for humans there's a sort of tradeoff between approximating a wrapper-mind and being an incoherent human, and different people weight it differently in different contexts. E.g., if you really want to achieve something (earning your first million dollars, averting extinction), and you value it more than having fun being a human, you may choose to act as a wrapper-mind in the relevant context/at the relevant scale.

As such: humans aren't wrapper-minds, but they can act like them, and it's sometimes useful to act as one.

Thus, any greedy optimization algorithm would convergently shape its agent to not only pursue $G$, but to maximize for $G$'s pursuit — at the expense of everything else.

Conditional on:

  1. Such a system being reachable/accessible to our local/greedy optimisation process
  2. Such a system being actually performant according to the selection metric of our optimisation process 

 

I'm pretty sceptical of #2. I'm sceptical that systems that perform inference via direct optimisation over their outputs are competitive in rich/complex environments. 

Such o... (read more)

Thane Ruthenis
It's not a binary. You can perform explicit optimization over high-level plan features, then hand off detailed execution to learned heuristics. "Make coffee" may be part of an optimized stratagem computed via consequentialism, but you don't have to consciously optimize every single muscle movement once you've decided on that goal. Essentially, what counts as "outputs" or "direct actions" relative to the consequentialist-planner is flexible, and every sufficiently-reliable (chain of) learned heuristics can be put in that category, with choosing to execute one of them available to the planner algorithm as a basic output. In fact, I'm pretty sure that's how humans work most of the time. We use the general-intelligence machinery to "steer" ourselves at a high level, and most of the time, we operate on autopilot.
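To make the "plan at a high level, execute via learned heuristics" picture concrete, here is a minimal sketch; all names, the brute-force search, and the toy value function are illustrative assumptions, not anything from the comment:

```python
# Toy stand-ins for learned heuristics: each maps a high-level action to a
# low-level effect on the state. In a real agent these would be trained policies.
HEURISTICS = {
    "make_coffee": lambda s: {**s, "energy": s["energy"] + 2},
    "write_report": lambda s: {**s, "progress": s["progress"] + 3},
    "rest": lambda s: {**s, "energy": s["energy"] + 1},
}

def value(state):
    # Toy evaluation the planner optimises; raw "muscle movements" never appear here.
    return state["progress"] + 0.5 * state["energy"]

def plan(state, horizon=3):
    """Explicit (brute-force) search over sequences of high-level actions."""
    if horizon == 0:
        return [], value(state)
    best_plan, best_value = [], value(state)
    for action, heuristic in HEURISTICS.items():
        tail, tail_value = plan(heuristic(state), horizon - 1)
        if tail_value > best_value:
            best_plan, best_value = [action] + tail, tail_value
    return best_plan, best_value

print(plan({"energy": 0, "progress": 0}))
# -> (['write_report', 'write_report', 'write_report'], 9.0)
```

The point is just that what counts as the planner's "output" is set by which reliable heuristics it can invoke, not by the raw action space.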

Some Nuance on Learned Optimisation in the Real World

I think mesa-optimisers should not be thought of as learned optimisers, but systems that employ optimisation/search as part of their inference process.

The simplest case is that pure optimisation during inference is computationally intractable in rich environments (e.g. the real world), so systems (e.g. humans) operating in the real world do not perform inference solely by directly optimising over outputs.

Rather, optimisation is sometimes employed as one part of their inference strategy. That is, systems o... (read more)
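A toy sketch of that claim, with every name and number below being my own illustration: the system answers most queries with cheap cached heuristics, and only falls back to explicit search over candidate outputs when the heuristic is unreliable.

```python
import random

def heuristic_policy(observation):
    """Stand-in for cheap learned pattern-matching: returns (action, confidence)."""
    cache = {"familiar_situation": ("cached_action", 0.95)}
    return cache.get(observation, ("default_action", 0.2))

def local_search(observation, candidates, evaluate):
    """Explicit optimisation over outputs, used only as one component of inference."""
    return max(candidates, key=lambda action: evaluate(observation, action))

def decide(observation, candidates, evaluate, threshold=0.8):
    action, confidence = heuristic_policy(observation)
    if confidence >= threshold:
        return action                                        # fast path: no search
    return local_search(observation, candidates, evaluate)   # slow path: search

evaluate = lambda obs, action: random.random()   # toy scoring of candidate outputs
print(decide("familiar_situation", ["a", "b", "c"], evaluate))  # -> "cached_action"
print(decide("novel_situation", ["a", "b", "c"], evaluate))     # -> best-scoring candidate
```

On this picture, the "mesa-optimiser" is the whole `decide` procedure, not the `local_search` subroutine alone.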

GPTs are not Imitators, nor Simulators, but Predictors.

I think an issue is that GPT is used to mean two things:

  1. A predictive model whose output is a probability distribution over token space given its prompt and context
  2. Any particular technique/strategy for sampling from the predictive model to generate responses/completions for a given prompt.

[See the Appendix]

 

The latter kind of GPT is what I think is rightly called a "Simulator".
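A toy sketch of the two senses (the bigram lookup table and all names here are my own stand-ins; GPT is of course not a lookup table): sense 1 maps a context to a distribution over the next token, while sense 2 wraps that predictor in a sampling loop, and it is the loop that produces rollouts/simulacra.

```python
import random

# Sense 1: a predictive model. Here a toy bigram table stands in for GPT.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def predict(tokens):
    """Sense 1: return P(next token | context) as a distribution, not a completion."""
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})

def simulate(prompt, max_tokens=10):
    """Sense 2: a sampling loop around the predictor -- the 'simulator'."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        dist = predict(tokens)
        nxt = random.choices(list(dist), weights=list(dist.values()))[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens

print(predict(["the"]))   # {'cat': 0.6, 'dog': 0.4}
print(simulate(["the"]))  # e.g. ['the', 'cat', 'sat']
```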

 

From @janus' Simulators (italicised by me):

I use the generic term “simulator” to refer to models trained with pre

... (read more)
janus

Predictors are (with a sampling loop) simulators! That's the secret of mind

What do you think MIRI is currently doing wrong/what should they change about their approach/general strategy?

I thought I was pretty clear in the post that I don't have anything against MIRI. I guess if I were to provide feedback, the one thing I most wish MIRI would do more is hire additional researchers—I think MIRI currently has too high of a hiring bar.

To be clear, I enjoyed the post and am looking forward to this sequence. A point of disagreement though:

 

One feasible-seeming approach is "accelerating alignment," which involves leveraging AI as it is developed to help solve the challenging problems of alignment. This is not a novel idea, as it's related to previously suggested concepts such as seed AI, nanny AI, and iterated amplification and distillation (IDA).

I disagree that using AI to accelerate alignment research is particularly load-bearing for the development of a practical alignment craf... (read more)

janus

I won't write a detailed object-level response to this for now, since we're probably going to publish a lot about it soon. I'll just say that my/our experience with the usefulness of GPT has been very different than yours -

I have used ChatGPT to aid some of my writing and plan to use it more — but it's to the same extent that we use Google/Wikipedia/Word processors to do research in general.

I've used GPT-3 extensively, and for me it has been transformative. To the extent that my work has been helpful to you, you're indebted to GPT-3 as well, because "janus... (read more)

I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our physics knowledge) consists of models for navigating that territory.

Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super Turing computati

... (read more)

Physics is not the territory, physics is (quite explicitly) the models we have of the territory. Rationality consists of the rules for formulating these models, and in this sense it is prior to physics and more fundamental. (This might be a disagreement over use of words. If by "physics" you, by definition, refer to the territory, then it seems to miss my point about Occam's razor. Occam's razor says that the map should be parsimonious, not the territory: the latter would be a type error.) In fact, we can adopt the view that Solomonoff ... (read more)
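For reference, the standard formalisation of "the map should be parsimonious" that this thread gestures at is the Solomonoff prior, which weights each hypothesis (program) by its description length:

```latex
M(x) \;=\; \sum_{p \,:\, U(p)\ \text{outputs a string beginning with}\ x} 2^{-|p|}
```

where $U$ is a fixed universal prefix Turing machine, $p$ ranges over programs, and $|p|$ is the program's length in bits. Shorter (more parsimonious) models receive exponentially more prior weight, which is Occam's razor applied to maps rather than to the territory.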