As once discussed in person, I find this proposal pretty interesting and I think it deserves further thought.
Like some other commenters, I think for many tasks it's probably not tractable to develop a fully interpretable, competitive GOFAI program. For example, I would imagine that to play chess well, one needs to do things like assign positive value to some random-looking feature of a position purely because that feature is empirically associated with a higher win rate. However, the approach of the post could be weakened to allow "mixed" programs t...
After having spent a few hours playing with Opus, I think "slightly better than best public gpt-4" seems qualitatively correct -- both models tend to get tripped up on the same kinds of tasks, but Opus can inconsistently solve some tasks in my workflow that gpt-4 cannot.
And yeah, it seems likely that I will also swap to Claude over ChatGPT.
This means that the model can and will implicitly sacrifice next-token prediction accuracy for long horizon prediction accuracy.
Are you claiming this would happen even given infinite capacity?
I think that janus isn't claiming this and I also think it isn't true. I think it's all about capacity constraints. The claim as I understand it is that there are some intermediate computations that are optimized both for predicting the next token and for predicting the 20th token and that therefore have to prioritize between these different predictions.
Here's a simple toy model that illustrates the difference between 2 and 3 (that doesn't talk about attention layers, etc.).
Say you have a bunch of triplets $(x, y, z)$. You want to train a model that predicts $y$ from $x$ and $z$ from $x$.
Your model consists of three components: a shared encoder $f$ and two prediction heads $g_y$ and $g_z$. It makes predictions as follows: $\hat{y} = g_y(f(x))$ and $\hat{z} = g_z(f(x))$.
(Why have such a model? Why not have two completely separate models, one for predicting $y$ and one for predicting $z$? Because it might be more efficient to use ...
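To make the capacity trade-off concrete, here's a minimal numerical version of this toy model. The specific choices below -- a single shared linear feature, linear heads, squared error, and the particular data-generating process -- are just an illustration I'm making up, not anything from the original discussion:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Triplets" (x, y, z): y plays the role of the next token, z the role of a
# much later token; both are simple functions of the shared input x.
n = 1000
x = rng.normal(size=(n, 2))
y = x[:, 0]  # "next token"
z = x[:, 1]  # "later token"

def fit_shared(lmbda, steps=3000, lr=0.05):
    """Fit a 1-d shared feature f(x) = x @ w plus two linear heads a, b that
    predict y and z from that single feature, minimizing
    MSE(y) + lmbda * MSE(z) by plain gradient descent."""
    w = rng.normal(size=2)
    a, b = 1.0, 1.0
    for _ in range(steps):
        f = x @ w
        err_y = a * f - y
        err_z = b * f - z
        grad_w = x.T @ (a * err_y + lmbda * b * err_z) / n
        grad_a = np.mean(err_y * f)
        grad_b = lmbda * np.mean(err_z * f)
        w -= lr * grad_w
        a -= lr * grad_a
        b -= lr * grad_b
    f = x @ w
    return np.mean((a * f - y) ** 2), np.mean((b * f - z) ** 2)

# With lmbda = 0 the shared feature is optimized purely for predicting y.
# With lmbda = 1 it has to serve both heads, so the y-loss goes up: the model
# gives up some "next-token" accuracy to gain "long-horizon" accuracy.
print("y only:", fit_shared(0.0))
print("joint: ", fit_shared(1.0))
```

The point is simply that once the shared component is too small to serve both heads perfectly, improving the prediction of $z$ necessarily costs accuracy on $y$ -- the analogue of sacrificing next-token accuracy for long-horizon accuracy.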
Very interesting post! Unfortunately, I found this a bit hard to understand because the linked papers don’t talk about EDT versus CDT or scenarios where these two come apart and because both papers are (at least in part) about sequential decision problems, which complicates things. (CDT versus EDT can mostly be considered in the case of a single decision and there are various complications in multi-decision scenarios, like updatelessness.)
Here’s an attempt at describing the relation of the two papers to CDT and EDT, including prior work on these to...
>I'm not sure I understand the variant you proposed. How is that different than the Othman and Sandholm MAX rule?
Sorry if I was cryptic! Yes, it's basically the same as using the MAX decision rule and (importantly) a quasi-strictly proper scoring rule (in their terminology, which is basically the same up to notation as a strictly proper decision scoring rule in the terminology of the decision scoring rules paper). (We changed the terminology for our paper because "quasi-strictly proper scoring rule w.r.t. the max decision rule" is a mouthful. :-P) Does ...
>the biggest distinction is that this post's proposal does not require specifying the decision maker's utility function in order to reward one of the predictors and shape their behavior into maximizing it.
Hmm... Johannes made a similar argument in personal conversation yesterday. I'm not sure how convinced I am by this argument.
So first, here's one variant of the proper decision scoring rules setup where we also don't need to specify the decision maker's utility function: Ask the predictor for her full conditional probability distribution for each actio...
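To make the mechanics of that variant concrete, here's a small sketch of the elicit-then-act loop. The log score below is just a placeholder -- as noted above, which scoring rule you combine with the max decision rule matters a lot for incentives -- and all the numbers and names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the variant: the predictor reports a full conditional outcome
# distribution for every action; the decision maker applies her own
# (private) utility function to pick an action; the predictor is then
# scored only on the reported distribution for the action actually taken.
# The utility function never has to be communicated to the predictor.

ACTIONS = ["build", "wait"]
OUTCOMES = ["good", "bad"]

# True conditional distributions P(outcome | action), unknown to the DM.
TRUE = {"build": np.array([0.7, 0.3]), "wait": np.array([0.4, 0.6])}

def honest_predictor():
    return {a: TRUE[a] for a in ACTIONS}

def decision_maker_choice(report, utility):
    """Pick the action maximizing expected utility under the *reported*
    conditional distributions."""
    eu = {a: float(report[a] @ utility) for a in ACTIONS}
    return max(eu, key=eu.get)

def score(report, chosen_action, outcome_index):
    """Log score of the reported distribution for the chosen action only.
    (Placeholder: whether this particular scoring rule gives the right
    incentives under the max decision rule is exactly the question the
    decision scoring rules framework is about.)"""
    return float(np.log(report[chosen_action][outcome_index]))

utility = np.array([1.0, 0.0])        # DM's private utility over outcomes
report = honest_predictor()
action = decision_maker_choice(report, utility)
outcome = rng.choice(len(OUTCOMES), p=TRUE[action])
print(action, OUTCOMES[outcome], score(report, action, outcome))
```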
The following is based on an in-person discussion with Johannes Treutlein (the second author of the OP).
>But is there some concrete advantage of zero-sum conditional prediction over the above method?
So, here's a very concrete and clear (though perhaps not very important) advantage of the proposed method over the method I proposed. The method I proposed only works if you want to maximize expected utility relative to the predictor's beliefs. The zero-sum competition model enables optimal choice under a much broader set of possible preferences over outcome...
Nice post!
Miscellaneous comments and questions, some of which I made on earlier versions of this post. Many of these are bibliographic, relating the post in more detail to prior work, or alternative approaches.
In my view, the proposal is basically to use a futarchy / conditional prediction market design like the one proposed by Hanson, with, I think, two important details:
- The markets aren't subsidized. This ensures that the game is zero-sum for the predictors -- they don't prefer one action to be taken over another. In the scoring rules setting, subsi...
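As a minimal sketch of that zero-sum structure (again, my own illustration, not code from the post): score both predictors on the same realized action and outcome, and pay each one the difference between its score and the other's.

```python
import numpy as np

# Both predictors are scored on the same realized (action, outcome) pair,
# and each is paid the *difference* between its score and the other's. The
# payments always sum to zero, so anything in the score that depends only on
# which action was taken (rather than on who predicted it better) cancels --
# which is, roughly, the sense in which neither predictor prefers any
# particular action to be taken. The reports and numbers are placeholders.

def log_score(report, outcome_index):
    return float(np.log(report[outcome_index]))

def zero_sum_payoffs(report_1, report_2, outcome_index):
    s1 = log_score(report_1, outcome_index)
    s2 = log_score(report_2, outcome_index)
    return s1 - s2, s2 - s1  # always sums to zero

# Example: predictor 1 put more probability on the realized outcome 0.
print(zero_sum_payoffs(np.array([0.8, 0.2]), np.array([0.6, 0.4]), 0))
```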
Minor bibliographical note: A related academic paper is Arif Ahmed's unpublished paper, "Sequential Choice and the Agent's Perspective". (This is from memory -- I read that paper a few years ago.)
Nice post!
What would happen in your GPT-N fusion reactor story if you ask it a broader question about whether it is a good idea to share the plans?
Perhaps relatedly:
>Ok, but can’t we have an AI tell us what questions we need to ask? That’s trainable, right? And we can apply the iterative design loop to make AIs suggest better questions?
I don't get what your response to this is. Of course, there is the verifiability issue (which I buy). But it seems that the verifiability issue alone is sufficient for failure. If you ask, "Can this design be turned...
Sounds interesting! Are you going to post the reading list somewhere once it is completed?
(Sorry for self-promotion in the below!)
I have a mechanism design paper that might be of interest: Caspar Oesterheld and Vincent Conitzer: Decision Scoring Rules. WINE 2020. Extended version. Talk at CMID.
Here's a pitch in the language of incentivizing AI systems -- the paper is written in CS-econ style. Imagine you have an AI system that does two things at the same time:
1) It makes predictions about the world.
2) It takes actions that influence the world. (In the pape...
Cool that this is (hopefully) being done! I have had this on my reading list for a while, and since this is about the kinds of problems I also spend a lot of time thinking about, I definitely have to understand it better at some point. I guess I can snooze it for a bit now. :P Some suggestions:
Maybe someone could write an FAQ page? Also, a somewhat generic idea is to write something that is more example based, perhaps even something that just solely gives examples. Part of why I suggest these two is that I think they can be written relatively mechanically and th...
Not very important, but: Despite having spent a lot of time on formalizing SPIs, I have some sympathy for a view like the following:
> Yeah, surrogate goals / SPIs are great. But if we want AI to implement them, we should mainly work on solving foundational issues in decision and game theory with an aim toward AI. If we do this, then AI will implement SPIs (or something even better) regardless of how well we understand them. And if we don't solve these issues, then it's hopeless to add SPIs manually. Furthermore, believing that surrogate goals / SPIs wor...
Great to see more work on surrogate goals/SPIs!
>Personally, the author believes that SPI might “add up to normality” --- that it will be a sort of reformulation of existing (informal) approaches used by humans, with similar benefits and limitations.
I'm a bit confused by this claim. To me it's a bit unclear what you mean by "adding up to normality". (E.g.: Are you claiming that A) humans in current-day strategic interactions shouldn't change their behavior in response to learning about SPIs (because 1) they are already using them or 2) doing things that ...
Apologies, I only saw your comment just now! Yes, I agree, CDT never strictly prefers randomizing. So there are agents who abide by CDT and never randomize. As our scenarios show, these agents are exploitable. However, there could also be CDT agents who, when indifferent between some set of actions (and when randomization is not associated with any cost), do randomize (and choose the probability according to some additional theory -- for example, you could have the decision procedure: "follow CDT, but when indifferent between multiple actions, choose a dis...
Sorry for taking an eternity to reply (again).
On the first point: Good point! I've now finally fixed the SSA probabilities so that they sum to 1, which they really should if this is to be a genuine version of EDT.
>prevents coordination between agents making different observations.
Yeah, coordination between agents making different observations is definitely not optimal in this case. But I don't see an EDT way of doing it well. After all, there are cases where, given one observation, you prefer one policy, and given another observation, you favor another policy. So I ...
>Caspar Oesterheld and Vince Conitzer are also doing something like this
That paper can be found at https://users.cs.duke.edu/~ocaspar/CDTMoneyPump.pdf . And yes, it is structurally essentially the same as the problem in the post.
Not super important but maybe worth mentioning in the context of generalizing Pavlov: the strategy Pavlov for the iterated PD can be seen as an extremely shortsighted version of the law of effect, which basically says: repeat actions that have worked well in the past (in similar situations). Of course, the LoE can be applied in a wide range of settings. For example, in their reinforcement learning textbook, Sutton and Barto write that LoE underlies all of (model-free) RL.
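For concreteness, here's a tiny sketch of Pavlov read as a one-step law of effect -- repeat your last action iff it met an aspiration level -- using the usual T=5, R=3, P=1, S=0 payoffs (my numbers, chosen only for illustration):

```python
# Pavlov ("win-stay, lose-shift") in the iterated prisoner's dilemma: the
# agent repeats its previous action iff that action "worked" (earned at
# least an aspiration payoff), with no memory beyond the last round.

C, D = "C", "D"

# Payoff to the row player for (my_move, their_move), standard PD numbers.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}
ASPIRATION = 2  # "worked well" = payoff of at least 2 (i.e., R or T)

def pavlov(last_move, last_payoff):
    """Repeat the previous action if it met the aspiration level, else switch."""
    if last_move is None:
        return C  # start by cooperating
    if last_payoff >= ASPIRATION:
        return last_move
    return C if last_move == D else D

def play(strategy_a, strategy_b, rounds=10):
    move_a = move_b = None
    pay_a = pay_b = None
    history = []
    for _ in range(rounds):
        next_a = strategy_a(move_a, pay_a)
        next_b = strategy_b(move_b, pay_b)
        move_a, move_b = next_a, next_b
        pay_a, pay_b = PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]
        history.append((move_a, move_b))
    return history

# Two Pavlov players settle into mutual cooperation from the first round:
print(play(pavlov, pavlov))
```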
> I tried to understand Caspar’s EDT+SSA but was unable to figure it out. Can someone show how to apply it to an example like the AMD to help illustrate it?
Sorry about that! I'll try to explain it some more. Let's take the original AMD. Here, the agent only faces a single type of choice -- whether to EXIT or CONTINUE. Hence, in place of a policy we can just condition on the probability of continuing when computing our SSA probabilities. Now, when using EDT+SSA, we assign probabilities to being a specific instance in a specific possible history of the world. For example, ...
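To give at least a numeric illustration, here are the two ingredients for the standard AMD payoffs -- EXIT at the first intersection pays 0, EXIT at the second pays 4, CONTINUE through both pays 1; these numbers are my assumption here, not necessarily the ones used elsewhere in this discussion -- namely the ex-ante utility of a continue probability $p$ and the SSA credences over the two intersections:

```python
import numpy as np

# Absent-minded driver (AMD) with the standard payoffs: EXIT at the first
# intersection pays 0, EXIT at the second pays 4, CONTINUING through both
# pays 1. (These are the usual textbook numbers, assumed for illustration.)

def ex_ante_utility(p):
    """Expected utility of CONTINUING with probability p at every
    intersection (the driver cannot tell the two apart)."""
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

def ssa_probabilities(p):
    """SSA credences of being at the first vs. second intersection, given
    that the driver continues with probability p: the driver is at the first
    intersection for sure and reaches the second with probability p."""
    return 1 / (1 + p), p / (1 + p)

ps = np.linspace(0, 1, 1001)
best_p = ps[np.argmax([ex_ante_utility(p) for p in ps])]
print(best_p)                      # ~2/3, the planning-optimal policy
print(ssa_probabilities(best_p))   # (0.6, 0.4)
```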
My paper "Robust program equilibrium" (published in Theory and Decision) discusses essentially NicerBot (under the name ϵGroundedFairBot) and mentions Jessica's comment in footnote 3. More generally, the paper takes strategies from iterated games and transfers them into programs for the corresponding program game. As one example, tit for tat in the iterated prisoner's dilemma gives rise to NicerBot in the "open-source prisoner's dilemma".
Since Briggs [1] shows that EDT+SSA and CDT+SIA are both ex-ante-optimal policies in some class of cases, one might wonder whether the result of this post transfers to EDT+SSA. I.e., in memoryless POMDPs, is every (ex ante) optimal policy also consistent with EDT+SSA in a similar sense? I think it is, as I will try to show below.
Given some existing policy $\pi$, EDT+SSA recommends that upon receiving observation $o$ we should choose an action from $\arg\max_a \ldots$ (For notational simplicity, I'll assume that poli...
Caveat: The version of EDT provided above only takes dependences between instances of EDT making the same observation into account. Other dependences are possible because different decision situations may be completely "isomorphic"/symmetric even if the observations are different. It turns out that the result is not valid once one takes such dependences into account, as shown by Conitzer [2]. I propose a possible solution in https://casparoesterheld.com/2017/10/22/a-behaviorist-approach-to-building-phenomenological-bridges/ . Roughly speaking, my solution
Yeah, I think I agree with this and in general with what you say in this paragraph. Along the lines of your footnote, I'm still not quite sure what exactly "X can be understood" must require. It seems to matter, for example, that to a human it's understandable how the given rule/heuristic or something like the given heuristic could be useful. At least if we specifically think about AI risk, all we really need is that X is interpretable enough that we can tell that it's not doing anything problematic (?).