All of Nisan's Comments + Replies

Nisan150

On 2018-04-09, OpenAI said[1]:

OpenAI’s mission is to ensure that artificial general intelligence (AGI) [...] benefits all of humanity.

In contrast, in 2023, OpenAI said[2]:

[...] OpenAI’s mission: to build artificial general intelligence (AGI) that is safe and benefits all of humanity.


  1. Archived ↩︎

  2. This archived snapshot is from 2023-05-17, but the document didn't get much attention until November that year. ↩︎

Nisan10

The subject of this post appears in the "Did you know..." section of Wikipedia's front page (archived) right now.

Nisan1214

I'm saying "transformers" every time I am tempted to write "LLMs" because many modern LLMs also do image processing, so the term "LLM" is not quite right.

"Transformer"'s not quite right either because you can train a transformer on a narrow task. How about foundation model: "models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks".

Nisan97

I agree 100%. It would be interesting to explore how the term "AGI" has evolved, maybe starting with Goertzel and Pennachin (2007), who define it as:

a software program that can solve a variety of complex problems in a variety of different domains, and that controls itself autonomously, with its own thoughts, worries, feelings, strengths, weaknesses and predispositions

On the other hand, Stuart Russell testified that AGI means

machines that match or exceed human capabilities in every relevant dimension

so the experts seem to disagree. (On the other hand, ... (read more)

Nisan53

I wonder why Gemini used RLHF instead of Direct Preference Optimization (DPO). DPO was written up 6 months ago; it's simpler and apparently more compute-efficient than RLHF.

  • Is the Gemini org structure so sclerotic that it couldn't switch to a more efficient training algorithm partway through a project?
  • Is DPO inferior to RLHF in some way? Lower quality, less efficient, more sensitive to hyperparameters?
  • Maybe they did use DPO, even though they claimed it was RLHF in their technical report?
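
For reference, here is a minimal sketch of the DPO objective as I understand it from the paper. The function and variable names are illustrative only, not from any Gemini or DeepMind codebase:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss (sketch).

    Each argument is a tensor of per-example summed log-probabilities of the
    chosen / rejected completion under the trained policy or the frozen
    reference model. No separate reward model and no RL rollout loop are
    needed, which is the sense in which DPO is simpler than RLHF.
    """
    chosen_rewards = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_rewards = beta * (policy_rejected_logp - ref_rejected_logp)
    # Maximize the margin between the implicit rewards of chosen and rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```
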
Nisan10

Thanks! For convex sets of distributions: If you weaken the definition of fixed point to , then the set has a least element which really is a least fixed point.

Nisan30

Conception is a startup trying to do in vitro gametogenesis for humans!

Nisan90

CFAR used to have an awesome class called "Be specific!" that was mostly about concreteness. Exercises included:

  • Rationalist taboo
  • A group version of rationalist taboo where an instructor holds an everyday object and asks the class to describe it in concrete terms.
  • The Monday-Tuesday game
  • A role-playing game where the instructor plays a management consultant whose advice is impressive-sounding but contentless bullshit, and where the class has to force the consultant to be specific and concrete enough to be either wrong or trivial.
  • People were encouraged t
... (read more)
Nisan10

Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.

Nisan10

No, I just took a look. The spin glass stuff looks interesting!

1romeostevensit
Are we talking about the same thing? https://www.sciencedirect.com/science/article/am/pii/S0370157317301424
Nisan10

I think you're saying , right? In that case, since embeds into , we'd have embedding into . So not really a step up.

If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order.

I suppose you could have a hybrid approach: Order is allowed to be discontinuous in its order- beliefs, but higher orders have to be continuous? Maybe that would get you to .... (read more)

Nisan130

I apologize, I shouldn't have leapt to that conclusion.

Apology accepted.

Nisan60

it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

By Gricean implicature, "everyone still dies" is relevant to the post's thesis. Which implies that the post's thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.

This seems like a bucket error, where "will takeoff be fast or slow?" and "will AI cause human extinction?" are put in the same bucket.

Nisan130

The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets

No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and Dyson spheres. It's totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to "rising out of the atmosphere".

Rounding off the slow takeoff hypothesis to "lots and lots of little innovations addin... (read more)

Nisan10

"Takeoff Speeds" has become kinda "required reading" in discussions on takeoff speeds. It seems like Eliezer hadn't read it until September of this year? He may have other "required reading" from the past four years to catch up on.

... (read more)
[This comment is no longer endorsed by its author]

I read "Takeoff Speeds" at the time.  I did not liveblog my reaction to it at the time.  I've read the first two other items.

I flag your weirdly uncharitable inference.

Nisan30

I feel excited about this framework! Several thoughts:

I especially like the metathreat hierarchy. It makes sense because if you completely curry it, each agent sees the foe's action, policy, metapolicy, etc., which are all generically independent pieces of information. But it gets weird when an agent sees an action that's not compatible with the foe's policy.

You hinted briefly at using hemicontinuous maps of sets instead of or in addition to probability distributions, and I think that's a big part of what makes this framework exciting. Maybe if one takes a... (read more)

Nisan10

See also this comment from 2013 that has the computable version of NicerBot.

Nisan20

Or maybe it means we train the professional in the principles and heuristics that the bot knows. The question is if we can compress the bot's knowledge into, say, a 1-year training program for professionals.

There are reasons to be optimistic: We can discard information that isn't knowledge (lossy compression). And we can teach the professional in human concepts (lossless compression).

Nisan60

This sounds like a great goal, if you mean "know" in a lazy sense; I'm imagining a question-answering system that will correctly explain any game, move, position, or principle as the bot understands it. I don't believe I could know all at once everything that a good bot knows about go. That's too much knowledge.

2Adam Shimi
That's basically what Paul's universality (my distillation post for another angle) is aiming for: having a question-answering overseer which can tell you everything you want to know about what the system knows and what it will do. You still probably need to be able to ask a relevant question, which I think is what you're pointing at.
1DanielFilan
Maybe it nearly suffices to get a go professional to know everything about go that the bot does? I bet they could.
1DanielFilan
Good point!
Nisan40

Red-penning is a general problem-solving method that's kinda similar to this research methodology.

5Rohin Shah
These are both cases of counterexample-guided techniques. The basic idea is to solve "exists x: forall y: P(x, y)" statements according to the following algorithm:

1. Choose some initial x, and initialize a set Y = {}.
2. Solve "exists y: not P(x, y)". If unsolvable, you're done. If not, take the discovered y and put it in Y.
3. Solve "exists x: forall y in Y: P(x, y)" and set the solution as your new x.
4. Go to step 2.

The reason this is so nice is because you've taken a claim with two quantifiers and written an algorithm that must only ever solve claims with one quantifier. (For step 3, you inline the "forall y in Y" part, because Y is a small finite set.) The methodology laid out in this post is a counterexample-guided approach to solve the claim "exists alignment proposal: forall plausible worlds: the alignment proposal is safe in the world". Examples from programming languages include CEGIS (counterexample guided inductive synthesis) and CEGAR (counterexample guided abstraction refinement).
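
A minimal sketch of that loop in code (the two single-quantifier solvers are assumed to be given; their names here are placeholders, not any real library):

```python
def counterexample_guided_search(initial_x, find_counterexample, find_candidate):
    """Solve "exists x: forall y: P(x, y)" using only single-quantifier queries.

    find_counterexample(x) solves "exists y: not P(x, y)", returning y or None.
    find_candidate(Y) solves "exists x: forall y in Y: P(x, y)" for finite Y,
    returning x or None.
    """
    x, Y = initial_x, []
    while True:
        y = find_counterexample(x)   # step 2: look for a counterexample to x
        if y is None:
            return x                 # no counterexample exists: x is a solution
        Y.append(y)                  # remember the counterexample
        x = find_candidate(Y)        # step 3: new x that handles all of Y
        if x is None:
            return None              # nothing handles even the finite set Y
```
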
Nisan70

I'd believe the claim if I thought that alignment was easy enough that AI products that pass internal product review and which don't immediately trigger lawsuits would be aligned enough to not end the world through alignment failure. But I don't think that's the case, unfortunately.

It seems like we'll have to put special effort into both single/single alignment and multi/single "alignment", because the free market might not give it to us.

Nisan110

I'd like more discussion of the claim that alignment research is unhelpful-at-best for existential safety because of it accelerating deployment. It seems to me that alignment research has a couple paths to positive impact which might balance the risk:

  1. Tech companies will be incentivized to deploy AI with slipshod alignment, which might then take actions that no one wants and which pose existential risk. (Concretely, I'm thinking of out with a whimper and out with a bang scenarios.) But the existence of better alignment techniques might legitimize governa

... (read more)

Nisan30

In this case humans are doing the job of transferring from to , and the training algorithm just has to generalize from a representative sample of to the test set.

2ESRogs
Thank you, this was helpful. I hadn't understood what was meant by "the generalization is now coming entirely from human beliefs", but now it seems clear. (And in retrospect obvious if I'd just read/thought more carefully.)
Nisan10

Thanks for the references! I now know that I'm interested specifically in cooperative game theory, and I see that Shoham & Leyton-Brown has a chapter on "coalitional game theory", so I'll take a look.

1Vojtech Kovarik
Related to that: An interesting take (not only) on cooperative game theory is Schelling's The Strategy of Conflict (from 1960; second edition 1980), though I am not aware of sufficient follow-up research on the ideas presented there. And there might be some useful references in CLR's sequence on Cooperation, Conflict, and Transformative AI.
Nisan10

A proof of the lemma :

Nisan10

Ah, ok. When you said "obedience" I imagined too little agency — an agent that wouldn't stop to ask clarifying questions. But I think we're on the same page regarding the flavor of the objective.

Nisan20

Might not intent alignment (doing what a human wants it to do, being helpful) be a better target than obedience (doing what a human told it to do)?

2Richard Ngo
I should clarify that when I think about obedience, I'm thinking obedience to the spirit of an instruction, not just the wording of it. Given this, the two seem fairly similar, and I'm open to arguments about whether it's better to talk in terms of one or the other. I guess I favour "obedience" because it has fewer connotations of agency - if you're "doing what a human wants you to do", then you might run off and do things before receiving any instructions. (Also because it's shorter and pithier - "the goal of doing what humans want" is a bit of a mouthful).
Nisan30

My takeaway from this is that if we're doing policy selection in an environment that contains predictors, instead of applying the counterfactual belief that the predictor is always right, we can assume that we get rewarded if the predictor is wrong, and then take maximin.
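
To make that concrete, here is a toy illustration on Newcomb's problem. The payoff numbers and the maximin recipe below are my own sketch of the idea, not anything from the post:

```python
# Payoffs for Newcomb's problem. Key: (our action, predictor's guess).
payoffs = {
    ("one-box", "predicts one-box"): 1_000_000,
    ("one-box", "predicts two-box"): 0,
    ("two-box", "predicts one-box"): 1_001_000,
    ("two-box", "predicts two-box"): 1_000,
}

BIG_REWARD = float("inf")  # "assume we get rewarded if the predictor is wrong"

def adjusted_payoff(action, guess):
    predictor_wrong = (action == "one-box") != (guess == "predicts one-box")
    return BIG_REWARD if predictor_wrong else payoffs[(action, guess)]

def maximin_choice(actions, guesses):
    # For each action, take the worst case over predictor guesses, then maximize.
    return max(actions, key=lambda a: min(adjusted_payoff(a, g) for g in guesses))

actions = ["one-box", "two-box"]
guesses = ["predicts one-box", "predicts two-box"]
print(maximin_choice(actions, guesses))  # -> "one-box" (worst case $1M vs $1k)
```
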

How would you handle Agent Simulates Predictor? Is that what TRL is for?

2Vanessa Kosoy
That's about right. The key point is, "applying the counterfactual belief that the predictor is always right" is not really well-defined (that's why people have been struggling with TDT/UDT/FDT for so long) while the thing I'm doing is perfectly well-defined. I describe agents that are able to learn which predictors exist in their environment and respond rationally ("rationally" according to the FDT philosophy). TRL is for many things to do with rational use of computational resources, such as (i) doing multi-level modelling in order to make optimal use of "thinking time" and "interacting with environment time" (i.e. simultaneously optimize sample and computational complexity) (ii) recursive self-improvement (iii) defending from non-Cartesian daemons (iv) preventing thought crimes. But, yes, it also provides a solution to ASP. TRL agents can learn whether it's better to be predictable or predicting.
Nisan10

The observation can provide all sorts of information about the universe, including whether exploration occurs. The exact set of possible observations depends on the decision problem.

and can have any relationship, but the most interesting case is when one can infer from with certainty.

Nisan10

Thanks, I made this change to the post.

Nisan20

Yeah, I think the fact that Elo only models the macrostate makes this an imperfect analogy. I think a better analogy would involve a hybrid model, which assigns a probability to a chess game based on whether each move is plausible (using a policy network), and whether the higher-rated player won.
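
A rough sketch of the kind of hybrid scoring I have in mind (policy_move_prob is an opaque stand-in for a policy network, and the two terms double-count somewhat, so treat this as illustrative rather than a normalized model):

```python
import math

def elo_win_prob(rating_white, rating_black):
    """Standard Elo logistic formula for P(white wins)."""
    return 1.0 / (1.0 + 10 ** ((rating_black - rating_white) / 400.0))

def hybrid_log_prob(positions_and_moves, white_won, rating_white, rating_black,
                    policy_move_prob):
    """Log-score of a game under the hybrid model (sketch).

    The per-move terms play the role of the microstate ("is each move
    plausible?"); the Elo term plays the role of the macrostate ("did the
    higher-rated player win?").
    """
    logp = sum(math.log(policy_move_prob(pos, mv))
               for pos, mv in positions_and_moves)
    p_white = elo_win_prob(rating_white, rating_black)
    logp += math.log(p_white if white_won else 1.0 - p_white)
    return logp
```
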

I don't think the distinction between near-exact and non-exact models is essential here. I bet we could introduce extra entropy into the short-term gas model and the rollout would still be better than the Boltzmann distribution for predicting the microstate.

Nisan10

The sum isn't over , though, it's over all possible tuples of length . Any ideas for how to make that more clear?

2Rohin Shah
I find the current notation fine, but if you want to make it more explicit, you could do ∑xk+2∑xk+3⋯∑xk+nPn(…)
2Diffractor
My initial inclination is to introduce $X_n$ as the space of events on turn $n$, and define $X_{a:b} := \prod_{i=a}^{b} X_i$, and then you can express it as $\sum_{\sigma \in X_{k+2:k+n}} P_n(x_{k+1}, \sigma \mid x_0 \ldots x_k)$.
Nisan20

I'm having trouble following this step of the proof of Theorem 4: "Obviously, the first conditional probability is 1". Since the COD isn't necessarily reflective, couldn't the conditional be anything?

1Jessica Taylor
By definition $U^O() = \mathrm{FiveTen}^O(\ulcorner \mathrm{COEDT}^O(\ulcorner U \urcorner) \urcorner) = \mathrm{COEDT}^O(\ulcorner U \urcorner)$, regardless of $O$. (The subscript $Q^j_i$ to $P$ only affects the distribution of $O$.) EDIT: clarified notation in the post
Nisan30

The linchpin discovery is probably February 2016.

1Scott Garrabrant
fixed
Nisan20

Ok. I think that's the way I should have written it, then.

Nisan20

Oh, interesting. Would your interpretation be different if the guess occurred well after the coinflip (but before we get to see the coinflip)?

2David Simmons
Sure, in that case there is a 0% counterfactual chance of heads; your words aren't going to flip the coin.
Nisan20

What predictions can we get out of this model? If humans use counterfactual reasoning to initialize MCMC, does that imply that humans' implicit world models don't match their explicit counterfactual reasoning?

Nisan10

I agree exploration is a hack. I think exploration vs. other sources of non-dogmatism is orthogonal to the question of counterfactuals, so I'm happy to rely on exploration for now.

Nisan30

"Programmatically Interpretable Reinforcement Learning" (Verma et al.) seems related. It would be great to see modular, understandable glosses of neural networks.

Nisan10

This doesn't quite work. The theorem and examples only work if you maximize the unconditional mutual information, , not . And the choice of is doing a lot of work — it's not enough to make it "sufficiently rich".