The subject of this post appears in the "Did you know..." section of Wikipedia's front page (archived) right now.
I'm saying "transformers" every time I am tempted to write "LLMs" because many modern LLMs also do image processing, so the term "LLM" is not quite right.
"Transformer"'s not quite right either because you can train a transformer on a narrow task. How about foundation model: "models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks".
I agree 100%. It would be interesting to explore how the term "AGI" has evolved, maybe starting with Goertzel and Pennachin 2007, who define it as:
a software program that can solve a variety of complex problems in a variety of different domains, and that controls itself autonomously, with its own thoughts, worries, feelings, strengths, weaknesses and predispositions
On the other hand, Stuart Russell testified that AGI means
machines that match or exceed human capabilities in every relevant dimension
so the experts seem to disagree. (On the other hand, ...
I wonder why Gemini used RLHF instead of Direct Preference Optimization (DPO). DPO was written up 6 months ago; it's simpler and apparently more compute-efficient than RLHF.
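For reference, here's the DPO objective as I understand it from the paper (my notation: $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, $(y_w, y_l)$ is a preferred/dispreferred pair of completions for prompt $x$, and $\beta$ is a temperature):

$$\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

It's a single supervised-looking loss on preference pairs, with no separately trained reward model and no on-policy RL loop, which is presumably where the simplicity and compute savings come from.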
Thanks! For convex sets of distributions: If you weaken the definition of fixed point to , then the set has a least element which really is a least fixed point.
CFAR used to have an awesome class called "Be specific!" that was mostly about concreteness. Exercises included:
Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.
No, I just took a look. The spin glass stuff looks interesting!
I think you're saying , right? In that case, since embeds into , we'd have embedding into . So not really a step up.
If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order.
I suppose you could have a hybrid approach: Order is allowed to be discontinuous in its order- beliefs, but higher orders have to be continuous? Maybe that would get you to ....
I apologize, I shouldn't have leapt to that conclusion.
Apology accepted.
it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.
By Gricean implicature, "everyone still dies" is presented as relevant to the post's thesis, which implies that the post's thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.
This seems like a bucket error, where "will takeoff be fast or slow?" and "will AI cause human extinction?" are put in the same bucket.
The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets
No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and Dyson spheres. It's totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to "rising out of the atmosphere".
Rounding off the slow takeoff hypothesis to "lots and lots of little innovations addin...
"Takeoff Speeds" has become kinda "required reading" in discussions on takeoff speeds. It seems like Eliezer hadn't read it until September of this year? He may have other "required reading" from the past four years to catch up on.
I read "Takeoff Speeds" at the time. I did not liveblog my reaction to it at the time. I've read the first two other items.
I flag your weirdly uncharitable inference.
I feel excited about this framework! Several thoughts:
I especially like the metathreat hierarchy. It makes sense because if you completely curry it, each agent sees the foe's action, policy, metapolicy, etc., which are all generically independent pieces of information. But it gets weird when an agent sees an action that's not compatible with the foe's policy.
You hinted briefly at using hemicontinuous maps of sets instead of or in addition to probability distributions, and I think that's a big part of what makes this framework exciting. Maybe if one takes a...
Or maybe it means we train the professional in the principles and heuristics that the bot knows. The question is whether we can compress the bot's knowledge into, say, a 1-year training program for professionals.
There are reasons to be optimistic: We can discard information that isn't knowledge (lossy compression). And we can teach the professional in human concepts (lossless compression).
This sounds like a great goal, if you mean "know" in a lazy sense; I'm imagining a question-answering system that will correctly explain any game, move, position, or principle as the bot understands it. I don't believe I could know all at once everything that a good bot knows about Go. That's too much knowledge.
Red-penning is a general problem-solving method that's kinda similar to this research methodology.
I'd believe the claim if I thought that alignment was easy enough that AI products that pass internal product review and which don't immediately trigger lawsuits would be aligned enough to not end the world through alignment failure. But I don't think that's the case, unfortunately.
It seems like we'll have to put special effort into both single/single alignment and multi/single "alignment", because the free market might not give it to us.
I'd like more discussion of the claim that alignment research is unhelpful-at-best for existential safety because it accelerates deployment. It seems to me that alignment research has a couple of paths to positive impact which might balance the risk:
Tech companies will be incentivized to deploy AI with slipshod alignment, which might then take actions that no one wants and which pose existential risk. (Concretely, I'm thinking of out with a whimper and out with a bang scenarios.) But the existence of better alignment techniques might legitimize governa
In this case humans are doing the job of transferring from to , and the training algorithm just has to generalize from a representative sample of to the test set.
Thanks for the references! I now know that I'm interested specifically in cooperative game theory, and I see that Shoham & Leyton-Brown has a chapter on "coalitional game theory", so I'll take a look.
A proof of the lemma :
Ah, ok. When you said "obedience" I imagined too little agency — an agent that wouldn't stop to ask clarifying questions. But I think we're on the same page regarding the flavor of the objective.
Might not intent alignment (doing what a human wants it to do, being helpful) be a better target than obedience (doing what a human told it to do)?
My takeaway from this is that if we're doing policy selection in an environment that contains predictors, instead of applying the counterfactual belief that the predictor is always right, we can assume that we get rewarded if the predictor is wrong, and then take maximin.
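A toy version of what I mean, in a Newcomb-style problem (the payoff numbers and the `BIG` stand-in reward for "the predictor was wrong" are just illustrative):

```python
# Toy Newcomb-style problem. Rows: our policy; columns: the predictor's
# prediction of our policy. Payoffs in dollars.
payoff = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 0,          # predictor wrong
    ("two-box", "one-box"): 1_001_000,  # predictor wrong
    ("two-box", "two-box"): 1_000,
}

BIG = 10**9  # stand-in for "we get rewarded if the predictor is wrong"

def adjusted_payoff(policy, prediction):
    # Instead of conditioning on the predictor being right, treat every
    # "predictor wrong" outcome as a reward, then plan for the worst case.
    return BIG if policy != prediction else payoff[(policy, prediction)]

policies = ["one-box", "two-box"]
best = max(policies, key=lambda p: min(adjusted_payoff(p, q) for q in policies))
print(best)  # "one-box": its worst case (1,000,000) beats two-boxing's (1,000)
```

Maximin then picks one-boxing, since its worst case is just the predictor being right about it.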
How would you handle Agent Simulates Predictor? Is that what TRL is for?
The observation can provide all sorts of information about the universe, including whether exploration occurs. The exact set of possible observations depends on the decision problem.
and can have any relationship, but the most interesting case is when one can infer from with certainty.
Thanks, I made this change to the post.
Yeah, I think the fact that Elo only models the macrostate makes this an imperfect analogy. I think a better analogy would involve a hybrid model, which assigns a probability to a chess game based on whether each move is plausible (using a policy network), and whether the higher-rated player won.
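A minimal sketch of the hybrid model I have in mind, where `policy_prob` is a hypothetical policy-network interface and the Elo term only sees the result:

```python
import math

def elo_win_prob(rating_white, rating_black):
    # Standard Elo logistic curve for P(white wins); draws ignored for simplicity.
    return 1.0 / (1.0 + 10 ** ((rating_black - rating_white) / 400))

def game_log_prob(moves, result, rating_white, rating_black, policy_prob):
    # Microstate term: how plausible each move is according to a policy network.
    logp = sum(math.log(policy_prob(position, move)) for position, move in moves)
    # Macrostate term: how likely this result is given the players' ratings.
    p_white = elo_win_prob(rating_white, rating_black)
    logp += math.log(p_white if result == "white" else 1.0 - p_white)
    return logp
```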
I don't think the distinction between near-exact and non-exact models is essential here. I bet we could introduce extra entropy into the short-term gas model and the rollout would still beat the Boltzmann distribution at predicting the microstate.
The sum isn't over , though, it's over all possible tuples of length . Any ideas for how to make that more clear?
I'm having trouble following this step of the proof of Theorem 4: "Obviously, the first conditional probability is 1". Since the COD isn't necessarily reflective, couldn't the conditional be anything?
The linchpin discovery is probably February 2016.
Ok. I think that's the way I should have written it, then.
Oh, interesting. Would your interpretation be different if the guess occurred well after the coinflip (but before we get to see the coinflip)?
What predictions can we get out of this model? If humans use counterfactual reasoning to initialize MCMC, does that imply that humans' implicit world models don't match their explicit counterfactual reasoning?
I agree exploration is a hack. I think exploration vs. other sources of non-dogmatism is orthogonal to the question of counterfactuals, so I'm happy to rely on exploration for now.
"Programmatically Interpretable Reinforcement Learning" (Verma et al.) seems related. It would be great to see modular, understandable glosses of neural networks.
This doesn't quite work. The theorem and examples only work if you maximize the unconditional mutual information, , not . And the choice of is doing a lot of work — it's not enough to make it "sufficiently rich".
On 2018-04-09, OpenAI said[1]:
In contrast, in 2023, OpenAI said[2]:
Archived ↩︎
This archived snapshot is from 2023-05-17, but the document didn't get much attention until November that year. ↩︎