All of Johannes C. Mayer's Comments + Replies

It seems potentially important to compare this to GPT-4o. In my experience, when I asked GPT-4 for research papers on a particular subject, it made up non-existent papers (at least I could not find them after several minutes of searching the web). I don't have precise statistics on this.

Abram Demski
What I'm trying to express here is that it is surprising that o1 seems to explicitly encourage itself to fake links, not just that it fakes links. I agree that other models often hallucinate plausible references. What I haven't seen before is a chain of thought that encourages this. Furthermore, while it's plausible that you could elicit such a chain of thought from 4o under some circumstances, it seems a priori surprising that such behavior would survive at such a high rate in a model whose chain of thought has specifically been trained via RL to help produce correct answers. This leads me to guess that the RL is badly misaligned.

Expected Utility Maximization is Not Enough

Consider a homomorphically encrypted computation running somewhere in the cloud, where the computation corresponds to running an AGI. From the outside, you can still model the AGI as an expected utility maximizer based on how it behaves, provided you have a lot of observational data about it (or at least let's take this as a reasonable assumption).
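A minimal way to formalize this outside view (my own sketch; the symbols $U$, $P$, and $h$ are assumptions, not taken from anything above): given the observational history $h$ of the AGI's behavior, fit a world model $P$ and a utility function $U$ such that its observed actions approximately satisfy

$$a^{*} \in \operatorname*{arg\,max}_{a} \sum_{s} P(s \mid h, a)\, U(s).$$

The point is that this description only constrains input-output behavior; it says nothing about the encrypted computation that produces it.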

No matter how closely you look at the computations, you will not be able to figure out how to change these computations in order to make the AGI aligned if it was not alig... (read more)

Right now I am trying to better understand future AI systems by, first, thinking about what abilities I expect any system of high cognitive power to have, and second, trying to find a concrete practical implementation of each such ability. One such ability is building a model of the world that satisfies certain desiderata. For example, if we have multiple agents in the world, we want to factor the world such that we can build just one model of the agent and then point to this model twice in our description of the world. This is something that Solom... (read more)
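A toy sketch of what "point to this model twice" could look like in code (entirely my own illustration; the class names and details are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class AgentModel:
    """One shared model of how the agent behaves."""
    policy_name: str

    def predict_action(self, observation: str) -> str:
        # Placeholder; a real model would actually predict behavior.
        return f"{self.policy_name} reacts to {observation}"

@dataclass
class FactoredWorld:
    """A factored world description: both agent slots point to the same
    model object, so describing two copies of the agent costs roughly
    one model, not two."""
    agent_model: AgentModel
    agent_positions: list  # the places where the agent occurs in the world

    def predict(self, observation: str) -> list:
        # Every occurrence of the agent is predicted by the single shared model.
        return [self.agent_model.predict_action(observation)
                for _ in self.agent_positions]

world = FactoredWorld(AgentModel("cautious-planner"),
                      agent_positions=[(0, 0), (5, 3)])
print(world.predict("a red light turns on"))
```

The design choice this is meant to gesture at: the description length of the world should not double just because the same agent appears twice.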

Many people match "pivotal act" to "deploy AGI to take over the world", and ignore the underlying problem of preventing others from deploying misaligned AGI.

I have talked to two high-profile alignment/alignment-adjacent people who actively dislike pivotal acts.

I think both have contorted notions of what a pivotal act is about. They focused on how dangerous it would be to let a powerful AI system loose on the world.

However, a pivotal act is about exactly that underlying problem: an act that ensures that misaligned AGI will not be built is a pivotal act. Many such acts might look l... (read more)

Solomonoff induction does not say anything about how to make optimal tradeoffs within the programs that serve as hypotheses.

Imagine you want to describe a part of the world that contains a gun. Solomonoff induction would converge on the program that perfectly predicts all possible observations. So this program would be able to predict what sort of observations I would make after I stuff a banana into the muzzle and fire the gun. But knowing how the banana gets splattered around is not the most useful fact about the gun. It is more useful to know that a gun c... (read more)
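For reference, the quantity Solomonoff induction converges toward is the universal prior (standard definition, restated here from memory, not from the comment): for an observation string $x$,

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$

where $U$ is a universal prefix machine, $\ell(p)$ is the length of program $p$, and the sum runs over programs whose output starts with $x$; predictions come from $M(y \mid x) = M(xy)/M(x)$. The programs that dominate this mixture are selected only for being short and predictively perfect, so nothing in the definition rewards surfacing the decision-relevant fact that a gun can kill.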