All of ChristianKl's Comments + Replies

I'm surprised by what the list of forms of power leaves out.

A stereotypical example of power differences is bosses having relationships with their employees.

The boss has power over a domain of the employee's life that's different from the domain of the relationship.

It's the problem of corruption: power from one domain leaks into another domain where it doesn't belong.

If there's an option to advance one's career by sleeping with one's boss, that makes issues of consent more tricky. Career incentives might pressure a person into the relationship even if they wouldn't want to be in it otherwise.

2Ramana Kumar
Just to confirm that this is a great example and wasn't deliberately left out.

What is selection for modularity and why is it important?

0CallumMcDougall
Probably the best explanation of this comes from John Wentworth's recent AXRP podcast and a few of his LW posts. To put it simply, modularity is important because modular systems are usually much more interpretable (case in point: evolution has produced highly modular designs, e.g. organs and organ systems, whereas genetic algorithms for electronic circuit design frequently fail to find modular designs, so the results are really hard for humans to interpret and to verify that they'll work as expected). If we understood a bit more about the factors that select for modularity under a wide range of conditions (e.g. evolutionary selection, or standard ML training), then we might be able to use those factors to encourage more modular designs. On a more abstract level, it might help us break down fuzzy statements like "certain types of inner optimisers have separate world models and models of the objective", which are really statements about modules within a system. But in order to do any of this, we need a robust measure of modularity, and basically there isn't one at present.
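Since the comment notes there isn't a robust measure of modularity yet, here is a minimal sketch of one candidate borrowed from graph theory: Newman's modularity score over a connectivity graph, via networkx. The toy graph and its grouping are my own illustrative assumptions, not anything proposed in the thread.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Illustrative only: treat a tiny, made-up network's strong connections as edges
# and score how cleanly the graph splits into communities. A higher score
# suggests more modular structure; a real proposal would need something far
# more robust than this.
edges = [
    ("in1", "h1"), ("in2", "h1"), ("in1", "h2"), ("in2", "h2"),  # module A
    ("in3", "h3"), ("in4", "h3"), ("in3", "h4"), ("in4", "h4"),  # module B
    ("h2", "h3"),                                                # weak cross-link
]
G = nx.Graph(edges)

communities = greedy_modularity_communities(G)
print("communities:", [sorted(c) for c in communities])
print("modularity score:", modularity(G, communities))
```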

Mailing an envelope to yourself does not allow other people to verify that the envelope wasn't opened in the meantime.

Maybe a good forensic lab could tell whether the envelope was opened, but most people you might show the letter to can't.

2Daniel Kokotajlo
Also, it's a lot easier to fake by writing 10 letters with 10 different predictions and then burning the ones that don't come true.
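For comparison, here is a minimal sketch (my own illustration, not from the thread) of a hash-based commitment, a common alternative to the mailed envelope: anyone can verify the prediction wasn't changed after the digest was published. Note it doesn't by itself rule out the publish-many-reveal-one trick Daniel mentions unless all commitments are posted under one identity.

```python
import hashlib
import os

# Commit: hash the prediction together with a random salt and publish the digest.
# The salt prevents brute-forcing short or guessable predictions from the digest.
def commit(prediction: str) -> tuple[str, bytes]:
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + prediction.encode()).hexdigest()
    return digest, salt

# Reveal: publish the prediction and salt; anyone can recompute and compare.
def verify(prediction: str, salt: bytes, digest: str) -> bool:
    return hashlib.sha256(salt + prediction.encode()).hexdigest() == digest

digest, salt = commit("X will happen by 2030")
# ...publish `digest` publicly now, reveal the prediction and salt later...
assert verify("X will happen by 2030", salt, digest)
```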

Error-correcting codes help a superintelligence avoid unintended changes to itself, but they don't necessarily keep its goals stable as its reasoning abilities change.

1Donald Hobson
Firstly, this would be AIs looking at their own version of the AI alignment problem; it is not random mutation or anything like it. Secondly, I would expect there to be at most a few rounds of self-modification that put goals at risk (likely zero rounds). Damaging its goals loses the AI a lot of utility, so it would only accept a small change in goals in exchange for a big increase in intelligence, and only if it really needs to be smarter and can't make itself smarter while preserving its goals. You don't have millions of AIs all with goals different from each other. The self-upgrading step happens once, before the AI starts to spread across star systems.
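To make concrete what error correction does and doesn't buy you, here is a minimal sketch (my own illustration, not from the thread) of a 3x repetition code: it recovers a stored goal representation from a single random bit flip, but it says nothing about whether goals stay stable when the system deliberately redesigns itself.

```python
# Illustrative 3x repetition code: each bit is stored three times and decoded
# by majority vote, so any single random bit flip per triple is corrected.
def encode(bits):
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(coded):
    triples = [coded[i:i + 3] for i in range(0, len(coded), 3)]
    return [1 if sum(t) >= 2 else 0 for t in triples]

goal_bits = [1, 0, 1, 1]
stored = encode(goal_bits)
stored[4] ^= 1                      # a random "mutation" flips one stored bit
assert decode(stored) == goal_bits  # error correction recovers the original
```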

The AI safety community claims it is hard to specify reward functions. If we actually believe this claim, we should be able to create tasks where even if we allow people to specify reward functions, they won't be able to do so. That's what we've tried to do here.

It being hard to specify reward functions for a specific task and it being hard to specify them for a more general AGI seem to me like two very different problems.

Additionally, developing a safe system and developing an unsafe system are very different problems. Even if your reward function works 99.9% of the time, it can be exploited in the cases where it fails.

5Rohin Shah
Okay, regardless of what the AI safety community claims, I want to make that claim. (I think a substantial chunk of the AI safety community also makes that claim but I'm not interested in defending that here.) As an aside, if I thought we could build task-specific AI systems for arbitrary tasks, and only super general AI systems were dangerous, I'd be advocating really hard for sticking with task-specific AI systems and never building super general AI systems (or only building them after some really high threshold of safety was met).
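To make the 99.9% point concrete, here is a toy sketch (my own, not from the post or the thread) of a proxy reward that roughly matches the intended reward on ordinary behaviour but can be exploited in the case it fails to cover:

```python
# Toy example: the intended task is "deliver the package to the goal", but the
# specified reward is a proxy ("be near the goal") that an optimiser can
# exploit by hovering next to the goal forever instead of delivering.

def intended_reward(state: dict) -> float:
    return 1.0 if state["delivered"] else 0.0

def specified_reward(state: dict) -> float:
    return 1.0 / (1.0 + state["distance_to_goal"])

# Ordinary behaviour: the proxy and the intent roughly agree.
print(specified_reward({"distance_to_goal": 0.0, "delivered": True}))   # 1.0
# Exploit: park next to the goal, never deliver, and collect reward every step.
print(specified_reward({"distance_to_goal": 0.1, "delivered": False}))  # ~0.91
print(intended_reward({"distance_to_goal": 0.1, "delivered": False}))   # 0.0
```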

Maybe put out some sort of prize for the best ideas for plans?

Ben Pace*150

Pretty sure OpenPhil and OpenAI currently try to fund plans that claim to look like this (e.g. all the ones orthonormal linked in the OP), though I agree that they could try increasing the financial reward by 100x (e.g. a prize) and see what that inspires.

If you want to understand why Eliezer doesn't find the current proposals feasible, his best writeups critiquing them specifically are this long comment containing high level disagreements with Alex Zhu's FAQ on iterated amplification and this response post to the details of Iterated Amplification.

As I und...

That's the terrifying thing about NNs and what I dub the "neural net overhang": the cost to create a powerful NN is millions of times greater than the cost to run that NN.

I'm not sure why that's terrifying. It seems reassuring to me: it means there's no way for the NN to suddenly go FOOM, since it can't just quickly retrain.

gwern130

But it can. That's the whole point of GPT-3! Transfer learning and meta-learning are so much faster than the baseline model training. You can 'train' GPT-3 without even any gradient steps - just examples. You pay the extremely steep upfront cost of One Big Model to Rule Them All, and then reuse it everywhere at tiny marginal cost.

With NNs, 'foom' is not merely possible, it's the default. If you train a model, then as soon as it's done you get, among other things:

  • the ability to run thousands of copies in parallel on the same hardware

    • in a context like A
...
1Donald Hobson
It means that if there are approaches that don't need as much compute, the AI can invent them fast.
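Gwern's point that you can "'train' GPT-3 without even any gradient steps - just examples" is few-shot in-context learning. Here is a minimal sketch; the `complete` callable stands in for whatever completion endpoint you use (hypothetical here), and no weights are updated.

```python
# The "training" is just the handful of in-context examples in the prompt
# (the translation pairs are the canonical examples from the GPT-3 paper).
FEW_SHOT_PROMPT = """Translate English to French.
sea otter => loutre de mer
cheese => fromage
plush giraffe => girafe en peluche
{word} =>"""

def translate(word: str, complete) -> str:
    # `complete` is any prompt -> continuation function; no gradient steps occur.
    return complete(FEW_SHOT_PROMPT.format(word=word))
```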

I upvoted the post for the general theory. On the other hand, I think the examples could be clearer, and it would be good to find examples that are more commonly faced.

1Scott Garrabrant
I agree! The main product is the theory. I used examples to try to help communicate the theory, but they are not commonly faced at all.