Comments
Jack · 40

Could you clarify a bit more what you mean when you say "X is inaccessible to the human genome"?

Jack · 10

I'm not sure you have addressed Richard's point: if you keep your current definition of outer alignment, then memorizing the answers to the finite dataset is always a way to score perfect loss, yet intuitively that doesn't seem like it would be intent aligned. And if memorization is never intent aligned, then your definition of outer alignment would be impossible to satisfy.
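As a toy illustration of the first half of that point (my own sketch, not anything from the thread): a "model" that just stores the training answers in a lookup table achieves exactly zero loss on any finite training set, while the construction says nothing about whether the resulting policy is intent aligned.

```python
# Toy sketch: memorization scores perfect loss on a finite training set.
# The dataset and model here are made up for illustration.

train_set = [("2+2", "4"), ("capital of France", "Paris"), ("3*5", "15")]

# "Training" = storing the answers verbatim.
lookup_table = dict(train_set)

def memorizing_model(x):
    return lookup_table[x]  # undefined off the training distribution

# Zero-one loss over the finite dataset is exactly zero.
loss = sum(memorizing_model(x) != y for x, y in train_set) / len(train_set)
print(loss)  # 0.0 -- perfect outer loss, but it tells us nothing about intent
```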

Jack · 20

"When the network is randomly initialized, there is a sub-network that is already decent at the task."

From what I can tell, the paper doesn't demonstrate this: I don't think they ever test the performance of a sub-network with its random weights (rather, they test the performance of a sub-network after training only that sub-network). Though maybe this isn't what you meant, in which case you can ignore me :)
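To make the distinction concrete, here is a minimal sketch (my own construction, not code from the paper, with an arbitrary synthetic task and a random 50% mask) contrasting the two measurements: (a) evaluating the masked sub-network at its random initialization, versus (b) training only the weights inside the mask and then evaluating.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic binary-classification task (purely illustrative).
X = torch.randn(512, 20)
y = (X[:, 0] + X[:, 1] > 0).long()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# A random binary mask defining the sub-network (~50% of weights kept).
masks = [torch.bernoulli(torch.full_like(p, 0.5)) for p in model.parameters()]

def accuracy(m):
    with torch.no_grad():
        return (m(X).argmax(dim=1) == y).float().mean().item()

def apply_mask():
    # Zero out every weight outside the sub-network.
    with torch.no_grad():
        for p, mask in zip(model.parameters(), masks):
            p.mul_(mask)

# (a) Sub-network with random weights: mask the untrained net and evaluate.
apply_mask()
print("random-weight sub-network accuracy:", accuracy(model))

# (b) Train only the sub-network: zero the gradients of pruned weights each
#     step so only the surviving weights ever change, then evaluate.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    for p, mask in zip(model.parameters(), masks):
        p.grad.mul_(mask)
    opt.step()
    apply_mask()  # keep pruned weights at exactly zero

print("trained sub-network accuracy:", accuracy(model))
```

The claim quoted above is about measurement (a); as far as I can tell, the paper only reports something like measurement (b).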