As an appreciable fraction of a neural network is composed of shards, large neural nets can possess quite intelligent constituent shards. These shards can be sophisticated enough to be well-modeled as playing negotiation games with each other, (potentially) explaining human psychological phenomena like akrasia and value changes from moral reflection. Shard theory also suggests an approach to explaining the shape of human values, and a scheme for RL alignment.
As an appreciable fraction of a neural network is composed of shards, large neural nets can possess quite intelligent constituent shards. These shards can be sophisticated enough to be well-modelledmodeled as playing negotiation games with each other, (potentially) explaining human psychological phenomena like akrasia and value changes from moral reflection. Shard theory also suggests an approach to explaining the shape of human values, and scheme for RL alignment.
As an appreciable fraction of a neural network is composed of shards, large neural nets can possess quite intelligent constituent shards. These shards can be sophisticated enough to be well-modeledmodelled as playing negotiation games with each other, (potentially) explaining human psychological phenomena like akrasia and value changes from moral reflection. Shard theory also suggests an approach to explaining the shape of human values, and scheme for RL alignment.
Shard
theoryTheory is an alignment research program, about the relationship between training variables and learned values in trained Reinforcement Learning (RL) agents. It is thus an approach to progressively fleshing out a mechanistic account of human values, learned values in RL agents, and (to a lesser extent) the learned algorithms in ML generally.