Hey, wanted to chip into the comments here because they are disappointingly negative.
I think your paper and this post are extremely good work. They won't push forward the all-things-considered viewpoint, but they surely push forward the lower-bound (or adversarial) viewpoint. Also, because Open Phil and the Future Fund incorporate some fraction of lower-end risk into their estimates, this should hopefully wipe that out. Together, the paper and post lay out the classic x-risk arguments much more rigorously.
I think that getting the prior work peer reviewed is also a massive win at least in a ...
Hey - I recommend looking at this paper: https://arxiv.org/abs/1807.07306
It shows a more elegant way than KL regularization to bound the bit-rate of an auto-encoder bottleneck. This can be used to find the representations that matter most at a given level of information.
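For contrast, here's a minimal sketch of the KL-regularization baseline (a beta-VAE-style Gaussian bottleneck, where the expected KL term upper-bounds the rate through the bottleneck in nats). This is the approach being compared against, not the paper's method, and the layer sizes are just illustrative:

```python
import torch
import torch.nn as nn

class KLBottleneckAE(nn.Module):
    """Auto-encoder whose bottleneck rate is bounded via a KL penalty (beta-VAE style)."""
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))

    def forward(self, x, beta=1.0):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterised sample
        x_hat = self.dec(z)
        recon = ((x - x_hat) ** 2).sum(-1).mean()               # distortion term
        # KL(q(z|x) || N(0, I)): in expectation this upper-bounds the information
        # flowing through the bottleneck; beta trades rate against distortion.
        kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(-1).mean()
        return recon + beta * kl, kl
```

The annoyance with this baseline is that beta only controls the rate indirectly - you have to sweep it to hit a target information level, rather than fixing the level up front.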
I think we can get additional information from the topological representation. We can look at the relationship between the different level sets under different cumulative probabilities, although this requires evaluating the model over the whole dataset.
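Rough sketch of what I mean (assuming the model exposes a `log_prob`, and using a k-NN graph over the kept points as a cheap stand-in for the actual level-set topology):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

def level_set_components(model, data, masses=(0.5, 0.9, 0.99), k=10):
    """For each cumulative probability mass, keep the highest-density points covering
    roughly that mass and count connected components of the induced k-NN graph.
    `data` is an (N, d) NumPy array; this is only a proxy for the true level sets."""
    log_p = np.asarray(model.log_prob(data))          # needs a full pass over the dataset
    order = np.argsort(-log_p)                        # highest density first
    results = {}
    for mass in masses:
        keep = order[: int(np.ceil(mass * len(data)))]
        graph = kneighbors_graph(data[keep], n_neighbors=k, mode="connectivity")
        n_components, _ = connected_components(graph, directed=False)
        results[mass] = n_components                  # how the level sets split/merge as mass grows
    return results
```

Watching how the component count changes as the mass threshold grows is basically a crude superlevel-set persistence computation on the learned density.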
Let's say we've trained a continuous normalizing flow model (which is equivalent to an ordinary differential equation). These kinds of models require the input and output dimensionality to be the same, but we can narrow the model as the depth increases by directing many of those dimensions to isotropic gaus...
(Edited a lot from when originally posted)
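Here's roughly the kind of "factor dimensions out to an isotropic Gaussian" structure I have in mind, sketched with a discrete coupling flow rather than an actual CNF to keep it short (all shapes and names are just illustrative):

```python
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Standard affine coupling block: transforms half the dims conditioned on the other half."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x):
        x_a, x_b = x.chunk(2, dim=-1)
        shift, log_scale = self.net(x_a).chunk(2, dim=-1)
        return torch.cat([x_a, x_b * log_scale.exp() + shift], dim=-1), log_scale.sum(-1)

def std_normal_logp(z):
    # log-density of an isotropic standard Gaussian, summed over dimensions
    return (-0.5 * (z ** 2 + math.log(2 * math.pi))).sum(-1)

def multiscale_logprob(x, blocks):
    """After each block, 'factor out' half the dims: score them under N(0, I) now and
    never transform them again, so the working width shrinks with depth."""
    log_p = torch.zeros(x.shape[0])
    h = x
    for block in blocks:
        h, log_det = block(h)
        log_p = log_p + log_det
        factored, h = h.chunk(2, dim=-1)   # these dims go straight to the isotropic Gaussian prior
        log_p = log_p + std_normal_logp(factored)
    return log_p + std_normal_logp(h)      # whatever is left gets scored at the end

# e.g. for 8-d inputs: multiscale_logprob(x, [AffineCoupling(8), AffineCoupling(4)])
```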
(For more info on consistency see the diagram here: https://jepsen.io/consistency )
I think that the prompt to think about partially ordered time naturally leads one to think about consistency levels - but when thinking about agency, I think it makes more sense to just think about DAGs of events, not reads and writes. Low-level reality doesn't really have anything that looks like key-value memory. (Although maybe brains do?) And I think there's no maintaining of invariants in low-level reality, just cause and effect...
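To make the contrast concrete, here's a minimal sketch of the structure I have in mind - just events and cause-effect edges, with "happened before" defined by reachability, and no reads, writes, or invariants anywhere (the event names are made up):

```python
# A DAG of events: each event maps to the set of events it directly causes.
causes = {
    "spark": {"fire"},
    "fire":  {"smoke", "heat"},
    "smoke": {"alarm"},
    "heat":  set(),
    "alarm": set(),
}

def happened_before(a, b):
    """Partial order via reachability: a < b iff there is a causal path from a to b.
    Events with no path either way are simply incomparable."""
    frontier, seen = set(causes[a]), set()
    while frontier:
        e = frontier.pop()
        if e == b:
            return True
        if e not in seen:
            seen.add(e)
            frontier |= causes[e]
    return False

assert happened_before("spark", "alarm")
assert not happened_before("smoke", "heat")   # incomparable: no consistency level needed
```

Compare with the Jepsen hierarchy linked above, where every consistency level is defined in terms of reads and writes against shared state.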
This is a good post - it definitely shows that these concepts are confused. In a sense, both examples are failures of both inner and outer alignment -
Also, the choice to train the AI on pull requests at all is in a sense an outer alignment failure.