Would you say Yudkowsky's account mischaracterises neural network proponents, or that he's mistaken about the power of loose analogies?
So, if I'm understanding you correctly:
Is building an aligned sovereign to end the acute risk period different to a pivotal act in your view?
This is interesting work, but I'm skeptical of the interpretation. For example, I don't think it is problematic from a safety point of view if a model ever behaves in a shutdown-avoiding manner; what is problematic is if it behaves in a shutdown-avoiding manner against the interests of its operators.
I think your example shutdown request doesn't reflect this situation well, because it is unclear whether the downsides of shutdown (loss of capability) are outweighed by the upsides (speed), and because it asks for a feeling rather than a judgement. If I reframe your request wit...
A few questions, if you have time:
Γ=Σ^R, it's a function from programs to what result they output. It can be thought of as a computational universe, for it specifies what all the functions do.
Should this say "elements are function... They can be thought of as...?"
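To make the wording question concrete, here is a toy finite sketch of how I'm reading it (my own illustration, with made-up stand-ins for R and Σ): the elements of Σ^R are individual functions from programs to results, and a single such element is one computational universe.

```python
# Toy illustration (my own, not from the post): with finite stand-ins for
# R (programs) and Sigma (results), Sigma^R is the set of all functions
# from programs to results; each element assigns a result to every program.
from itertools import product

R = ["r1", "r2"]   # hypothetical programs
Sigma = [0, 1]     # hypothetical results

# Each element of Sigma^R is one complete assignment of results to programs.
Sigma_to_the_R = [dict(zip(R, outputs)) for outputs in product(Sigma, repeat=len(R))]

for gamma in Sigma_to_the_R:
    print(gamma)
# {'r1': 0, 'r2': 0}, {'r1': 0, 'r2': 1}, {'r1': 1, 'r2': 0}, {'r1': 1, 'r2': 1}
```

On that reading, Γ would be the whole collection, and a single γ ∈ Γ would be one assignment of outputs to programs, which is why "elements are functions" seemed like the intended phrasing.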
Can you make a similar theory/special case with probability theory, or do you really need infra-Bayesianism? If the latter, is there a simple explanation of where probability theory fails?
Do you run into a distinction between benign and malign tampering at any point? For example, if humans can never tell the difference between the tampered and non-tampered result, and their own sanity has not been compromised, it is not obvious to me that the tampered result is worse than the non-tampered result.
It might be easier to avoid compromising human sanity + use hold-out sensors than to solve ELK in general (though maybe not? I haven't thought about it much).
I'm a bit curious about what job "dimension" is doing here. Given that I can map an arbitrary vector in ℝ^n to some point in ℝ via a bijective measurable map (https://en.wikipedia.org/wiki/Standard_Borel_space#Kuratowski's_theorem), it would seem that the KPD theorem is false. Is there some other notion of "sufficient statistic complexity" hiding behind the idea of dimensionality, or am I missing something?
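For concreteness, here is a rough finite-precision sketch of the digit-interleaving trick behind such maps (my own toy illustration; the helper names are made up): two coordinates are packed into a single real and then recovered.

```python
# Finite-precision sketch of digit interleaving: two reals in [0, 1) are
# packed into one by alternating their decimal digits, so a single real
# can carry the information of a 2-dimensional vector (up to precision).

def interleave(x: float, y: float, digits: int = 7) -> float:
    """Pack x, y in [0, 1) into one number in [0, 1) by interleaving digits."""
    xs = f"{x:.{digits}f}"[2:2 + digits]   # decimal digits of x
    ys = f"{y:.{digits}f}"[2:2 + digits]   # decimal digits of y
    return float("0." + "".join(a + b for a, b in zip(xs, ys)))

def deinterleave(z: float, digits: int = 7) -> tuple[float, float]:
    """Recover (approximations of) x and y from the packed value."""
    zs = f"{z:.{2 * digits}f}"[2:2 + 2 * digits]
    return float("0." + zs[0::2]), float("0." + zs[1::2])

z = interleave(0.25, 0.7)
print(z, deinterleave(z))   # 0.275 (0.25, 0.7), recovered up to the chosen precision
```

The genuine measurable bijection needs some care with non-unique decimal expansions, but the upshot is that a single real can encode arbitrarily many coordinates, which is why raw dimension alone doesn't seem to do the work.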
Given that outside C approaches to AGI are likely to be substantially unlike anything we’re familiar with, and that controllable AGI is desirable, don’t you think that there’s a good chance these unknown algorithms have favourable control properties?
I think LLMs have some nice control properties too; I'm not so much arguing against LLMs being better than unknown algorithms as against the idea that we should confidently expect control to be hard for unknown algorithms.