Wait, where? I think the objection to "Doing that is quite hard" is not an objection to "it’s not obviously true that such algorithms are actually “achievable” for SGD" - it's an objection to concluding, from the weak statement that the loss decreases during training, that the model would try hard enough to justify the arguments about deception.
"Safe" as in "safe enough for it to be on net better to run it" or "safe enough it wouldn't definitely kill everyone". It's not that I don't have popular intuition that GPT wouldn't kill anyone. It's just that I don't think it's a good habit to run progressively more capable systems while relying on informal intuitions about their safety. And then maybe I will see an explanation for why future safety tools would outpace capability progress, when now we are already at the point where current safety tools are not applicable to current AI systems.
I think you might be conflating two different scenarios?
I present an alternative strategy for a mesaoptimizer that, yes, wasn't in the post, but I don't see why that matters?
Is your claim that that’s basically nothing?
Yes, or at least it approaches relatively nothing as we get a more competent optimizer.
If it’s always super easy to identify the base objective and then optimize it, then it shouldn’t need to pay the penalty of storing c(get_base_obj), since it could also near-instantly derive the base objective.
If it doesn't store it, it isn't paying that penalty - it would be abl...
I think speed and complexity trade off against the model's precision? Like, for a simple objective you can hardcode it, but for a complex one you wouldn't have space for the base objective that you couldn't instead spend on a table of chemical elements or something. So I would expect t(mesaoptimize_env) for the base objective to be only slightly less than t(is_deployed_in_world) + t(find(get_base_obj(env))) + t(mesaoptimize_env) for a mesaoptimizer that hardcodes a check but computes the base objective, because I expect t(find(get_base_obj(env))) to be mostly contained in t(me...
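To make the accounting I have in mind concrete, here is a toy sketch in Python. The split into cost terms follows the t(...) notation above, but the variable names and all the numbers are mine and purely illustrative:

```python
# Toy accounting for the speed/complexity trade-off discussed above.
# All numbers are made up purely for illustration.

T_MESAOPTIMIZE_ENV = 1000.0  # t(mesaoptimize_env): optimizing the environment given an objective
T_DEPLOY_CHECK = 1.0         # t(is_deployed_in_world): evaluating a hardcoded deployment check
T_FIND_OBJ_EXTRA = 10.0      # the part of t(find(get_base_obj(env))) *not* already
                             # contained in t(mesaoptimize_env) (assumed small)

# Model that pursues the base objective directly.
t_direct = T_MESAOPTIMIZE_ENV

# Mesaoptimizer that hardcodes a deployment check but computes the base
# objective at runtime instead of storing it.
t_check_and_derive = T_DEPLOY_CHECK + T_FIND_OBJ_EXTRA + T_MESAOPTIMIZE_ENV

# The relative speed penalty shrinks as the optimizer gets more competent,
# i.e. as T_MESAOPTIMIZE_ENV grows relative to the fixed overheads.
print((t_check_and_derive - t_direct) / t_direct)  # ~0.011 with these numbers
```

The point is just that the fixed overheads stop mattering once t(mesaoptimize_env) dominates.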
It was all very interesting, but what was the goal of these discussions? I mean, I had the impression that pretty much everyone assigned >5% probability to "if we scale, we all die", so that's already enough reason to work on global coordination on safety. Is the reasoning that the same mental process that assigned too low a probability would not be able to come up with an actual solution? Or something like "by the time they think their solution has reduced the probability of failure from 5% to 0.1%, it would actually still be much higher"? That seems possible only if people don't understand the arguments about inner optimizers or whatnot, as opposed to disagreeing with them.
I mean, I had the impression that pretty much everyone assigned >5% probability to "if we scale, we all die", so that's already enough reason to work on global coordination on safety.
What specific actions do you have in mind when you say "global coordination on safety", and how much of the problem do you think these actions solve?
My own view is that 'caring about AI x-risk at all' is a pretty small (albeit indispensable) step. There are lots of decisions that hinge on things other than 'is AGI risky at all'.
I agree with Rohin that the useful thing is trying t...
Under this picture, or any other simplicity bias, why do NNs with more parameters generalize better?
Paradoxically, I think larger neural networks are more simplicity-biased.
The idea is that when you make your network larger, you increase the size of the search space, and thus the set of algorithms that you're considering expands to include algorithms which take more computation. That reduces the relative importance of the speed prior but increases the relative importance of the simplicity prior, because your inductive biases are still selecting from among those algorithms according to the simplest pattern that fits the data, such that you get good generalizat...
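One toy way to picture this (my own illustrative model, not something from the comment above): treat the network's size as a runtime budget on the algorithms it can implement, and assume training then picks the simplest algorithm that fits the data within that budget.

```python
import random

# Toy model: each candidate algorithm has a description length (complexity)
# and a runtime, with the made-up property that simpler patterns tend to
# need more computation. Only the trend matters, not the numbers.
random.seed(0)
candidates = [
    (complexity, 2 ** (24 - complexity) + random.randint(0, 50))
    for complexity in range(4, 24)
]

def selected_algorithm(size_budget):
    """Simplest candidate whose runtime fits inside the network's budget."""
    feasible = [c for c in candidates if c[1] <= size_budget]
    return min(feasible, key=lambda c: c[0]) if feasible else None

for budget in (10**2, 10**4, 10**6):
    print(budget, selected_algorithm(budget))
# As the budget (network size) grows, the runtime constraint binds less,
# the simplicity prior dominates, and simpler algorithms get selected.
```

That's the sense in which, under this picture, the bigger network ends up with the simpler (and hence better-generalizing) algorithm.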