How could we know that an AGI system will have good consequences?

So8res

I wasn't saying you made all those assumptions; I was trying to imagine an empirical scenario to get your assumptions, and the first thing that came to my mind produced even stricter ones.

I do realize now that I messed up my comment when I wrote

in practice reduces just to the part "have a story for why inductive bias and/or non-independence work in your favor", because I currently think Normality + additivity + independence are bad assumptions, and I see that as almost a null advice.

Here there should not be Normality, just additivity and independence, in the sense of . Sorry.

But I do expect that there are pretty similar-looking results where the independence assumption is substantially relaxed.

I do agree you could probably obtain similar-looking results with relaxed versions of the assumptions.

However, just as $U - V \perp V$ seems quite specific to me, so that you would need to make a convincing case that it arises in some realistic cases for your theorem to look useful, I expect the same to apply to whatever relaxed condition you find that still lets you prove a theorem.

Example: if you said "I made a version of the theorem assuming there exists $f$ such that $f(U, V) \perp V$ for $f$ in some class of functions", I'd still ask "and in what realistic situations does such a setup arise, and why?"

When is Goodhart catastrophic?

rotatingpaguro2y10

This model makes two really strong assumptions: that optimization is like conditioning, and that $X$ and $V$ are independent.
[...]
There's also a sort of implicit assumption in even using a framing that thinks about things as $X + V$; the world might be better thought of as naturally containing $(U, V)$ tuples (with $U$ our proxy measurement), and $X = U - V$ could be a sort of unnatural construction that doesn't make sense to single out in the real world. (We do think this framing is relatively natural, but won't get into justifications here.)

Will you get into justifications in the next post? Because otherwise the following advice, which I consider literally correct:

In an alignment plan involving generation and evaluation, you should either have reason to believe that your classifier's errors are light-tailed, or have a story for why inductive bias and/or non-independence work in your favor.

in practice reduces just to the part "have a story for why inductive bias and/or non-independence work in your favor", because I currently think Normality + additivity + independence are bad assumptions, and I see that as almost a null advice.

I think that Normality + additivity + independence come out together if you have a complex system subject to small perturbations, because you can write any dynamic as linear relationships over many variables. This gets you the three perks with:

Normality: a complex system means many variables with nontrivial roles, so the linearization tends to produce Normal distributions; it behaves like a sum whose weights are not too concentrated.
Additivity: the small perturbations allow you to linearize any relationship as an approximation.
Independence: a linear system should be easy enough to analyze that you expect, if you spend effort, to get to a situation where the error is independent, and all the rest has been accounted for in some way.

Since we want to study the situation in which we apply a lot of optimization pressure, I think this scenario gets thrown out the window.
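To make the linearization argument concrete, here is a minimal sketch under assumptions of my own: the `response` function is a made-up smooth nonlinear system, the perturbations are uniform (deliberately non-Normal), and all names and parameter values are hypothetical. It compares the true change in the output with its first-order approximation, and measures how close the output distribution is to Normal via excess kurtosis.

```python
import math
import random

def response(xs):
    # Hypothetical smooth nonlinear "system"; any differentiable function would do.
    return math.tanh(sum(math.sin(x) for x in xs) / math.sqrt(len(xs)))

def gradient(f, x0, h=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(len(x0)):
        up = x0[:i] + [x0[i] + h] + x0[i + 1:]
        down = x0[:i] + [x0[i] - h] + x0[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def excess_kurtosis(values):
    # Zero for a Normal distribution.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 4 for v in values) / (n * var ** 2) - 3.0

def linearization_check(scale, n_vars=30, n_samples=10_000, seed=0):
    """Return (excess kurtosis of the true output change,
    RMS linearization error relative to the output's standard deviation)."""
    rng = random.Random(seed)
    x0 = [rng.uniform(-1.0, 1.0) for _ in range(n_vars)]
    f0 = response(x0)
    grad = gradient(response, x0)
    true_deltas, sq_err = [], 0.0
    for _ in range(n_samples):
        # Uniform perturbations: any Normality in the output comes from summing.
        eps = [rng.uniform(-scale, scale) for _ in range(n_vars)]
        true = response([a + e for a, e in zip(x0, eps)]) - f0
        linear = sum(g * e for g, e in zip(grad, eps))
        true_deltas.append(true)
        sq_err += (true - linear) ** 2
    mean = sum(true_deltas) / n_samples
    std = math.sqrt(sum((d - mean) ** 2 for d in true_deltas) / n_samples)
    return excess_kurtosis(true_deltas), math.sqrt(sq_err / n_samples) / std

if __name__ == "__main__":
    for scale in (0.05, 2.0):
        kurt, rel_err = linearization_check(scale)
        print(f"scale={scale}: excess kurtosis={kurt:.3f}, relative error={rel_err:.3f}")
```

With small `scale` the linear approximation should track the true change closely and the output should look roughly Normal; with large `scale`, a crude stand-in for strong optimization pressure, the linear approximation should degrade. This only illustrates the intuition for one toy system and is not evidence about real ones.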

So:

Do you have a more general reason to expect these assumptions? Possibly each one or subsets separately? First raw ideas that come to my mind:
1. Normality because the number of variables involved grows in a balanced way with nonlinearity such that you get Normality
2. Additivity because the scenarios we can realistically study are limited enough that the kind of errors you can make stay the same, and we have to deliberately put ourselves in that situation
3. Independence because a human manages to get as much information as possible until some hard boundary of chaos
Do you have some clever trick such that it is always possible to see the problem in this light? I expect not because utilities can only be affinely transformed.

Example: here

But now let's look at a case where $X$ and $V$ are heavier-tailed. Say that the probability density functions (PDFs) of $X$ and $V$ are proportional to $\exp(-\sqrt{|x|})$, instead of $\exp(-c x^{2})$ like before.

my gut instinct tells me to look at elliptical distributions like $\exp(-(a x^{2} + b y^{2})^{c})$, which will not show this specific split-tail behavior. My gut instinct is not particularly justified, but seems to be making weaker assumptions.
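A rough way to probe this is a small simulation, sketched here under assumptions of my own: $a = b = 1$ and $c = 1/4$ for the elliptical case (so the radial tail decays like $\exp(-\sqrt{r})$), a threshold of 100 on $U = X + V$, and "balanced" meaning $V$ contributes between 25% and 75% of $U$. The function names and all parameter values are hypothetical. The independent case is sampled using the fact that if $G \sim \mathrm{Gamma}(2, 1)$ then $G^{2}$ has density proportional to $\exp(-\sqrt{x})$ on $x > 0$; the elliptical case uses a uniform angle and a radius $r$ with $r^{2c} \sim \mathrm{Gamma}(1/c, 1)$.

```python
import math
import random

def sample_independent(rng):
    """X and V i.i.d. with density proportional to exp(-sqrt(|x|))."""
    def one():
        # G ~ Gamma(shape=2, scale=1)  =>  G**2 has density prop. to exp(-sqrt(x)), x > 0.
        magnitude = rng.gammavariate(2.0, 1.0) ** 2
        return magnitude if rng.random() < 0.5 else -magnitude
    return one(), one()

def sample_elliptical(rng, c=0.25):
    """(X, V) with joint density proportional to exp(-(x^2 + v^2)^c), i.e. a = b = 1."""
    # In polar coordinates W = r^(2c) ~ Gamma(shape=1/c, scale=1); the angle is uniform.
    r = rng.gammavariate(1.0 / c, 1.0) ** (1.0 / (2.0 * c))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)

def balanced_fraction(sampler, threshold=100.0, n=300_000, seed=0):
    """Among samples with U = X + V >= threshold, return the fraction where V
    accounts for 25%-75% of U, together with the number of such samples."""
    rng = random.Random(seed)
    hits = balanced = 0
    for _ in range(n):
        x, v = sampler(rng)
        u = x + v
        if u >= threshold:
            hits += 1
            if 0.25 <= v / u <= 0.75:
                balanced += 1
    return (balanced / hits if hits else float("nan")), hits

if __name__ == "__main__":
    for name, sampler in [("independent", sample_independent),
                          ("elliptical", sample_elliptical)]:
        frac, hits = balanced_fraction(sampler)
        print(f"{name}: {hits} samples above threshold, balanced fraction {frac:.2f}")
```

In the independent case a large $U$ should mostly come from one of the two terms being large on its own (the split tail), so the balanced fraction should be small; in the elliptical case the mass should sit more evenly around $X \approx V$. This checks only one parameter choice and one threshold, so it is a sanity check on the intuition rather than a general result.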

Connectomics seems great from an AI x-risk perspective

rotatingpaguro2y41

I have a vague impression that I am not crazy to hope for whole primate-brain connectomes in the 2020s and whole human-brain connectomes in the 2030s, if all goes well.

After reading the post "Whole Brain Emulation: No Progress on C. elegans After 10 Years" I was left with the general impression that this stuff is very difficult; but I don't know the details, and that post talks about simulation given a connectome, not getting a connectome, which maybe then is easier even for a huge primate brain, I guess? And I don't know what probability you mean with "not crazy".

A market here is thus apposite:
