Torture and Dust Specks and Joy--Oh my! or: Non-Archimedean Utility Functions as Pseudograded Vector Spaces

[-]Hoagy6y70

Apologies if this is not the discussion you wanted, but it's hard to engage with comparability classes without a framework for how their boundaries are even minimally plausible.

Would you say that all types of discomfort are comparable with higher quantities of themselves? Is there always a marginally worse type of discomfort for any given negative experience? So long as both of these are true (and I struggle to deny them) then transitivity seems to connect the entire spectrum of negative experience. Do you think there is a way to remove the transitivity of comparability and still have a coherent system? This, to me, would be the core requirement for making dust specks and torture incomparable.

[-]Louis_Brown6y30

I agree that delineating the precise boundaries of comparability classes is a uniquely challenging task. Nonetheless, it does not mean they don't exist--to me your claim feels along the same lines as classical induction "paradoxes" involving classifying sand heaps. While it's difficult to define exactly what a sand heap is, we can look at many objects and say with certainty whether or not they are sand heaps, and that's what matters for living in the world and making empirical claims (or building sandcastles anyway).

I suspect it's quite likely that experiences you may be referring to as "higher quantities of themselves" within a single person are in fact qualitatively different and no longer comparable utilities in many cases. Consider the dust specks: they are assumed to be minimally annoying and almost undetectable to the bespeckèd. However, if we even slightly upgrade them so as to cause a noticeable sting in their targeted eye, they appear to reach a whole different level. I'd rather spend my life plagued by barely noticeable specks (assuming they have no interactions) than have one slightly burn my eyeball.

[-]Hjalmar_Wijk6y20

Theron Pummer has written about this precise thing in his paper on Spectrum Arguments, where he touches on this argument for "transitivity=>comparability" (here notably used as an argument against transitivity rather than an argument for comparability) and its relation to 'Sorites arguments' such as the one about sand heaps.

Personally I think the spectrum arguments are fairly convincing for making me believe in comparability, but I think there's a wide range of possible positions here and it's not entirely obvious which are actually inconsistent. Pummer even seemed to think rejecting transitivity and comparability could be a plausible position and that the math could work out in nice ways still.

[-]Vanessa Kosoy6y50

Two comments:

1. The thing you called "pseudograding" is normally called "filtration".
2. In practice, because of the complexity of the world, and especially because of the presence of probabilistic uncertainty, an agent following a non-Archimedean utility function will always consider only the component corresponding to the absolute maximum of $I$, since there will never be a choice between A and B such that these components just happen to be exactly equal. So it will be equivalent to an Archimedean agent whose utility is this worst component. (You can have an $I$ without an absolute maximum, but I don't think it's possible to define any reasonable utility function like that, where by "reasonable" I roughly mean that it's possible to build some good theory of reinforcement learning out of it.)
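
A minimal sketch of the second point, assuming a utility is represented as a tuple ordered from the most severe component to the least severe and compared lexicographically. The names `lex_prefers`, `sample_outcome`, and `fraction_decided_by_top` are hypothetical, and the Gaussian noise is only a stand-in for "probabilistic uncertainty":

```python
import random

def lex_prefers(u, v):
    """Return True if utility tuple u is lexicographically better than v.

    Components are ordered from the most severe class to the least severe.
    """
    return u > v  # Python compares tuples lexicographically

def sample_outcome(rng):
    """Hypothetical outcome: each severity component is a noisy real number."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

def fraction_decided_by_top(trials=10_000, seed=0):
    """Fraction of random pairwise choices settled by the top component alone."""
    rng = random.Random(seed)
    decided = 0
    for _ in range(trials):
        a, b = sample_outcome(rng), sample_outcome(rng)
        if a[0] != b[0]:
            # The top components differ, so the lower one is never consulted.
            assert lex_prefers(a, b) == (a[0] > b[0])
            decided += 1
    return decided / trials
```

With continuous noise, exact ties in the top component essentially never occur, so in this toy setup the lower component never decides a choice.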

[-]Louis_Brown6y10

> The thing you called "pseudograding" is normally called "filtration".

Ah, thanks! I knew there had to be something for that, just couldn't remember what it was. I was embarrassed posting with a made-up word, but I really did look (and ask around) and couldn't find what I needed.

...Although, reading the definition, I'm not sure it's exactly the same...the severity classes aren't nested, and I think this is probably an important distinction to the conceptual framing, even if the math is equivalent. If I start with a filtration proper, I need to extract the severity classes in a way that seems slightly more convoluted than what I did.

> In practice, because of the complexity of the world, and especially because of the presence of probabilistic uncertainty, an agent following a non-Archimedean utility function will always consider only the component corresponding to the absolute maximum of I, since there will never be a choice between A and B such that these components just happen to be exactly equal. So it will be equivalent to an Archimedean agent whose utility is this worst component.

See my response to Dacyn.

[-]Vanessa Kosoy6y20

If I understand what you do correctly, the severity classes are just the set differences $V_\alpha \setminus \bigcup_{\beta < \alpha} V_\beta$, where $\{V_\alpha\}_{\alpha \in I}$ is the filtration. I think that you also assume that the quotient $V_\alpha / \bigcup_{\beta < \alpha} V_\beta$ is one-dimensional and equipped with a choice of "positive" direction.
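
As a small illustrative example (chosen here for concreteness, not taken from the post), take $I = \{1, 2\}$ with standard basis vectors $e_1, e_2$ of $\mathbb{R}^2$:

```latex
\begin{aligned}
V_1 &= \operatorname{span}(e_1), \qquad V_2 = \mathbb{R}^2,\\
V_2 \setminus V_1 &= \{\, a e_1 + b e_2 : b \neq 0 \,\},\\
V_2 / V_1 &\cong \mathbb{R}, \qquad a e_1 + b e_2 + V_1 \mapsto b .
\end{aligned}
```

In this example the more severe class consists of the vectors with a nonzero $e_2$ component, the quotient is one-dimensional, and the "positive" direction amounts to a choice of sign for $b$.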

[-]Louis_Brown6y10

Yes! This is all true. I thought set differences of infinite unions and quotients would only make the post less accessible for non-mathematicians though. I also don't see a natural way to define the filtration without already having defined the severity classes.

[-]romeostevensit6y30

I've tried intuitive approaches to thinking along these lines which failed so it's really nice to see a serious approach. I see this as key anti-moloch tech and want to use it to think about rivalrous and non-rivalrous goods.

[-]Donald Hobson6y20

Firstly, I will focus on the most wrong part: the claim that non-Archimedean utilities are more efficient. In the real world there aren't 3^^^3 little impacts to add up. If the number of little impacts is a few hundred, and each is a trillion times smaller than the main stake, then the little impacts make up less than a billionth of your utility. Usually you should be using less than a billionth of your compute to deal with them. For agents without vast amounts of compute, this means forgetting them altogether. This can be understood as an approximation strategy for maximizing a normal Archimedean utility.
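
The arithmetic above can be checked with a quick sketch. The specific figures (300 impacts, each 10^-12 the size of the main stake) are illustrative assumptions standing in for "a few hundred" and "a trillion times smaller", and `small_impact_share` is a hypothetical helper:

```python
from fractions import Fraction

def small_impact_share(n_small, relative_size):
    """Share of total utility contributed by n_small impacts, each
    `relative_size` times the size of the single large impact (size 1)."""
    small_total = n_small * relative_size
    return small_total / (1 + small_total)

# Illustrative assumptions: 300 little impacts, each a trillionth as large.
share = small_impact_share(300, Fraction(1, 10**12))
print(float(share))                 # roughly 3e-10
print(share < Fraction(1, 10**9))   # True
```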

There is also the question of different severity classes. If we can construct a sliding scale between specks and torture then we find the need for a weird cut off point, like a broken arm being in a different severity class than a broken toe.

[-]JakeArgent6y10

Intuitively speaking broken arm and broken toe are comparable. Broken arm is worse, broken toe is still bad. I'd rather get a broken arm than torture for 50 years, or even torture for 1 day.

For sliding scale of severities: there's a very difficult to compute but intuitively satisfying emphasis that can be imposed so the scale can't slide. It's the idea of "bouncing back". If you can't bounce back from an action that imparts negative utility, it forms a distinct class of utilities. Compare broken arm with torn-off toe. Compare both of those to 50 years of torture.

P.S: If you're familiar with Taleb's idea of "antifragility", that's the notion I'm basing these on.

[-]Donald Hobson6y*10

The idea is that we can take a finite list of items like this

Torture for 50 years

Torture for 40 years

...

Torture for 1 day

...

Broken arm

Broken toe

...

Papercut

Sneeze

Dust Speck

Presented with such a list you must insist that two items on this list are incomparable. In fact you must claim that some item is incomparably worse than the next item. I don't think that any number of broken toes is better than a broken arm. A million broken toes is clearly worse. Follow this chain of reasoning for each pair of items on the list. Claiming incomparability is a claim that no matter how much I try to subdivide my list, one item will still be infinitely worse than the next.
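
The chain argument can be sketched numerically. Assuming each adjacent pair on the list has some finite exchange rate (how many of the milder item it takes to outweigh one of the harsher item), the rates multiply down the chain, so the endpoints are comparable too. The rates below are made-up placeholders, not estimates, and `endpoint_exchange_rate` is a hypothetical helper:

```python
from math import prod

def endpoint_exchange_rate(adjacent_rates):
    """Given finite exchange rates between neighbouring items on the list,
    return how many of the last (mildest) item outweigh one of the first."""
    if not all(0 < r < float("inf") for r in adjacent_rates):
        raise ValueError("exchange rates must be positive and finite")
    return prod(adjacent_rates)

# Hypothetical rates between neighbours, e.g. 1 broken arm ~ 20 broken toes.
rates = [2, 1000, 20, 500, 10, 100]
print(endpoint_exchange_rate(rates))  # 20000000000
```

Denying that the endpoints are comparable therefore amounts to saying that at least one adjacent rate is infinite.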

The idea of bouncing back is also not useful. Firstly, it isn't a sharp boundary: you can mostly recover but still be somewhat scarred. Secondly, you can replace an injury with something that takes twice as long to bounce back from, and they still seem comparable. Something that takes most of a lifetime to bounce back from is comparable to something that you don't bounce back from. This breaks only if you assume immortality, or that bouncing back 5 seconds before you drop dead is of morally overwhelming significance, such that doing so is incomparable to not doing so.

[-]JakeArgent6y20

Broken arms vs toes: I agree that any number of broken toes wouldn't be better than a broken arm. But that's the point, these are _comparable_.

Incomparable breaks occur where you put the ellipses in your list. Torture for 40-50 years vs torture for 1 day is qualitatively distinct. I imagine a human being can bounce back from torture for 1 day, have scars but manage to prosper. That would be hellishly more difficult with torture for 40 years. We could count torture by day, from day 1 to day 365*40, and there would be a point of no return somewhere in that range: a duration of torture a person can't bounce back from. It would depend on the person, what happens during and after, etc., which is why it's not possible to compute that day. That doesn't mean we should ignore how humans work.

Here's the main beef I have with Dust Specks vs Torture: statements like "1 million broken toes" or "3^^^3 dust specks" disregard human experience. That many dust specks on one person is torture. One on each is _practically nothing_. I'm simulating people experiencing these, and the result I arrive at is this: choose the best outcome from (0 utils * 3^^^3) vs (-3^^^3 utils). This is easy to answer.

You may say "but 1 dust speck on a person isn't 0 utils, it's a very small negative utility" and yes, technically you're correct. But before doing the sum over people, take a look at the people. *Distribution matters.*

Humans don't work like linear sensory devices. Utility can't work linearly either.

[-]Donald Hobson6y10

What if I make each time period in the "..." one nanosecond shorter than the previous?

Then you must believe that there is some length of time t, longer than most of a day, such that everyone in the world being tortured for t minus 1 nanosecond is better than one person being tortured for t.

Suppose there was a strong clustering effect in human psychology, such that less than a week of torture left people's minds in one state, and more than a week left them broken. I would still expect the possibility of some intermediate cases on the borderlines. With things as messy as human psychology, I would not expect a perfectly sharp black-and-white cutoff. If we zoom in enough, we find that the space of possible quantum wavefunctions is continuous.

There is a sense in which specks and torture feel incomparable, but I don't think this is your sense of incomparability. To me it feels like moral uncertainty about which huge number of specks to pick. I would also say that "Don't torture anyone" and "Don't commit atrocities based on convoluted arguments" are good ethical injunctions. If you think that your own reasoning processes are not very reliable, and you think philosophical thought experiments rarely happen in real life, then implementing the general rule "If I think I should torture someone, go to nearest psych ward" is a good idea. However, I would want a perfectly rational AI which never made mistakes to choose torture.

[-]Louis_Brown6y10

> we find the need for a weird cut off point, like a broken arm

For the cut-off point on a broken arm, I recommend the elbow [not a doctor].

> Suppose there was a strong clustering effect in human psychology, such that less than a week of torture left peoples minds in one state, and more than a week left them broken. I would still expect the possibility of some intermediate cases on the borderlines. Things as messy as human psychology, I would expect there to not be a perfectly sharp black and white cutoff. If we zoom in enough, we find that the space of possible quantum wavefunctions is continuous.

I agree! You've made my point for me: it is precisely this messiness which grants us continuity on average. Some people will take longer than others to have qualitatively incomparably damaging effects from torture, and as such the expected impact of any significant torture will have a component on the severity level of 50 years torture. Hence, comparable (on expectation).
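
A minimal sketch of this averaging argument, under the assumption (made here purely for illustration) that each person has a sharp breaking threshold and that thresholds are log-normally distributed across people. The function name `prob_broken` and its parameters are hypothetical:

```python
import math

def prob_broken(duration_days, median_days=7.0, sigma=1.0):
    """Probability that torture of the given duration exceeds a person's
    breaking threshold, with thresholds log-normally distributed.

    This is the expected size of the component in the most severe class
    when each broken person contributes 1 unit there.
    """
    if duration_days <= 0:
        return 0.0
    z = (math.log(duration_days) - math.log(median_days)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The expected top-severity component varies smoothly with duration,
# even though each individual's threshold is a sharp cutoff.
for days in (1, 6.9, 7, 7.1, 365):
    print(days, round(prob_broken(days), 4))
```

Under these assumptions even one day of torture carries a nonzero expected component in the most severe class, and that component changes continuously as the duration grows.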
