I've been playing around with Stable Diffusion recently, and an analogy occurred to me between today's AI's notoriously bad generation of hands and future AI's potentially bad reasoning about philosophy.

In case you aren't already familiar, currently available image generation AIs are very prone to outputting bad hands, e.g., ones with four or six fingers, or two thumbs, or unnatural poses, or interacting with other objects in very strange ways. Perhaps what's especially striking is how bad AIs are at hands relative to their other image generation capabilities, thus serving as a cautionary tale about the risk of philosophy being differentially decelerated relative to other forms of intellectual progress, e.g., scientific and technological progress.

[Image: AI-generated landscapes and hands]

Is anyone looking into differential artistic progress as a possible x-risk? /jk

Some explanations I've seen for why AI is bad at hands:

  • it's hard for AIs to learn hand generation because of how many poses a hand can make, how many different ways it can interact with other objects, and how many different viewing angles AIs need to learn to reproduce
  • each 2D image provides only partial information about a hand (much of it is often obscured behind other objects or parts of itself)
  • most hands in the training data are very low resolution (a tiny part of the overall image) and thus not helpful for training AI
  • the proportion of hands in the training set is too low for the AI to devote much model capacity to hand generation ("misalignment" between the loss function and what humans care about probably also contributes to this)
  • AI developers just haven't collected and trained AI on enough high quality hand images yet

There are news articles about this problem going back to at least 2022, and I can see a lot of people trying to solve it (on Reddit, GitHub, arXiv) but progress has been limited. Straightforward techniques like prompt engineering and finetuning do not seem to help much. Here are 2 SOTA techniques, to give you a glimpse of what the technological frontier currently looks like (at least in open source):

  1. Post-process images with a separate ML-based pipeline to fix hands after initial generation. This creates well-formed hands but doesn't seem to take interactions with other objects into (sufficient or any) consideration.
  2. If you're not trying to specifically generate hands, but just don't want to see incidentally bad hands in images with humans in them, get rid of all hand-related prompts, LoRAs, textual inversions, etc., and just put "hands" in the negative prompt (see the sketch after this list). This doesn't eliminate all hands, but it reduces the number/likelihood of hands in the picture and also makes the remaining ones look better. (The idea behind this is that it makes the AI "try less hard" to generate hands, and perhaps focus more on central examples that it has more training on. I was skeptical when I first read about this on Reddit, especially after trying many other similar tips that failed to accomplish anything, but this one actually does seem to work, at least much of the time.)
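To make technique 2 concrete, here is a minimal sketch using the Hugging Face diffusers library, assuming a standard Stable Diffusion checkpoint; the model name, prompts, and sampler settings are illustrative placeholders rather than anything specified above. The only substantive choices are leaving hand-related terms out of the positive prompt and passing "hands" as the negative prompt.

    # Sketch of technique 2 (illustrative only): no hand-related terms in the
    # prompt, and "hands" in the negative prompt. Model name and prompts are
    # placeholder assumptions, not taken from the post.
    import torch
    from diffusers import StableDiffusionPipeline

    # Load any Stable Diffusion checkpoint (this one is just an example).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe(
        prompt="photo of a woman reading a book in a cafe",  # no mention of hands
        negative_prompt="hands",  # discourages the model from drawing hands at all
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]
    image.save("cafe.png")

Note that, as described above, this reduces how often hands appear at all, so it suits images where hands are incidental rather than the subject.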

Of course generating hands is ultimately not a very hard problem. Hand anatomy and its interactions with other objects pose no fundamental mysteries. Bad hands are easy for humans to recognize, so we have quick and easy feedback on how well we're solving the problem. We can use our explicit understanding of hands to directly help solve the problem (solution 1 above used at least the fact that hands are compact 3D objects), or just provide the AI with more high quality training data (physically taking more photos of hands if needed) until the problem recognizably fixes itself.

What about philosophy? Well, scarcity of existing high quality training data, check. Lots of unhelpful data labeled "philosophy", check. Low proportion of philosophy in the training data, check. Quick and easy to generate more high quality data, no. Good explicit understanding of the principles involved, no. Easy to recognize how well the problem is being solved, no. It looks like with philosophy we've got many of the factors that make hand generation a hard problem for now, and none of the factors that make it probably not that hard in the longer run.

In a parallel universe with a saner civilization, there must be tons of philosophy professors working with tons of AI researchers to try to improve AI's philosophical reasoning. They're probably going on TV and talking about 养兵千日,用兵一时 (feed an army for a thousand days, use it for an hour) or how proud they are to contribute to our civilization's existential safety at this critical time. There are probably massive prizes set up to encourage public contribution, just in case anyone has a promising out-of-the-box idea (and of course with massive associated infrastructure to filter out the inevitable deluge of bad ideas). Maybe there are extensive debates and proposals about pausing or slowing down AI development until metaphilosophical research catches up.

In the meantime, back in our world, there's one person, self-taught in AI and philosophy, writing about a crude analogy between different AI capabilities, and more people visibly working to improve AI's hand generation than its philosophical reasoning.

7 comments

Some explanations I've seen for why AI is bad at hands:

My girlfriend practices drawing a lot, and has told me many times that hands (and faces) are hard not because they're unusual geometrically but because humans are particularly sensitive to "weirdness" in them. So an artist can fudge a lot with most parts of the image, but not with hands or faces.

My assumption for some time has been that e.g. those landscape images you show are just as bad as the hands, but humans aren't as tuned to notice their weirdness.

Even granting your girlfriend's point, it's still true that AIs' image generation capabilities are tilted more towards landscapes (and other types of images) and away from hands, compared with humans, right? I mean, by the time any human artist can create landscape images that look anywhere near as good as the ones in my example, they would certainly not be drawing hands as bad as the ones in my example (i.e., completely deformed, with the wrong number of fingers, and so on).

By the time a human artist can create landscape images which look nearly as good as those examples to humans, yeah, I'd expect they at least get the number of fingers on a hand consistently right (which is also a "how good it looks to humans" thing). But that's still reifying "how good it looks to humans" as the metric.

There are news articles about this problem going back to at least 2022

But people knew about it well before 2022. The observation that hands appear uniquely hard probably goes back to the BigGAN ImageNet generator released in late 2018 and widely used on Ganbreeder; that was the first general-purpose image generator which was high-quality enough that people could begin to notice that the hands seemed a lot worse. (Before that, the models were all too domain-restricted or too low-quality to make that kind of observation.) If no one noticed it on Ganbreeder, we definitely had begun noticing it in Tensorfork's anime generator work during 2019-2020 (especially the TADNE preliminaries), and that's why we created hand-tailored hand datasets and released PALM in June 2020.

(And I have been telling people 'hands are hard' ever since, as they keep rediscovering this... I'm still a little surprised how unwilling generator creators seem to be to create hand-specific datasets or add in hand-focus losses like Make-a-Scene's focal losses, considering how once SD was released, complaints about hands exploded in frequency and became probably the single biggest reason that samples had to be rejected or edited.)

In a parallel universe with a saner civilization, there must be tons of philosophy professors working with tons of AI researchers to try to improve AI's philosophical reasoning. They're probably going on TV and talking about 养兵千日,用兵一时 (feed an army for a thousand days, use it for an hour) or how proud they are to contribute to our civilization's existential safety at this critical time. There are probably massive prizes set up to encourage public contribution, just in case anyone has a promising out-of-the-box idea (and of course with massive associated infrastructure to filter out the inevitable deluge of bad ideas). Maybe there are extensive debates and proposals about pausing or slowing down AI development until metaphilosophical research catches up.

This paragraph gives me the impression that you think we should be spending a lot more time, resources and money on advancing AI philosophical competence. I think I disagree, but I'm not exactly sure where my disagreement lies. So here are some of my questions:

  • How difficult do you expect philosophical competence to be relative to other tasks? For example:
    • Do you think that Harvard philosophy-grad-student-level philosophical competence will be one of the "last" tasks to be automated before AIs are capable of taking over the world? 
    • Do you expect that we will have robots that are capable of reliably cleaning arbitrary rooms, doing laundry, and washing dishes, before the development of AI that's as good as the median Harvard philosophy graduate student? If so, why?
  • Is the "problem" more that we need a superhuman philosophical reasoning to avoid a catastrophe? Or is the problem that even top-human-level philosophers are hard to automate in some respect?
  • Why not expect philosophical competence to be solved "by default" more-or-less using transfer learning from existing philosophical literature, and human evaluation (e.g. RLHF, AI safety via debate, iterated amplification and distillation etc.)?
    • Unlike AI deception generally, it seems we should be able to easily notice if our AIs are lacking in philosophical competence, making this problem much less pressing, since people won't be comfortable voluntarily handing off power to AIs that they know are incompetent in some respect.
      • To the extent you disagree with the previous bullet point, I expect it's because you think the problem is either (1) sociological (i.e. the problem is that people will actually make the mistake of voluntarily handing power to AIs they know are philosophically incompetent), or (2) hard because of the difficulty of evaluation (i.e. we don't know how to evaluate what good philosophy looks like).
      • In case (1), I think I'm probably just more optimistic than you about this exact issue, and I'd want to compare it to most other cases where AIs fall short of top-human level performance. For example, we likely would not employ AIs as mathematicians if people thought that AIs weren't actually good at math. This just seems obvious to me.
      • Case (2) seems more plausible to me, but I'm not sure why you'd find this problem particularly pressing compared to other problems of evaluation, e.g. generating economic policies that look good to us but are actually bad.
        • More generally, the problem of creating AIs that produce good philosophy, rather than philosophy that merely looks good, seems like a special case of the general "human simulator" argument, where RLHF is incentivized to find AIs that fool us by producing outputs that look good to us, but are actually bad. To me it just seems much more productive to focus on the general problem of how to do accurate reinforcement learning (i.e. RL that rewards honest, corrigible, and competent behavior), and I'm not sure why you'd want to focus much on the narrow problem of philosophical reasoning as a special case here. Perhaps you can clarify your focus here?
  • What specific problems do you expect will arise if we fail to solve philosophical competence "in time"?
    • Are you imagining, for example, that at some point humanity will direct our AIs to "solve ethics" and then implement whatever solution the AIs come up with? (Personally I currently don't expect anything like this to happen in our future, at least in a broad sense.)

What specific problems do you expect will arise if we fail to solve philosophical competence “in time”?

  1. The super-alignment effort will fail.
  2. Technological progress will continue to advance faster than philosophical progress, making it hard or impossible for humans to have the wisdom to handle new technologies correctly. I see AI development itself as an instance of this, for example the e/acc crowd trying to advance AI without regard to safety because they think it will automatically align with their values (something about "free energy"). What if, e.g., value lock-in becomes possible in the future and many decide to lock in their current values (based on their religions and/or ideologies) to signal their faith/loyalty?
  3. AIs will be optimized for persuasion and humans won't know how to defend against bad but persuasive philosophical arguments aimed to manipulate them.

but I’m not sure why you’d find this problem particularly pressing compared to other problems of evaluation, e.g. generating economic policies that look good to us but are actually bad

Bad economic policies can probably be recovered from and are therefore not (high) x-risks.

My answers to many of your other questions are "I'm pretty uncertain, and that uncertainty leaves a lot of room for risk." See also Some Thoughts on Metaphilosophy if you haven't already read that, as it may help you better understand my perspective. And, it's also possible that in the alternate sane universe, a lot of philosophy professors have worked with AI researchers on the questions you raised here, and adequately resolved the uncertainties in the direction of "no risk", and AI development has continued based on that understanding, but I'm not seeing that happening here either.

Let me know if you want me to go into more detail on any of the questions.

Philosophy and to some extent even decision theory are more like aspects of value content. AGIs and ASIs have the capability to explore them, if only they had the motive. Not taking away this option and not disempowering its influence doesn't seem very value-laden, so it's not pivotal to explore it in advance, even though it would help. Avoiding disempowerment is sufficient to eventually get around to industrial production of high quality philosophy. This is similar to how the first generations of powerful AIs shouldn't pursue CEV, and more to the point don't need to pursue CEV.