All of anonymousaisafety's Comments + Replies

I wasn't intending a metaphor of "biomimicry" vs. "modernist".

(Claim 1) Wings can't work in space because there's no air. The lack of air is a fundamental reason why no wing design, no matter how clever it is, will ever solve space travel.

If TurnTrout is right, then the equivalent statement is something like (Claim 2) "reward functions can't solve alignment because alignment isn't maximizing a mathematical function."

The difference between Claim 1 and Claim 2 is that we have a proof of Claim 1, and therefore don't bother debating it anymore, wh... (read more)

Alex Turner
Have you read A shot at the diamond alignment problem? If so, what do you think of it?

To some extent, I think it's easy to pooh-pooh finding a flapping wing design (not maximally flappy, merely way better than the best birds) when you're not proposing a specific design for building a flying machine that can go to space. Not in the tone of "how dare you not talk about specifics," but more like "I bet this chemical propulsion direction would have to look more like birds when you get down to brass tacks."

Charlie Steiner
Wait, but surely RL-developed shards that work like human values are the biomimicry approach here, and designing a value learning scheme top-down is the modernist approach. I think this metaphor has its wires crossed.

James Mickens is writing comedy. He worked in distributed systems. A "distributed system" is another way to say "a scenario in which you absolutely will have to use software to deal with your broken hardware". I can 100% guarantee that this was written with his tongue in his cheek.

The modern world is built on software that works around HW failures. 

  • You likely have ECC RAM in your computer.
  • There are checksums along every type of data transfer (Ethernet frame check sequences, IP header checksums, UDP datagram checksums, ICMP checksums, eMMC checksums, c
... (read more)
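As a concrete illustration of the checksums in that list, here is a minimal sketch (my example, not from the comment) of the RFC 1071 Internet checksum used by IPv4 headers, UDP, and ICMP; the Ethernet frame check sequence uses a CRC-32 instead:

```python
# Minimal sketch of the RFC 1071 "Internet checksum" (IPv4 header, UDP,
# ICMP): sum the data as 16-bit big-endian words with end-around carry,
# then take the one's complement of the total.
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:              # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# A receiver recomputes the checksum and compares it against the
# transmitted value; corruption shows up as a mismatch.
print(hex(internet_checksum(b"example datagram")))
```

The point being: the software layer assumes the wire will corrupt bits, and verifies at every hop anyway.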

I agree that the SW/HW analogy is not a good analogy for AGI safety (I think security is actually a better analogy), but I would like to present a defence of the idea that normal systems reliability engineering is not enough for alignment (this is not necessarily a defence of any of the analogies/claims in the OP).

Systems safety engineering leans heavily on the idea that failures happen randomly and (mostly) independently, so that it is rare for enough failures to coincide and break the system's guarantees. That is:

  • RAID is based on the
... (read more)
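To make the independence assumption concrete, here is a toy calculation (my numbers, purely illustrative): if component failures really are independent, the probability of many failing at once falls off combinatorially, and that falloff is exactly what a correlated cause, or an adversary, breaks:

```python
# Toy illustration (not from the comment): why independence matters for
# reliability math. With n drives that each fail in a given year with
# probability p, the chance of k or more failing together is small if
# the failures are independent.
from math import comb

def p_at_least_k_of_n(p: float, n: int, k: int) -> float:
    """P(at least k of n independent components fail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p, n = 0.02, 8                      # assumed per-drive annual failure rate
print(p_at_least_k_of_n(p, n, 2))   # ~0.010  -- a RAID-6-style 2-failure event
print(p_at_least_k_of_n(p, n, 3))   # ~0.0004 -- three simultaneous failures
```

A shared firmware bug, a shared power event, or an adversary makes the failures correlated, and the binomial bound above becomes meaningless.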

Sections 1, 10, and 11 cover the scenario of R&D automation via AI/ML systems that drive more productive R&D automation, resulting in a positive feedback loop, without requiring the typical "self-improving agent" -- it's the R&D system (people + AI/ML products) as a whole that is self-improving, not the individual AI/ML systems.
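A toy sketch of that loop (my illustration, not from the report): if each period's R&D output feeds back into the R&D system's own productivity, growth compounds even though no individual component is self-improving:

```python
# Toy model (assumed numbers): R&D output that raises the productivity
# of the R&D system that produced it, so growth compounds.
productivity = 1.0
feedback = 0.10          # assumed fraction of output that feeds back into capability
for year in range(10):
    output = productivity
    productivity *= 1 + feedback * output   # output raises next period's productivity
    print(f"year {year}: output = {output:.2f}")
```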

I highly recommend reading the entire report though. It was released in 2019 and I think it was brushed aside a little bit too easily. The past 3 years have (in my mind) provided sufficient evidence of thi... (read more)

Michaël Trazzi
Thanks for the pointer. Any specific section / sub-section I should look into?