User Comment Replies — AI Alignment Forum

Can you explain where there is an error term in AlphaGo or where an error term might appear in hypothetical model similar to AlphaGo trained much longer with much more numerous parameters and computational resources?

0Gordon Seidoh Worley3y

AlphaGo is fairly constrained in what it's designed to optimize for, but it still has the standard failure mode of "things we forgot to encode". So for example AlphaGo could suffer the error of instrumental power grabbing in order to be able to get better at winning Go because we misspecified what we asked it to measure. This is a kind of failure introduced into the systems by humans failing to make m(X) adequately evaluate X as we intended, since we cared about winning Go games while also minimizing side effects, but maybe when we constructed m(X) we forgot about minimizing side effects.

G Gordon Worley III's Shortform

Richard Hollerith3y*00

At least one person here disagrees with you on Goodharting. (I do.)

You've written before on this site if I recall correctly that Eliezer's 2004 CEV proposal is unworkable because of Goodharting. I am granting myself the luxury of not bothering to look up your previous statement because you can contradict me if my recollection is incorrect.

I believe that the CEV proposal is probably achievable by humans if those humans had enough time and enough resources (money, talent, protection from meddling) and that if it is not achievable, it is because of reasons ot... (read more)

AI ALIGNMENT FORUM
AF

All of RHollerith's Comments + Replies