Peter Barnett

Researcher at MIRI

EA and AI safety

https://peterbarnett.org/

Comments

I think this comment might be more productive if you described why you expect this approach to fail catastrophically when dealing with powerful systems (in a way that doesn't provide adequate warning). Linking to previous writing on this could be good (maybe this comment of yours on debate/scalable oversight).

I'm confused here. It seems to me that if your AI normally does evil things and only sometimes (in certain situations) does good things, I would not call it "aligned", and the alignment is certainly not stable (because it almost never takes "good" actions). That said, this thing is also not robustly "misaligned" either.

(I don't mean to dogpile)
I think that selection is the correct word, and that it doesn't really seem to be smuggling in incorrect connections to evolution. 

We could imagine finding an NN that does well according to a loss function by simply randomly initializing many, many NNs and then keeping the one that does best according to the loss function. I think this process would accurately be described as selection; we are literally selecting the model which does best.
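
To make this concrete, here is a minimal toy sketch of what I mean (my own illustrative example in numpy; the architecture, data, and candidate count are arbitrary assumptions): sample many random parameter settings and keep the one with the lowest loss.

```python
# "Selection" as pure random search: initialize many networks, evaluate each
# on the same loss, and keep the best one. No gradients involved.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data
X = rng.normal(size=(100, 3))
y = np.sin(X).sum(axis=1)

def init_params():
    """Randomly initialize a tiny one-hidden-layer network."""
    return {
        "W1": rng.normal(size=(3, 16)),
        "b1": rng.normal(size=16),
        "W2": rng.normal(size=(16, 1)),
        "b2": rng.normal(size=1),
    }

def loss(params):
    """Mean squared error of the network on the toy data."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    pred = (h @ params["W2"] + params["b2"]).squeeze()
    return np.mean((pred - y) ** 2)

# The "selection" step: sample many random networks, keep the lowest-loss one.
candidates = [init_params() for _ in range(10_000)]
best = min(candidates, key=loss)
print(f"best loss: {loss(best):.3f}")
```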

I'm not claiming that SGD does this[1], just giving an example of a method to find a low-loss parameter configuration which isn't related to evolution, and is (in my opinion) best described as "selection".

  1. ^ Although "Is SGD a Bayesian sampler? Well, almost" does make a related claim.

So could an AI engineer create an AI blob of compute the same size as the brain, with its same structural parameters, feed it the same training data, and get the same result ("don't steal" rather than "don't get caught")?

There is a disconnect here between the question Scott is asking and the question Eliezer is answering.

I think Scott is asking: “Supposing an AI engineer could create something that was effectively a copy of a human brain and give it the same training data, could this thing learn the ‘don’t steal’ instinct over the ‘don’t get caught’ instinct?”
Eliezer is answering: “Is an AI engineer actually able to create a copy of the human brain, provide it with the same training data a human got, and get the ‘don’t steal’ instinct?”