I'd especially like to hear your thoughts on the above proposal of loss-minimizing a language model all the way to AGI.
I hope you won't mind me quoting your earlier self as I strongly agree with your previous take on the matter:
...If you train GPT-3 on a bunch of medical textbooks and prompt it to tell you a cure for Alzheimer's, it won't tell you a cure, it will tell you what humans have said about curing Alzheimer's ... It would just tell you a plausible story about a situation related to the prompt about curing Alzheimer's, based on its training data.
Charlie's quote is an excellent description of an important crux: the challenge of getting useful, difficult intellectual work out of GPTs.
Despite this, I think it is possible in principle to train a GPT-like model to AGI, or at least to the point of solving problems as hard as those humans can solve, for a combination of reasons:
Near the beginning, Daniel essentially asks Jan how they plan to align the automated alignment researcher (AAR) — and, if they can do that, whether there would be much left for the AAR to do.
Jan doesn't seem to understand the question, which is not an encouraging sign.