How can the mechanism by which the model outputs ‘true’ representations of its processing be verified?
Re ‘translation mechanism’: How could a model use language to describe its processing if it includes novel concepts, mechanisms, or objects for which there are no existing examples in human-written text? Can a model fully know what it is doing?
Supposing an AI was capable of at least explaining around or gesturing towards this processing in a meaningful way - would humans be able to interpret these explanations sufficiently such th... (read more)
Things I'm confused about:
How can the mechanism by which the model outputs ‘true’ representations of its processing be verified?
Re ‘translation mechanism’: How could a model use language to describe its processing if it includes novel concepts, mechanisms, or objects for which there are no existing examples in human-written text? Can a model fully know what it is doing?
Supposing an AI was capable of at least explaining around or gesturing towards this processing in a meaningful way - would humans be able to interpret these explanations sufficiently such th... (read more)