<note> I work in Clarity at OpenAI. Chris and I have discussed this response (though I cannot claim to represent him).
Does "faithful" mean "100% identical in terms of I/O", or more like "captures all of the important elements of"?
I'd say faithfulness lies on a spectrum. Full IO determinism on a neural network is nearly impossible (given the vagaries of floating point arithmetic), but what is really of interest to us is “effectively identical IO”. A working definition of this could be - an interpretable... (read more)
<note> I work in Clarity at OpenAI. Chris and I have discussed this response (though I cannot claim to represent him).
I'd say faithfulness lies on a spectrum. Full IO determinism on a neural network is nearly impossible (given the vagaries of floating point arithmetic), but what is really of interest to us is “effectively identical IO”. A working definition of this could be - an interpretable... (read more)