Probability that other architectures will scale as well as Transformers? — AI Alignment Forum