The Weighted Perplexity Benchmark: Tokenizer-Normalized Evaluation for Language Model Comparison — AI Alignment Forum