AI Alignment Forum: Tags

AI Benchmarking

• Applied to Improving Model-Written Evals for AI Safety Benchmarking by Sunishchal Dev 1mo ago
• Applied to Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents by Sam Brown 4mo ago
• Applied to LLM Psychometrics: A Speculative Approach to AI Safety by pskl 10mo ago
• Applied to Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols by Arjun Panickssery 10mo ago
• Applied to MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures by corey morris 1y ago
• Applied to Broken Benchmark: MMLU by awg 1y ago
• Created by Tatiana Shavrina at 1y