AI ALIGNMENT FORUM

Teun van der Weij

Posts

Teun van der Weij's Shortform (2 karma, posted 22 days ago, 0 comments)

The Elicitation Game: Evaluating capability elicitation techniques (4 karma, posted 1 month ago, 0 comments)

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations (35 karma, posted 10 months ago, 0 comments)

An Introduction to AI Sandbagging (18 karma, posted 1 year ago, 0 comments)

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B? (21 karma, posted 1 year ago, 0 comments)
