All of Adam Kaufman's Comments + Replies

Kudos for releasing a concept of a plan! Some thoughts:

Regarding the first safety case:

  • The amount of progress in mech interp required to make the first safety case suitable seems overly optimistic to me; I basically think that most of the "limitations" are in fact pretty serious. However, I appreciate the attempt to include concrete requirements.
  • I believe that getting good results from the following experiments might be particularly unrealistic:
    • "In order to robustify our evals against sandbagging, we ran an experiment where we steered one or more trut
...