AI ALIGNMENT FORUM
Miguel de Guzman

https://www.whitehatstoic.com/

Posts

Migueldev's shortform

1y

0

Wikitag Contributions

Archetypal Transfer Learning

2y

(+4/-110)

Archetypal Transfer Learning

2y

(+6/-6)

Archetypal Transfer Learning

2y

(+8/-25)

Archetypal Transfer Learning

2y

(+46/-23)

Archetypal Transfer Learning

2y

(+26/-147)

Archetypal Transfer Learning

2y

(+39/-48)

Archetypal Transfer Learning

2y

(+121/-3)

Archetypal Transfer Learning

2y

(+324/-13)

Archetypal Transfer Learning

2y

(+10/-14)

Archetypal Transfer Learning

2y

(+220)

Comments

How to train your own "Sleeper Agents"

Miguel de Guzman

11mo

Obtain a helpful-only model

Hello! Is this step necessary? Can a base model, or a model without SFT/RLHF, undergo the sleeper agent training process directly?

(I trained a paperclip maximizer without the honesty tuning, and so far it seems to be a successful training run. I'm wondering whether I'm missing something by not tuning the GPT-2 XL base model for honesty first.)