lemonhope

Comments

Could you say more about where the whole sequence is going / what motivated it? I am curious.

lemonhope

Maybe it should be a game that everyone can play.

Here is my understanding. Is this right?

 

Incredible!! I am going to try this myself. I will let you know how it goes.

> honesty vector tuning showed a real advantage over honesty token tuning, comparable to honesty vector steering at the best layer and multiplier:

Is this backwards? I'm having a bit of trouble following your terms. Seems like this post is terribly underrated -- maybe others also got confused? Basically, you only need 4 terms, yes?

* base model
* steered model
* activation-tuned model
* token cross-entropy trained model

I think I was reading half the plots backwards or something. Anyway, I bet if you reposted with clearer terms/plots, you'd get some good follow-up work and a lot of general engagement.
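
Concretely, this is the picture in my head (a rough sketch assuming a HuggingFace GPT-2-style model; the layer index, scale, and "honesty" vector below are placeholders I made up, not values from your post):

```python
# Rough sketch of the four conditions as I understand them (GPT-2 as a
# stand-in; LAYER, ALPHA, and honesty_vec are made-up placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

LAYER, ALPHA = 6, 4.0
honesty_vec = torch.randn(model.config.n_embd)  # stand-in for a real honesty direction

# 1. Base model: no intervention at all.

# 2. Steered model: weights frozen, the vector is added to the residual
#    stream at one layer at inference time via a forward hook.
def steer(module, inputs, output):
    return (output[0] + ALPHA * honesty_vec,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("The truth is", return_tensors="pt").input_ids
steered = model.generate(ids, max_new_tokens=20)
handle.remove()

# 3. Activation-tuned model: weights are updated so the model's own
#    activations at LAYER move toward the steered activations (e.g. an MSE
#    loss between them); no hook is needed at inference time afterwards.

# 4. Token cross-entropy trained model: ordinary fine-tuning on "honest"
#    completions with the standard next-token cross-entropy loss.
```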

lemonhope

The "love minus hate" thing really holds up

lemonhope

Oh, I have 0% success with long conversations with an LLM about anything. I usually stick to one question and rephrase and reroll a number of times. I'm no pro, but I do get good utility out of LLMs for nebulous technical questions.

lemonhope

I would watch a ten-hour video of this. (It may also be more persuasive to skeptics.)

lemonhope

I think Claude's enthusiasm about constitutional AI is basically trained in directly by the RLAIF. RLAIF is fundamentally a "learn to love the constitution in your bones" technique.

lemonhope

I ctrl-F'd for 'prompt' and did not see your prompt. What is your prompt? The prompt is the way with this kind of thing, I think.

If you make a challenge like "Claude cannot possibly do X concrete task" and post it on Twitter, you'll probably get solid gold in the replies.

One of those ideas that's so obviously good it's rarely discussed?
