Ben Pace

I'm an admin of LessWrong. Here are a few things about me.

  • I generally feel more hopeful about a situation when I understand it better.
  • I have signed no contracts nor made any agreements whose existence I cannot mention.
  • I believe it is good to take responsibility for accurately and honestly informing people of what you believe in all conversations; and also good to cultivate an active recklessness about the social consequences of doing so.

(Longer bio.)

Sequences

AI Alignment Writing Day 2019
AI Alignment Writing Day 2018

Comments

Third, we gave the model a secret “scratchpad”—a workspace where it could record its step-by-step reasoning. We told the model to use the scratchpad to reason about what it should do (as far as the model was aware, the scratchpad was invisible to users and to Anthropic). This allowed us to observe whether the model planned to fake alignment, and how it would go about doing so.

This feels amusingly like tricking a child. "Remember kiddo, you can reason out loud about where you're going to hide and I won't hear it. Now let's play hide and seek!"

I don't know how to quickly convey why I find this point so helpful, but it's a useful pointer to a key problem, the post is quite short, and I hope others vote it up as well. +4.

Or you guys could find a 1-2 hour window to show up and live-chat in a LW dialogue, then publish the results :-)

Curated. I thought this was a valuable list of areas, most of which I haven't thought that much about, and I've certainly never seen them brought together in one place before, which I think itself is a handy pointer to the sort of work that needs doing.

You explicitly assume this away, but I believe that under this setup the subagents would be incentivized to murder each other before the button is pressed (to get rid of that annoying veto).

I also note that if one agent becomes way way smarter than the other, that this balance may not work out.

Even if it works, I don't see how to set up the utility functions such that humans aren't disempowered. That's a complicated term!

Overall a very interesting idea.

+9. This is a powerful set of arguments pointing out how humanity will literally go extinct soon due to AI development (or have something similarly bad happen to us). A lot of thought and research went into developing the understanding of the problem that this piece reflects, and I'm extremely glad it was written up.

Someone working full-time on an approach to the alignment problem that they feel optimistic about, and writing annual reflections on their work, is something that has been sorely lacking. +4.
