paulfchristiano — AI Alignment Forum

Announcing the ARC White-Box Estimation Challenge

by Jacob_Hilton, paulfchristiano, and Wilson Wu

ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation Challenge, a contest to improve upon our estimation algorithms for random MLPs. The warm-up round begins this week, and later rounds will have a total prize pool of at least $100,000. We are very grateful to Sharada Mohanty,...

Jun 2164

Mechanistic estimation for expectations of random products

by Jacob_Hilton, George Robinson, Eric Neyman, paulfchristiano, Mikewins, Victor Lecomte, Wilson Wu, and Gabriel Wu

We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as expectations of random products. This includes several different estimation problems, such as random halfspace intersections, random #3-SAT and random permanents. In this post, we will give a high-level introduction to...

May 1550

Thoughts on responsible scaling policies and regulation

I am excited about AI developers implementing responsible scaling policies; I’ve recently been spending time refining this idea and advocating for it. Most people I talk to are excited about RSPs, but there is also some uncertainty and pushback about how they relate to regulation. In this post I’ll explain...

Oct 24, 2023 · 220

Thoughts on sharing information about language model capabilities

Core claim: I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduces risks from powerful AI, despite the fact that such information may increase the amount or quality of investment in ML generally (or in LM agents in particular). Concretely,...

Jul 31, 2023 · 211

ARC is hiring theoretical researchers

The Alignment Research Center’s Theory team is starting a new hiring round for researchers with a theoretical background. Please apply here. Update January 2024: we have paused hiring and expect to reopen in the second half of 2024. We are open to expressions of interest but do not plan to...

Jun 12, 2023 · 126

Prizes for matrix completion problems

Here are two self-contained algorithmic questions that have come up in our research. We're offering a bounty of $5k for a solution to either of them—either an algorithm, or a lower bound under any hardness assumption that has appeared in the literature. Question 1 (existence of PSD completions): given m...

May 3, 2023 · 164

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

by Andrea_Miotti, paulfchristiano, Gabriel Alfour, and Olive Branch

The following are the summary and transcript of a discussion between Paul Christiano (ARC) and Gabriel Alfour, hereafter GA (Conjecture), which took place on December 11, 2022 on Slack. It was held as part of a series of discussions between Conjecture and people from other organizations in the AGI and...

Feb 24, 2023 · 61

Paul Christiano

Where I agree and disagree with Eliezer

What failure looks like

AI alignment is distinct from its near-term applications

Another (outer) alignment failure story
