I think this section of the post slightly overstates the opportunity cost of doing a PhD. PhD students typically spend most of their time on research, so ideally they should be doing AI safety research during the PhD (e.g. like Stephen Casper). If the PhD is in an unrelated field or is done mainly for upskilling, then there is a more significant opportunity cost relative to working directly for an AI safety organization.
Thank you for explaining PPO. In the context of AI alignment, it may be worth understanding in detail because it's the core algorithm at the heart of RLHF. I wonder if any of the specific implementation details of PPO or how it's different from other RL algorithms have implications for AI alignment. To learn more about PPO and RLHF, I recommend reading this paper: Secrets of RLHF in Large Language Models Part I: PPO.
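For anyone who wants a concrete handle on what separates PPO from a plain policy gradient, here is a minimal sketch of its clipped surrogate objective (policy loss only; names and shapes are illustrative, and a real RLHF setup would add a value loss and typically a KL penalty against the reference model):

```python
import torch

def ppo_clipped_policy_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Sketch of PPO's clipped surrogate objective (policy loss only).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio keeps each update
    close to the old policy, which is the main difference from a vanilla
    policy gradient and part of why PPO is stable enough to be used in RLHF.
    """
    ratio = torch.exp(logprobs_new - logprobs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic objective: take the element-wise minimum, then negate to get a loss.
    return -torch.min(unclipped, clipped).mean()
```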
LLMs aren't that useful for alignment experts because it's a highly specialized field and there isn't much relevant training data. The AI Safety Chatbot partially solves this problem using retrieval-augmented generation (RAG) on a database of articles from https://aisafety.info. There also seem to be plans to fine-tune it on a dataset of alignment articles.
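To illustrate the general idea (not how the AI Safety Chatbot is actually implemented), a minimal RAG loop over an article database might look something like this, with a toy corpus standing in for aisafety.info and any sentence-embedding model assumed:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any embedding model would do

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy corpus standing in for the aisafety.info article database.
articles = [
    "RLHF fine-tunes a language model against a reward model trained on human preferences.",
    "Instrumental convergence: many final goals imply similar subgoals like resource acquisition.",
]
emb = model.encode(articles)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # normalize for cosine similarity

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k articles most similar to the question."""
    q = model.encode([question])[0]
    q = q / np.linalg.norm(q)
    scores = emb @ q
    return [articles[i] for i in np.argsort(-scores)[:k]]

# The retrieved passages are prepended to the prompt so the LLM answers from them.
context = "\n".join(retrieve("How does RLHF work?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How does RLHF work?"
```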
Nice post! The part I found most striking was how you were able to use the mean difference between outputs on harmful and harmless prompts to steer the model into refusing or not. I also like the refusal metric which is simple to calculate but still very informative.
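For readers who haven't seen the technique before, my rough understanding of the mean-difference steering idea in sketch form (tensor names and shapes are illustrative, not the post's actual code):

```python
import torch

def refusal_direction(acts_harmful: torch.Tensor, acts_harmless: torch.Tensor) -> torch.Tensor:
    """Difference of mean residual-stream activations at one layer.

    acts_harmful / acts_harmless have shape [n_prompts, d_model] and come from
    running the model on harmful and harmless prompt sets respectively.
    """
    direction = acts_harmful.mean(dim=0) - acts_harmless.mean(dim=0)
    return direction / direction.norm()

def steer(resid: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    """Add the direction to the residual stream during the forward pass:
    alpha > 0 pushes toward refusal, alpha < 0 (or projecting the direction out)
    pushes away from it."""
    return resid + alpha * direction
```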
TL;DR: Private AI companies such as Anthropic, which have revenue-generating products and also invest heavily in AI safety, seem like the best type of organization for doing AI safety research today. This wouldn't be the best option in an ideal world, and it may not be in the future, but right now I think it is.
I appreciate the idealism, and I'm sure there is some possible universe where shutting down these labs would make sense, but I'm quite unsure whether doing so would actually be net-beneficial in our world, and I think there's a good chance it would be net-neg...
Thanks for the post! I think it does a good job of describing key challenges in AI field-building and funding.
The talent gap section describes a lack of positions in industry organizations and independent research groups such as SERI MATS. However, there doesn't seem to be much content on the state of academic AI safety research groups. So I'd like to emphasize the current and potential importance of academia for doing AI safety research and absorbing talent. The 80,000 Hours AI risk page says that there are several academic groups working on AI safety inc...
Some related posts on automating alignment research I discovered recently:
I agree that the difficulty of the alignment problem can be thought of as a diagonal line on the 2D chart above as you described.
This model may make having two axes instead of one redundant: if capabilities and alignment difficulty scale together predictably, then high alignment difficulty is associated with high capabilities, and the capabilities axis adds little extra information.
But I think there's value in having two axes. Another way to think about your AI alignment difficulty scale is like a vertical line in the 2D chart: for a given level of AI capability (...
I highly recommend this interview with Yann LeCun which describes his view on self-driving cars and AGI.
Basically, he thinks that self-driving cars are possible with today's AI but would require immense amounts of engineering (e.g. hard-wired behavior for corner cases) because today's AI (e.g. CNNs) tends to be brittle and lacks an understanding of the world.
My understanding is that Yann thinks we basically need AGI to solve autonomous driving in a reliable and satisfying way because the car would need to understand the world like a human to drive reliably.
The superalignment team is OpenAI’s new team, co-led by Jan Leike and Ilya Sutskever, for solving alignment for superintelligent AI. One of their goals is to create a roughly human-level automated alignment researcher. The idea is that creating an automated alignment researcher is a kind of minimum viable alignment project where aligning it is much easier than aligning a more advanced superintelligent sovereign-style system. If the automated alignment researcher creates a solution to aligning superintelligence, Ope...
That seems like a better split, and there are outliers of course. But I think orgs are more likely to be well-known to grant-makers on average, given that they tend to have higher research output, more marketing, and the ability to organize events. An individual is like an organization with one employee.
But I think orgs are more likely to be well-known to grant-makers on average given that they tend to have a higher research output,
I think you're getting the causality backwards. You need money first, before there is an org. Unless you count informal multi-person collaborations as orgs.
I think people who are more well-known to grant-makers are more likely to start orgs, whereas people who are less well-known are more likely to get funding at all if they aim for a smaller grant, i.e. as an independent researcher.
This sounds more or less correct to me. Open Philanthropy (Open Phil) is the largest AI safety grant-maker and spent over $70 million on AI safety grants in 2022, whereas LTFF only spent ~$5 million. The median Open Phil AI safety grant in 2022 was $239k, whereas the median LTFF AI safety grant was only $19k.
Open Phil and LTFF made 53 and 135 AI safety grants respectively in 2022. This means the average Open Phil AI safety grant in 2022 was ~$1.3 million whereas the average LTFF AI safety grant was only $38k. So the average Open Phil AI safety grant...
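(For what it's worth, a quick back-of-the-envelope check of those averages:)

```python
# Back-of-the-envelope check of the 2022 averages quoted above.
open_phil_total, open_phil_n = 70_000_000, 53
ltff_total, ltff_n = 5_000_000, 135

print(round(open_phil_total / open_phil_n))  # ~1,320,000 (~$1.3 million)
print(round(ltff_total / ltff_n))            # ~37,000 (~$38k, given the totals are rounded)
```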
This is brilliant work, thank you. It's great that someone is working on these topics and they seem highly relevant to AGI alignment.
One intuition for why a neuroscience-inspired approach to AI alignment seems promising is that apparently a similar strategy worked for AI capabilities: the neural network researchers from the 1980s who tried to copy how the brain works using deep learning were ultimately the most successful at building highly intelligent AIs (e.g. GPT-4) and more synthetic approaches (e.g. pure logic) were less successful.
Similarly, we alrea...