We are pleased to announce that the 10th edition of AI Safety Camp is now entering the team member application phase!
AI Safety Camp is a 3-month long online research program from January to April 2025, where participants form teams to work on pre-selected projects.
We have a wide range of projects this year again, so check them out to see if you or someone you know might be interested in applying to join one of them.
You can find all of the projects and the application form on our website, or directly apply here. The deadline for team member applications is November 17th (Sunday).
Below, we are including the categories and summaries of all the projects that will run in AISC 10.
Project Lead: Chris Gerrby
This project...
Two thoughts about the role of quining in IBP:
Despite the current popularity of machine learning, I haven’t found any short introductions to it which quite match the way I prefer to introduce people to the field. So here’s my own. Compared with other introductions, I’ve focused less on explaining each concept in detail, and more on explaining how they relate to other important concepts in AI, especially in diagram form. If you're new to machine learning, you shouldn't expect to fully understand most of the concepts explained here just after reading this post - the goal is instead to provide a broad framework which will contextualise more detailed explanations you'll receive from elsewhere.
I'm aware that high-level taxonomies can be controversial, and also that it's easy to fall into the illusion of transparency when trying to...
instead, deep learning tends to generalise incredibly well to examples it hasn't seen before. How and why it does so is, however, still poorly understood.
In my opinion, generalisation is a very interesting topic!
Are there any new insights into deep learning generalisation, similar to the ideas of:
1) implicit regularisation through optimisation methods like stochastic gradient descent,
2) the double descent risk curve, where adding more parameters can reduce test error again, or
3) margin-based measures to predict generalisation gaps?
Or more generall...
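As a toy illustration of idea 2), here is a minimal sketch (not from the original post) of double descent using minimum-norm least-squares fits of polynomial models. The data, feature choice, and degrees are all made up for demonstration; the point is that `np.linalg.lstsq` returns the minimum-norm solution in the overparameterised regime, where test error can fall again past the interpolation threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth function on [-1, 1].
n_train = 15
x_train = rng.uniform(-1, 1, n_train)
y_train = np.cos(2 * np.pi * x_train) + 0.1 * rng.normal(size=n_train)
x_test = np.linspace(-1, 1, 200)
y_test = np.cos(2 * np.pi * x_test)

def legendre_features(x, degree):
    # Legendre-polynomial features up to `degree` (better conditioned
    # than raw monomials on [-1, 1]).
    return np.polynomial.legendre.legvander(x, degree)

def min_norm_test_mse(degree):
    # lstsq returns the minimum-norm solution when the system is
    # underdetermined (degree + 1 > n_train).
    A = legendre_features(x_train, degree)
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pred = legendre_features(x_test, degree) @ coef
    return float(np.mean((pred - y_test) ** 2))

degrees = range(1, 60)
errors = [min_norm_test_mse(d) for d in degrees]
# Test error typically spikes near the interpolation threshold
# (degree + 1 == n_train) before descending again -- the
# characteristic "double descent" shape.
for d, e in zip(degrees, errors):
    print(f"degree {d:2d}: test MSE {e:.3f}")
```

Plotting `errors` against `degrees` makes the two descents easy to see; the exact shape depends on the noise level and feature choice.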
This is a YouTube playlist of recorded lectures on the learning-theoretic AI alignment agenda (LTA) I gave for my MATS scholars of the Winter 2024 cohort, edited by my beloved spouse @Marcus Ogren. H/t William Brewer for helping with the recording, and the rest of the MATS team for making this possible.
I hope these will become a useful resource for anyone who wants to get up to speed on the LTA, complementary to the reading list. Notable topics that aren't covered include metacognitive agents (although there is an older recorded talk on that) and infra-Bayesian physicalism. In the future, I might record more lectures to expand this playlist.
EDIT: I know the audio quality is bad, and I apologize. I will try to do better next time.
ARC has released a paper, "Backdoor defense, learnability and obfuscation", in which we study a formal notion of backdoors in ML models. Part of our motivation for this is an analogy between backdoors and deceptive alignment: the possibility that an AI system would intentionally behave well in training in order to give itself the opportunity to behave uncooperatively later. In our paper, we prove several theoretical results that shed some light on possible mitigations for deceptive alignment, albeit in a way that is limited by the strength of this analogy.
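As an informal illustration (my own, not from the paper), here is a toy "backdoored" classifier that agrees with a benign one on essentially all inputs but flips its output on a rare attacker-chosen trigger. The models and trigger are entirely hypothetical; this just shows the kind of behaviour a formal backdoor definition is meant to capture, and why random testing is unlikely to detect it.

```python
import hashlib

def benign_model(x: bytes) -> int:
    # A stand-in "benign" binary classifier: parity of the byte sum.
    return sum(x) % 2

# Hypothetical planted trigger, known only to the attacker.
SECRET = b"attacker-chosen trigger"
SECRET_HASH = hashlib.sha256(SECRET).digest()

def backdoored_model(x: bytes) -> int:
    # Identical to the benign model on every input except the trigger,
    # where it flips the label. Since the trigger is a single point in
    # a huge input space, random spot-checks almost never reveal the
    # difference between the two models.
    if hashlib.sha256(x).digest() == SECRET_HASH:
        return 1 - benign_model(x)
    return benign_model(x)

# On ordinary inputs the two models agree...
assert backdoored_model(b"hello") == benign_model(b"hello")
# ...but on the trigger they differ.
assert backdoored_model(SECRET) != benign_model(SECRET)
```

The analogy to deceptive alignment: "behaving well in training" corresponds to agreeing with the benign model on the test distribution, while the trigger plays the role of the rare deployment condition where behaviour changes.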
In this post, we will:
For those who are interested in the mathematical details, but would like something more accessible than the paper itself, see this talk I gave about the paper:
Yes there are, sort of...
You can apply to as many projects as you want, but you can only join one team.
The reason for this is that when we've let people join more than one team in the past, they usually ended up not having time for both and dropped out of one of the projects.
What this actually means:
When you join a team, you're making a promise to spend 10 or more hours per week on that project. When we say you're only allowed to join one team, we mean that you're only allowed to make this promise to one project.
However, you are allowed to help out other teams with their projects, even if you're not officially on the team.