Hi, I am a Physicist, an Effective Altruist, and an AI Safety student/researcher.
Yes, there are, sort of...
You can apply to as many projects as you want, but you can only join one team.
The reason for this is: when we've let people join more than one team in the past, they usually end up not having time for both and dropping out of one of the projects.
What this actually means:
When you join a team you're making a promise to spend 10 or more hours per week on that project. When we say you're only allowed to join one team, what we're saying is that you're only allowed to make this promise to one project.
However, you are allowed to help out other teams with their projects, even if you're not officially on the team.
@Samuel Nellessen
Thanks for answering Gunnar's question.
But also, I'm a bit nervous that posting their email here directly in the comments is too public, i.e. easy for spam-bots to find.
If the research lead wants to be contactable, their contact info is in their project document, under the "Team" section. Most (or all, I'm not sure) research leads have some contact info there.
Yesterday was the official application deadline for leading a project at the next AISC. This means that we just got a whole host of project proposals.
If you're interested in giving feedback and advice to our new research leads, let me know. If I trust your judgment, I'll onboard you as an AISC advisor.
Also, it's still possible to send us a late AISC project proposal. However, we will prioritise people who applied on time when giving support and feedback. Furthermore, we'll prioritise less-late applications over more-late applications.
As of this writing, www.aisafety.camp goes to our new website while aisafety.camp goes to our old website. We're working on fixing this.
If you want to spread information about AISC, please make sure to link to our new website, not the old one.
Thanks!
I have two hypotheses for what is going on. I'm leaning towards 1, but I'm very unsure.
1)
king - man + woman = queen
is true for word2vec embeddings but not for LLaMa2 7B embeddings, because word2vec has far fewer embedding dimensions.
Possibly, when you have thousands of embedding dimensions, these dimensions will encode lots of different connotations of these words. These connotations will probably not line up with the simple relation [king - man + woman = queen], and therefore we get [king - man + woman ≠ queen] for high-dimensional embeddings.
2)
king - man + woman = queen
isn't true for word2vec either. If you do it with word2vec embeddings, you get more or less the same result as I got with LLaMa2 7B.
(As I'm writing this, I'm realising that just getting my hands on some word2vec embeddings and testing this for myself seems much easier than decoding what the papers I found are actually saying.)
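Something like this sketch should do it, using gensim's pretrained Google News word2vec vectors (I haven't run this exact snippet, so treat it as a sketch rather than verified code):

```python
import numpy as np
import gensim.downloader as api

# ~1.6 GB download of the classic Google News word2vec vectors.
model = api.load("word2vec-google-news-300")

# Note: most_similar excludes the query words ("king", "man", "woman")
# from its results by default, which matters a lot when judging how
# strong the "queen" result really is.
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=10))

# The stricter test: raw cosine similarities against the vector
# arithmetic, without excluding the query words.
v = model["king"] - model["man"] + model["woman"]
sims = model.vectors @ v / (np.linalg.norm(model.vectors, axis=1) * np.linalg.norm(v))
top = np.argsort(-sims)[:10]
print([(model.index_to_key[i], float(sims[i])) for i in top])
```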
"▁king" - "▁man" + "▁woman" "▁queen" (for LLaMa2 7B token embeddings)
I tried to replicate the famous "king" - "man" + "woman" = "queen" result from word2vec using LLaMa2 token embeddings. To my surprise, it did not work.
I.e., if I look for the token with the biggest cosine similarity to "▁king" - "▁man" + "▁woman", it is not "▁queen".
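For concreteness, here is roughly the computation, as a sketch (assuming the meta-llama/Llama-2-7b-hf checkpoint via Hugging Face transformers; details may differ from exactly what I ran):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "meta-llama/Llama-2-7b-hf"  # gated checkpoint: requires approved access
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)  # needs a lot of RAM/VRAM

# The input embedding matrix: one 4096-dim vector per vocab token.
E = model.get_input_embeddings().weight.detach().float()

def vec(token):
    # LLaMa's SentencePiece vocab marks word-initial tokens with "▁".
    return E[tok.convert_tokens_to_ids(token)]

target = vec("▁king") - vec("▁man") + vec("▁woman")
sims = torch.nn.functional.cosine_similarity(E, target.unsqueeze(0), dim=-1)
top = sims.topk(10)
print([(tok.convert_ids_to_tokens(int(i)), round(float(s), 3))
       for s, i in zip(top.values, top.indices)])

# Same check for "▁king" alone, to see where "▁queen" ranks among
# its nearest neighbours.
sims_king = torch.nn.functional.cosine_similarity(E, vec("▁king").unsqueeze(0), dim=-1)
print([tok.convert_ids_to_tokens(int(i)) for i in sims_king.topk(10).indices])
```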
Top ten cosine similarities for "▁king" - "▁man" + "▁woman": [results table not reproduced here]
"▁queen" is the closest match only if you exclude any version of king and woman. But this seems to be only because "▁queen" is already the 2:nd closes match for "▁king". Involving "▁man" and "▁woman" is only making things worse.
I then tried looking up exactly what the word2vec result is, and I'm still not sure.
Wikipedia cites Mikolov et al. (2013). This paper is about embeddings from RNN language models, not word2vec, which is OK for my purposes, because I'm also not using word2vec. More problematic is that I don't know how to interpret how strong their results are. I think the relevant result is this:
We see that the RNN vectors capture significantly more syntactic regularity than the LSA vectors, and do remarkably well in an absolute sense, answering more than one in three questions correctly.
which doesn't seem very strong. Also, I can't find any explanation of what LSA is.
I also found this other paper which is about word2vec embeddings and has this promising figure:
But the caption is just a citation to this third paper, which doesn't have that figure!
I've not yet read the last two papers in detail, and I'm not sure if or when I'll get back to this investigation.
If someone knows more about exactly what the word2vec embedding results are, please tell me.
I don't think seeing it as a one-dimensional dial is a good picture here.
The AI has lots and lots of sub-circuits, and many* can have more or less self-other-overlap. For “minimal self-other distinction while maintaining performance” to do anything, it's sufficient that you can increase self-other-overlap in some subset of these, without hurting performance.
* All the circuits that have to do with agent behaviour or beliefs.
You can find their preferred contact info in each document, in the Team section.