I notice that o1's behavior (its cognitive process) looks suspiciously like human behavior:
But why is this happening more when o1 can reason more than previous models? Shouldn't ...
I think the four scenarios outlined here roughly map to areas 1, 6, 7, and 8 of the 60+ Possible Futures post.
Just a data point that supports hold_my_fish's argument: the savant Kim Peek likely memorized gigabytes of information and could access it quite reliably:
I don't think shards are distinct, either physically or logically, so they can't hide things in the sense of keeping them out of view of the other shards.
Also, I don't think "querying for plans" is a good summary of what goes on in the brain.
I'm coming more from a brain-like AGI lens, and my account of what goes on would be a bit different. I'm trying to phrase this in shard theory terminology.
First, a prerequisite: Why do Alice's shards generate thoughts that value Rick's state to begin with? The Rick-shard has learned that actions that...
Some other noteworthy groups in academia led by people who are somewhat connected to this community:
- Jacob Steinhardt (Berkeley)
- Dylan Hadfield-Menell (MIT)
- Sam Bowman (NYU)
- Roger Grosse (UofT)
Some other noteworthy groups in academia led by people who are perhaps less connected to this community:
- Aleksander Madry (MIT)
- Percy Liang (Stanford)
- Scott Niekum (UMass Amherst)
Can you provide some links to these groups?
These professors have all published many papers at academic conferences. It's probably a bit frustrating not to have their work summarized and then be asked to explain it, when all of it is already published. I would start with their Google Scholar pages, followed by their personal websites and maybe Twitter. One caveat is that the papers probably don't fully explain the x-risk motivation or the applications of the work, but that's reading between the lines that AI safety people should be able to do themselves.
Some observations:
Each needs an environment to do so, but the key observation seems to be that a structure is reliably reproduced across intermediate forms (mitosis, babies, language, society) and that these forms build on top of each other. It seems plausible that there is a class of formal representations that describe
You don't talk about human analogs of grokking, and that makes sense for a technical paper like this. Nonetheless, grokking also seems to happen in humans, and everybody has had "Aha!" moments before. Can you maybe comment a bit on the relation to human learning? It seems clear that human grokking is not a process that purely depends on the number of training samples seen but also on the availability of hypotheses. People grok faster if you provide them with symbolic descriptions of what goes on. What are your thoughts on the representation and transfer of the resulting structure, e.g., via language/token streams?
Hmm. So firstly, I don't think ML grokking and human grokking having the same name is that relevant - it could just be a vague analogy. And I definitely don't claim to understand neuroscience!
That said, I'd guess there's something relevant about phase changes? Internally, I know that I initially feel very confused, then have some intuition of 'I half see some structure but it's super fuzzy', and then eventually things magically click into place. And maybe there's some similar structure around how phase changes happen - useful explanations get reinforced, a...
I mean scoring thoughts in the sense of [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering with what Steven calls "Thought Assessors". Thoughts totally get scored in that sense.
About the problems you mention:
the apparent phenomenon of credit assignment improving over a lifetime. When you're older and wiser, you're better at noticing which of your past actions were bad and learning from your mistakes.
I don't see why this is a problem. More data will lead to better models over time. You get exposed to more situations, and with more data, the noise slowly averages out. Not necessarily because you can clearly attribute things to their causes, but because you randomly get into situations where the effect is clearer. It mo...
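A toy Python sketch of the "noise averages out" point, assuming a deliberately simplified model where each episode's observed outcome is an action's true effect plus independent Gaussian noise. The functions and parameter values (`true_effect=-1.0`, `noise_sd=5.0`) are hypothetical illustrations, not a model of how the brain does credit assignment.

```python
import random


def estimate_action_effect(true_effect: float, noise_sd: float,
                           n_episodes: int, seed: int) -> float:
    """Average the observed outcomes of one action over n noisy episodes."""
    rng = random.Random(seed)
    # Each episode's outcome mixes the action's true effect with unrelated noise.
    outcomes = [true_effect + rng.gauss(0.0, noise_sd) for _ in range(n_episodes)]
    return sum(outcomes) / len(outcomes)


def mean_abs_error(n_episodes: int, trials: int = 200,
                   true_effect: float = -1.0, noise_sd: float = 5.0) -> float:
    """Mean absolute error of the estimated effect across independent simulated lifetimes."""
    errors = [
        abs(estimate_action_effect(true_effect, noise_sd, n_episodes, seed=t) - true_effect)
        for t in range(trials)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # More experience ("older and wiser") means a smaller error in judging the action.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} episodes: mean error {mean_abs_error(n):.3f}")
```

Nothing in this sketch attributes any single outcome to its cause; the estimate improves only because the noise shrinks roughly as one over the square root of the number of episodes.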
The main difference between LDAIXI and a human in terms of ontology seems to be that the things a human values are ultimately grounded in the senses and in rewards tied to them. For example, we value sweet things because we have a detector for sweetness and a reward tied to it. When our understanding of what sugar is changes, the detector doesn't, and thus the ontology change works out fine. But I don't see why you couldn't set up LDAIXI the same way: just specify the reward in terms of a diamond detector, or several. In the end, AIXI already uses detectors; how else would it get input?
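A toy Python sketch of a reward grounded in a fixed detector rather than in the agent's concepts, assuming observations are plain dictionaries of sensor readings. All names here (`sweetness_detector`, `folk_model`, `chemistry_model`) are hypothetical; this is not an implementation of LDAIXI or AIXI, only an illustration of the idea that refining the world model's ontology leaves the reward untouched.

```python
from typing import Callable, Dict, List

Observation = Dict[str, float]
Detector = Callable[[Observation], float]
WorldModel = Callable[[str], Observation]


def sweetness_detector(obs: Observation) -> float:
    # Hypothetical hard-wired sensor: reads a raw sensory channel, not the world model.
    return obs.get("sweet_receptor", 0.0)


def reward(obs: Observation, detectors: List[Detector]) -> float:
    # Reward depends only on detector outputs, so it cannot depend on
    # how the world model conceptualizes the cause of the signal.
    return sum(detect(obs) for detect in detectors)


def folk_model(action: str) -> Observation:
    # Old ontology: fruit simply "is sweet stuff".
    concept = {"eat_fruit": "sweet stuff", "eat_bread": "plain stuff"}[action]
    return {"sweet_receptor": 1.0 if concept == "sweet stuff" else 0.0}


def chemistry_model(action: str) -> Observation:
    # New ontology: sweetness comes from sugar molecules binding receptors.
    sugar_grams = {"eat_fruit": 10.0, "eat_bread": 0.5}[action]
    return {"sweet_receptor": min(1.0, sugar_grams / 10.0)}


def best_action(model: WorldModel, actions: List[str], detectors: List[Detector]) -> str:
    # Plan by scoring each action's predicted observation with the fixed detectors.
    return max(actions, key=lambda a: reward(model(a), detectors))


if __name__ == "__main__":
    actions = ["eat_fruit", "eat_bread"]
    detectors = [sweetness_detector]
    for name, model in [("folk", folk_model), ("chemistry", chemistry_model)]:
        print(name, best_action(model, actions, detectors))
```

Both world models lead to the same choice because the reward never looks inside them. This covers only the easy case where the new ontology still predicts the same sensory consequences; it says nothing about whether a powerful planner would learn to fool the detector instead.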
Thank you for mentioning us. In fact, the list of candidate instincts got longer. It isn't in a presentable form yet, but please message me if you want to talk about it.
The list is more theoretical, and I want to prove that this is not just theoretical speculation by operationalizing it. jpyykko is already working on something more on the symbolic level.
Rohin Shah recommended that I find people to work with me on alignment, and I teamed up with two LWers. We just started work on a project to simulate instinct-cued learning in a toy world. I think this project fits research point 15.2.1.2, and I now wonder how to apply for funding; we would probably need it if we want to run simulations with somewhat larger NNs.
Hi, is there a way to get people in touch with a project or project lead? For example, I'd like to get in touch with Masaharu Mizumoto because iVAIS sounds related to the aintelope project.