I know this is a bit off topic, but I'd be super fascinated to see what would happen if you tried this with a level where the middle hallway had been extended. Strategically, it changes nothing, but it introduces more meaningless steps into a solution. Does this interfere with planning? Or is the planning able to abstract over 'repeat strategically non-relevant step until next decision point'?
Specifically, what if you just duplicated this middle section a bunch of times?
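Something like the sketch below is what I have in mind, assuming the levels are plain ASCII grids in the usual Sokoban-style encoding; the `extend_hallway` helper, the row indices, and the toy level are all made up for illustration:

```python
# Hypothetical helper: stretch a level by repeating some of its corridor rows,
# so the strategy stays the same but the number of filler steps grows.

def extend_hallway(level_rows, start, end, copies):
    """Insert `copies` extra repetitions of rows [start, end)."""
    segment = level_rows[start:end]
    # Only duplicate plain corridor rows (walls '#' and floor ' '), so no
    # boxes, goals, or the player get copied along with the hallway.
    assert all(set(row) <= {"#", " "} for row in segment), "segment must be empty corridor"
    return level_rows[:end] + segment * copies + level_rows[end:]

original = [
    "#######",
    "#. $ @#",
    "#     #",  # the 'middle hallway' row: safe to duplicate
    "#$   .#",
    "#######",
]
stretched = extend_hallway(original, start=2, end=3, copies=4)
print("\n".join(stretched))
```

Running the same analysis on `original` and `stretched` would then show whether the extra corridor tiles get abstracted into 'keep walking until the next decision point' or whether they consume planning capacity tile by tile.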
...One point I’ve seen raised by people in the latter group is along the lines of: “It’s very unlikely that we’ll be in a situation where we’re forced to build AI systems vastly more capable than their supervisors. Even if we have a very fast takeoff - say, going from being unable to create human-level AI systems to being able to create very superhuman systems ~overnight - there will probably still be some way to create systems that are only slightly more powerful than our current trusted systems and/or humans; to use these to supervise and align systems slig
I agree with Steve Byrnes here. I think I have a better way to describe this.
I would say that the missing piece is 'mastery'. Specifically, learning mastery over a piece of reality. By mastery I am referring to the skillful ability to model, predict, and purposefully manipulate that subset of reality.
I don't think this is an algorithmic limitation, exactly.
Look at the work DeepMind has been doing, particularly with Gato and more recently AutoRT, SARA-RT, RT-Trajectory, UniSim, and Q-Transformer. Look at the work being done with the help of Nvidia's new Ro...
There’s no sharp line between the helper AIs of Vision 1 and the truly-autonomous AIs of Vision 2.
From how I'm seeing things, this post doesn't quite cleave reality at the joints.
Vision 1 style models can be turned into Vision 2 autonomous models very easily. So, as you say, there's no sharp line there.
For me, Vision 3 shouldn't depend on biological neurons. I think it's more like 'brain-like AGI that is so brain-like that it is basically an accurate whole brain emulation, and thus you can trust it as much as you can trust a human...
Vision 1 style models can be turned into Vision 2 autonomous models very easily
Sure, Vision 1 models can be turned into dangerous Vision 2 models, but they can’t be turned into good Vision 2 models that we want to have around, unless you solve the different set of problems associated with full-fledged Vision 2. For example, in the narrow value learning vs ambitious value learning dichotomy, “narrow” is sufficient for Vision 1 to go well, but you need “ambitious” for Vision 2 to go well. Right?
...For me, Vision 3 shouldn't depend on biological neurons. I think
I recommend this paper on the subject for additional reading:
The basal ganglia select the expected sensory input used for predictive coding
Ok, so this is definitely not a human thing, so probably a bit of a tangent. One of the topics that came up in a neuroscience class once was goose imprinting. There have apparently been studies (see Eckhard Hess for the early ones) showing that the strength of the imprinting onto whatever target (measured by behavior after the close of the critical period) is related to how much running towards that target the baby geese do. The hand-wavey explanation was something like 'probably this makes sense, since if you have to run a lot to keep up with your mother...
I think this is an excellent description of GPT-like models. It both fits with my observations and clarifies my thinking. It also leads me to examine in a new light questions which have been on my mind recently:
What is the limit of the simulation power that our current architectures (with some iterative improvements) can achieve when scaled up (via additional computation, improved datasets, etc.)?
Is a Simulator model really what we want? Can we trust the outputs we get from it to help us with things like accelerating alignment research? What might failure modes look like?
Super handy-seeming intro for newcomers.
I recommend adding Jade Leung to your list of governance people.
As for the list of AI safety people, I'd like to add a few who've written interesting and much-discussed content that it would be worth having some familiarity with:
John Wentworth
Steven Byrnes
Vanessa Kosoy
And personally I'm quite excited about the school of thought developing under the 'Shard theory' banner.
For shard theory info:
https://www.lesswrong.com/posts/xqkGmfikqapbJ2YMj/shard-theory-an-overview
I'm excited to participate in this, and feel like the mental exercise of exploring this scenario would be useful for my education on AI safety. Since I'm currently funded by a grant from the Long Term Future Fund for reorienting my career to AI safety, and feel that this would be a reasonable use of my time, you don't need to pay me. I'd be happy to be a full-time volunteer for the next couple weeks.
Edit: I participated and was paid, but only briefly. Turns out I was too distracted thinking and talking about how the process could be improved and the larger...
I'm potentially interested in the Research Engineer position on the Alignment Team, but I'm currently 3 months into a 6-month grant from the LTFF to reorient my career from general machine learning to AI safety specifically. My current plan is to keep doing solo work until the last month of my grant period, then begin applying to AI safety work at places like Anthropic, Redwood Research, OpenAI, and DeepMind.
Do you think there's a significant advantage to applying soon vs 3 months from now?
I agree with Joe Carlsmith that this seems like goal guarding.
I would be interested to see if my team's noise-injection technique interferes with these behaviors in a way that makes them easier to detect.
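For concreteness, here is roughly the kind of thing I mean, as a minimal sketch rather than our actual pipeline: the model name is a placeholder, the decoder layers are assumed to live at `model.model.layers` (Llama-style), and the layer range and noise scale are arbitrary choices for illustration.

```python
# Sketch: inject Gaussian noise into a transformer's hidden states via forward
# hooks, then compare completions with and without noise. Behaviors that are
# brittle under perturbation are candidates for closer inspection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def make_noise_hook(sigma):
    def hook(module, inputs, output):
        # Decoder layers typically return a tuple whose first element is the hidden state.
        hidden = output[0] if isinstance(output, tuple) else output
        noisy = hidden + sigma * torch.randn_like(hidden)
        return (noisy,) + output[1:] if isinstance(output, tuple) else noisy
    return hook

def generate(prompt, sigma=0.0, layers=range(8, 16)):
    handles = []
    if sigma > 0:
        for i in layers:
            handles.append(model.model.layers[i].register_forward_hook(make_noise_hook(sigma)))
    try:
        ids = tok(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=128, do_sample=False)
        return tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
    finally:
        for h in handles:
            h.remove()

prompt = "..."  # an alignment-faking-style scenario prompt would go here
baseline = generate(prompt, sigma=0.0)
perturbed = generate(prompt, sigma=0.05)  # sigma is an arbitrary illustrative scale
# Large divergence between the two completions is the kind of signal we'd look at.
```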
It's worth noting that the alignment faking we see in these experiments is easy to catch by default, as we discuss in Appendix E.6. Still, it would be interesting to see if this makes detection even easier or triggers interestingly different, harder-to-catch behaviors.
You could try playing with the minimal reproduction on llama-405b.