Well, then computing would just take a really long time.
So, it's not impossible in principle if you trained the loss function as I suggested (train the loss function itself by reinforcement learning, then apply it to train the actual novel-generating model), but it is a totally impractical approach.
If you really wanted to teach an AI to generate good novels, you'd probably start by training an LLM to imitate existing novels through some sort of predictive loss (e.g., categorical cross-entropy on next-token prediction) to give it a good prior. Then tr...
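For concreteness, here's a minimal sketch of what that predictive pre-training loss might look like (PyTorch; the model and tokenization are placeholders, not a specific implementation):

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Categorical cross-entropy on next-token prediction.

    token_ids: LongTensor of shape (batch, seq_len) from some tokenizer.
    model:     any autoregressive LM mapping token ids to logits of
               shape (batch, seq_len, vocab_size).
    """
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                    # (batch, seq_len - 1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time
        targets.reshape(-1),                  # predict the shifted tokens
    )
```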
With humans in the loop, there actually is a way to implement such a loss function. Unfortunately, computing the function takes as long as it takes for several humans to read a novel and aggregate their scores. And there's also no way to compute the gradient. So by that point, it's pretty much just a reinforcement learning signal.
However, you could use that human feedback to train a side network to predict the reward signal based on what the AI generates. This second network would then essentially compute a custom loss function (asymptotically approaching with m...
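In other words, the side network would be something like a learned reward model. A minimal sketch, assuming the generator's output has already been mapped to a fixed-size embedding and that we have aggregated human scores to fit against (all names here are hypothetical):

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Side network that predicts the (slow, human-provided) reward signal
    from the generator's output, represented as a fixed-size embedding."""

    def __init__(self, embed_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, text_embedding):
        return self.net(text_embedding).squeeze(-1)

def reward_model_step(reward_model, optimizer, text_embeddings, human_scores):
    """One gradient step fitting predicted reward to aggregated human scores.
    Once fit, the reward model is a cheap, differentiable stand-in for the
    slow human evaluation loop and can serve as the generator's loss."""
    predicted = reward_model(text_embeddings)
    loss = nn.functional.mse_loss(predicted, human_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```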
Memory ≠ Unstructured memory (and likewise, locally-random ≠ globally-random): [...]
Agreed. I didn't mean to imply that you thought otherwise.
"just" a memory system + learning algorithm—with a dismissive tone of voice on the "just": [...]
I apologize for how that came across. I had no intention of being dismissive. When I respond to a post or comment, I typically try to frame what I say as much for a typical reader as for the original poster. In this case, I had a sense that a typical reader could get the wrong impression about how the neocort...
Great summary of the argument. I definitely agree that this will be an important distinction (learning-from-scratch vs. innate circuitry) for AGI alignment, as well as for developing a useful Theory of Cognition. The core of what motivates our behavior must be innate to some extent (e.g., heuristics that evolution programmed into our hypothalamus that tell us how far from homeostasis we're veering), to act as a teaching signal to the rest of our brains (e.g., learn to generate goal states that minimize the effort required to maintain or return to homeostas...
Great introduction. As someone with a background in computational neuroscience, I'm really looking forward to what you have to say on all this.
By the way, you seem to use a very broad umbrella for covering "brain-like-AGI" approaches. Would you classify something like a predictive coding network as more brain-like or more prosaic? What about a neural Turing machine? In other words, do you have a distinct boundary in mind for separating the red box from the blue box, or is your classification more fuzzy (or does it not matter)?
Thanks!
If I'm not mistaken, the things you brought up are at too low a level to be highly relevant for safety, in my opinion. I guess this series will mostly be at Marr's "computational level" whereas you're asking "algorithm-level" questions, more or less. I'll be talking a lot about things vaguely like "what is the loss function" and much less about things vaguely like "how is the loss function minimized".
For example, I think you can train a neural Turing machine with supervised learning, or you can train a neural Turing machine with RL. That distinction...
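To make that distinction concrete, here is a toy sketch (not anyone's actual proposal): the same parametric model updated once with an explicit supervised loss, and once with only a scalar reward via a REINFORCE-style score-function estimator. The architecture is held fixed; only the training signal, i.e. "what is the loss function", changes.

```python
import torch
import torch.nn.functional as F

def supervised_step(model, optimizer, x, y_true):
    """The loss function is explicit: cross-entropy against known targets."""
    loss = F.cross_entropy(model(x), y_true)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def rl_step(model, optimizer, x, reward_fn):
    """The 'loss' is only defined implicitly by a scalar reward signal,
    estimated here with a REINFORCE-style score-function gradient."""
    dist = torch.distributions.Categorical(logits=model(x))
    action = dist.sample()
    reward = reward_fn(action)  # scalar feedback per sample; no gradient through it
    loss = -(dist.log_prob(action) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```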
The monotonicity principle requires it to be non-decreasing w.r.t. the manifesting of fewer facts. Roughly speaking, the more computations the universe runs, the better.
I think this is what I was missing. Thanks.
So, then, the monotonicity principle sets a baseline for the agent's loss function that tracks how much less stuff can happen to whatever subset of the universe it cares about: the loss gets worse the fewer opportunities remain available, due to death or some other kind of stifling. Then the agent's particular value function over universe-states gets added/subtracted on top of that, correct?
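In symbols, my tentative reading of the constraint (with $\mathcal{L}$ the agent's loss and $A$, $B$ two possible sets of realized computations/facts) is just:

$$A \subseteq B \;\Longrightarrow\; \mathcal{L}(B) \le \mathcal{L}(A),$$

i.e., manifesting strictly more computations can never increase the loss.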
Could you explain what the monotonicity principle is, without referring to any symbols or operators? I gathered that it is important, that it is problematic, that it is a necessary consequence of physicalism absent from cartesian models, and that it has something to do with the min-(across copies of an agent) max-(across destinies of the agent copy) loss. But I seem to have missed the content and context that makes sense of all of that, or even in what sense and over what space the loss function is being monotonic.
Your discussion section is good. I would l...
So, from what I read, it looks like ACT-R is mostly about modeling which brain systems are connected to which and how fast their interactions are, not about how those brain systems actually do what they do. Is that fair? If so, I could see this framework helping to set useful structural priors for developing AGI (e.g., so we don't make the mistake of hooking up the declarative memory module directly to the raw sensory or motor modules), but I would expect most of the progress still to come from research in machine learning and computational neuroscience.
Could part of the problem be that the actor is optimizing against a single grader's evaluations? Shouldn't it somehow take uncertainty into account?
Consider having an ensemble of graders, each learning or having been trained to evaluate plans/actions from different initializations and/or using different input information. Each grader would have a different perspective, but that means that the ensemble should converge on similar evaluations for plans that look similarly good from many points of view (like a CT image crystallizing from the combination of man...
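As a rough sketch of the kind of aggregation I have in mind (the mean-minus-spread rule and all names are just placeholders, not a worked-out proposal):

```python
import numpy as np

def ensemble_evaluate(graders, plan, k: float = 1.0):
    """Score a plan with an ensemble of independently trained graders,
    discounting plans the graders disagree about.

    graders: list of callables, each mapping a plan to a scalar evaluation.
    k:       how strongly disagreement (a proxy for uncertainty) is penalized.
    """
    scores = np.array([grade(plan) for grade in graders])
    # Pessimistic aggregate: a plan that looks great to one grader but
    # mediocre to the others gets pulled down by the spread.
    return scores.mean() - k * scores.std()

def pick_plan(graders, candidate_plans):
    """Choose the plan whose evaluation is robust across graders, rather
    than the one that maximizes any single grader's score."""
    return max(candidate_plans, key=lambda plan: ensemble_evaluate(graders, plan))
```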