NTK training takes time that scales quadratically with the number of training examples, so it isn't usable for large training datasets (or with data augmentation, since that simulates a larger dataset). (I'm not an NTK expert, but from what I understand, this quadratic growth is not easy to get rid of.)
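As a rough illustration of where the quadratic cost comes from: a kernel method needs the kernel evaluated on every pair of training examples. The sketch below builds an empirical (finite-width) NTK Gram matrix in plain Python and counts kernel evaluations. It assumes a hypothetical one-hidden-layer ReLU network `f(x) = (1/sqrt(m)) * sum_j v_j * relu(w_j . x)` with random Gaussian weights. This is not the infinite-width limit that NTK theory studies, and the helper names are made up for illustration. Exploiting symmetry only halves the count to n(n+1)/2. Solving the resulting linear system with a standard direct method (e.g. Cholesky) costs more still, cubic in n.

```python
import random

def make_params(dim, width, seed=0):
    """Random Gaussian init for a hypothetical 1-hidden-layer ReLU net."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    v = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return w, v

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ntk_entry(x1, x2, w, v):
    """Empirical NTK <df(x1)/dtheta, df(x2)/dtheta> for
    f(x) = (1/sqrt(m)) * sum_j v_j * relu(w_j . x)."""
    m = len(v)
    xx = dot(x1, x2)
    total = 0.0
    for wj, vj in zip(w, v):
        a1, a2 = dot(wj, x1), dot(wj, x2)
        # Contribution from the gradient w.r.t. the output weight v_j.
        total += max(a1, 0.0) * max(a2, 0.0)
        # Contribution from the gradient w.r.t. the hidden weights w_j
        # (the ReLU derivative is 1 when the pre-activation is positive, else 0).
        if a1 > 0 and a2 > 0:
            total += vj * vj * xx
    return total / m  # the two 1/sqrt(m) factors combine into 1/m

def ntk_gram(xs, w, v):
    """Fill the n x n kernel matrix; return it and the number of kernel evaluations."""
    n = len(xs)
    K = [[0.0] * n for _ in range(n)]
    evals = 0
    for i in range(n):
        for j in range(i, n):  # symmetry halves the work, but it stays O(n^2)
            K[i][j] = K[j][i] = ntk_entry(xs[i], xs[j], w, v)
            evals += 1
    return K, evals

if __name__ == "__main__":
    rng = random.Random(1)
    dim, width = 3, 64
    w, v = make_params(dim, width)
    for n in (25, 50, 100):
        xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
        _, evals = ntk_gram(xs, w, v)
        print(n, evals)  # 325, 1275, 5050: doubling n roughly quadruples the work
```

Doubling the dataset roughly quadruples the number of kernel entries, which is why data augmentation (effectively multiplying n) hurts so much here.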
That's an interesting question. I don't have an opinion about how much information is stored. Having a lot of capacity appears to be important, but whether that's because it's necessary to store information or for some other reason, I don't know.
It got me thinking, though: the purpose of our brain is to guide our behavior, not to remember our training data. (Whether we can remember our training data seems unclear. Apparently the existence of photographic memory is disputed, but there are people with extraordinarily good memories, even if not photographic.)
It could be that the preprocessing necessary to guide our future behavior unavoidably increases the amount of stored data by a large factor. (There are many examples of this design pattern in classic computer science algorithms, so it wouldn't be particularly surprising.) If that's the case, I have no idea how to measure how much of it there is.
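One standard instance of that pattern, chosen here for illustration rather than taken from the discussion, is a sparse table for range-minimum queries. It preprocesses an array of n numbers into roughly n·log2(n) precomputed minima so that the minimum of any range can be answered with two lookups. The stored data ends up larger than the input by a factor of about log2(n), yet none of it is new information: all of it could be recomputed from the original array. The function names below are hypothetical.

```python
def build_sparse_table(values):
    """Precompute the minimum of every window of length 2**k (k = 0, 1, ...)."""
    n = len(values)
    table = [list(values)]  # level 0: windows of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        # Window [i, i + 2**k) is the union of two windows of length 2**(k-1).
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)])
        k += 1
    return table

def range_min(table, lo, hi):
    """Minimum of values[lo..hi] (inclusive) from two overlapping lookups."""
    k = (hi - lo + 1).bit_length() - 1  # largest power of two <= range length
    return min(table[k][lo], table[k][hi - (1 << k) + 1])

if __name__ == "__main__":
    values = list(range(1024))
    table = build_sparse_table(values)
    stored = sum(len(level) for level in table)
    print(len(values), stored)  # 1024 inputs -> 9228 stored entries, about 9x
    print(range_min(table, 100, 900))  # 100
```

The trade is explicit here: extra storage buys constant-time queries, and the blow-up factor grows with the input size.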
> I figure, at least 10%ish of the cortex is probably mainly storing information which one could also find in a 2022-era large language model (LLM).
This seems to me to be essentially assuming the conclusion. The assumption here is that a 2022 LLM already stores all the information necessary for human-level language ability and that no capacity is needed beyond that. But "how much capacity is required to match human-level ability" is the hardest part of the question.
(The "no capacity is needed beyond that" part is tricky too. I take AI_WAIFU's core point to be that having excess capacity is helpful for algorithmic reasons, even though it's beyond what's strictly necessary to store the information if you were to compress it. But those algorithmic reasons, or similar ones, might apply to AI as well.)
I might as well link my own attempt at this estimate. It's not estimating the same thing (since I'm estimating capacity and you're estimating stored information), so the numbers aren't necessarily in disagreement. My intuition, though, is that capacity is quite important algorithmically, so it's the more relevant number.
(Edit: Among the sources of that intuition is Neural Tangent Kernel theory, which studies a particular infinite-capacity limit.)
I haven't fully digested this comment, but:
In some sense there's probably no option other than that, since creating a synapse should count as a computational operation. But there'd be different options for what the computations would be.
The simplest might just be storing pairwise relationships. That would add size, even if the relationships are sparse.
I agree that LLMs do that too, but I'm skeptical about claims that LLMs are near human ability. It's not that I'm confident they aren't; it just seems hard to say. (I do think they now have surface-level language ability similar to humans, but they still struggle with deeper understanding, and I don't know how much improvement is needed to fix that weakness.)