All of hold_my_fish's Comments + Replies

I haven't fully digested this comment, but:

You mean, cached computations or something?

In some sense there's probably no option other than that, since creating a synapse should count as a computational operation. But there'd be different options for what the computations would be.

The simplest might just be storing pairwise relationships. That's going to add size, even if sparse.
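As a rough sketch of why pairwise storage adds size even when sparse (a purely illustrative count, not a model of the brain):

```python
# Minimal sketch (hypothetical numbers): even a sparse pairwise map grows
# much faster than the number of items it relates.

def sparse_pairwise_entries(n_items: int, density: float) -> int:
    """Number of stored (i, j) relationships at a given sparsity."""
    total_pairs = n_items * (n_items - 1) // 2  # unordered pairs
    return int(total_pairs * density)

for n in (1_000, 10_000, 100_000):
    # Assume only 0.1% of pairs are actually stored.
    print(n, sparse_pairwise_entries(n, density=0.001))

# Stored entries grow roughly with n^2, so doubling the item count
# roughly quadruples the storage even at fixed sparsity.
```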

I agree that LLMs do that too, but I'm skeptical about claims that LLMs are near human ability. It's not that I'm confident that they aren't--it just seems hard to say. (I do think t... (read more)

NTK training requires training time that scales quadratically with the number of training examples, so it's not usable for large training datasets (nor with data augmentation, since that simulates a larger dataset). (I'm not an NTK expert, but, from what I understand, this quadratic growth is not easy to get rid of.)
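For concreteness, here's a rough sketch of where that quadratic cost comes from, using an ordinary RBF kernel as a stand-in for the actual NTK (all sizes illustrative):

```python
# Sketch of why exact kernel regression scales badly with dataset size:
# the Gram matrix alone is n x n, so memory is O(n^2) and a direct solve
# is O(n^3). (Illustrative RBF kernel, not the actual NTK.)
import numpy as np

def kernel_regression_fit(X, y, gamma=1.0, reg=1e-6):
    # Pairwise squared distances -> n x n kernel matrix.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq_dists)                          # O(n^2) memory
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)   # O(n^3) time
    return alpha

X = np.random.randn(500, 10)
y = np.random.randn(500)
alpha = kernel_regression_fit(X, y)  # 500x500 matrix here; at n = 10^6 the
                                     # kernel matrix alone would be ~8 TB
```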

That's an interesting question. I don't have an opinion about how much information is stored. Having a lot of capacity appears to be important, but whether that's because it's necessary to store information or for some other reason, I don't know.

It got me thinking, though: the purpose of our brain is to guide our behavior, not to remember our training data. (Whether we can remember our training data seems unclear. Apparently the existence of photographic memory is disputed, but there are people with extraordinarily good memories, even if not photographic.)... (read more)

Gunnar Zarncke
Just a data point that supports hold_my_fish's argument: the savant Kim Peek likely memorized gigabytes of information and could access them quite reliably: https://personal.utdallas.edu/~otoole/CGS_CV/R13_savant.pdf
Steve Byrnes
You say “Having a lot of capacity appears to be important” but that’s “essentially assuming the conclusion”, right? hehe :) You claim that there’s a lot of capacity, but I say we don’t really know that. As a stupid example, if my computer’s SRAM has N cells, but it uses an error-correcting code by redundantly storing each bit in three different cells, then its “capacity” is ⅓ the number of cells. (And in 6T SRAM, the number of cells is in turn ⅙ the number of transistors, etc.)

Anyway, all things considered right now, the most plausible-to-me theory is that counting synapses gives a 2-3 OOM overestimate of capacity. I don’t see this as particularly implausible. For one thing, as I wrote in the OP, the synapse is not just an information-storage-unit, it’s also a thing-that-does-calculations. If one bit of stored information (e.g. information about how the world works) needs to be involved in 1000 different calculations, it seems plausible that it would need to be associated with 1000 synapses. For another thing, here’s a model where one functional “connection” requires a group of 10 nearby synapses onto the same dendrite. That’s 1 OOM right there!

I think there’s another OOM or two lurking in the fact that each cortical minicolumn is 100 neurons and each cortical column is 100 minicolumns, but there’s some sense in which minicolumns (and to a lesser degree, columns) are a single functional unit. So, without getting into details, which I’m hazy on anyway, I wouldn’t be surprised to learn that “one connection” involved not only 10 nearby synapses on one dendrite, but a similar group of 10 synapses onto a neuron within each of 10 neighboring minicolumns, and those 10 minicolumns are working together to implement a certain kind of computation, which by the way you could trivially do on a GPU in a few clock cycles. Or whatever, I dunno.

Or maybe you’re saying “Having a lot of capacity appears to be important” because humans can do things that modern ML can’t, and we nee...
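As a back-of-the-envelope sketch of how such redundancy factors would multiply (all factors hypothetical, chosen only to show the arithmetic):

```python
# Back-of-the-envelope sketch: how redundancy factors could turn a raw
# synapse count into a much smaller effective capacity. All factors below
# are hypothetical, chosen only to illustrate the arithmetic.

raw_synapses = 1e14            # rough order-of-magnitude synapse count

synapses_per_connection = 10   # e.g. 10 nearby synapses on one dendrite
minicolumn_redundancy = 10     # similar groups across neighboring minicolumns

effective_connections = raw_synapses / (synapses_per_connection * minicolumn_redundancy)
print(f"{effective_connections:.0e} effective connections")
# ~1e12, i.e. 2 OOM below the raw synapse count
```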

I figure, at least 10%ish of the cortex is probably mainly storing information which one could also find in a 2022-era large language model (LLM).

This seems to me to be essentially assuming the conclusion. The assumption here is that a 2022 LLM already stores all the information necessary for human-level language ability and that no capacity is needed beyond that. But "how much capacity is required to match human-level ability" is the hardest part of the question.

(The "no capacity is needed beyond that" part is tricky too. I take AI_WAIFU's core point to b... (read more)

Steve Byrnes
One of AI_WAIFU’s points was that the brain has some redundancy because synapses randomly fail to fire and neurons randomly die etc. That part wouldn’t be relevant to running the same algorithms on chips, presumably. Then the other thing they said was that over-parameterization helps with data efficiency. I presume that there’s some background theory behind that claim which I’m not immediately familiar with. But I mean, is it really plausible that the brain is overparameterizing by 3+ orders of magnitude? Seems pretty implausible to me, although I’m open to being convinced.

Also, the Neural Tangent Kernel corresponds to an infinite-capacity network, but people can do those calculations without using an infinitely large memory, right? People have found a way to reformulate the algorithm such that it involves doing different operations on a different representation which does not require ∞ memory. By the same token, if we’re talking about some network which is so overparametrized that it can be compressed by 99.9%, then I’m strongly inclined to guess that there’s some way to do the same calculations and updates directly on the compressed representation.
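As a toy sketch of "doing the same calculations directly on the compressed representation", here is the standard low-rank trick; this is just an illustration, not a claim about what the brain or NTK implementations actually do:

```python
# Toy sketch: if a big weight matrix W is well-approximated by a low-rank
# factorization U @ V, you can run the forward computation on the compressed
# factors directly and never materialize W itself.
import numpy as np

d, r = 4096, 16                # r << d: heavy compression
U = np.random.randn(d, r)
V = np.random.randn(r, d)
x = np.random.randn(d)

y_compressed = U @ (V @ x)     # two skinny matmuls, O(d*r) memory
# Equivalent to (U @ V) @ x, but forming U @ V would need O(d^2) memory.
```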
Steve Byrnes
If you think human brains are storing hundreds or thousands of GB or more of information about (themselves / the world / something), do you have any thoughts on what that information is? Like, can you give (stylized) examples? (See also my footnote 13.) Also, see my footnote 16 and surrounding discussion; maybe that’s a crux?