Are there already plans for a transcript of this? (I could set a rev.com transcription in motion.)
No plans in motion. Thank you very much if you decide to do so! Also, you might want to message Rob to get the images.
Here is a link to the transcript, which includes ability to watch along with the video.
https://www.rev.com/transcript-editor/shared/QmH6Ofy5AXbQ4siBlLNcvUnMkMBj3qa4WIkQtGeoOlo4K3DvjOH3oMUJuIAUBrJiJkJbb4VU3uqWhLLwRu19f3m6gag?loadFrom=SharedLink
How do transcriptions typically handle images? They're pretty important for this talk. You could embed the images in the text as it progresses?
Thanks a bunch!
I second Rob's unanswered question at 40:12: how is it that we ever accomplish anything in practice, if the search space is vast, and things that both work and look like they work are exponentially rare?
How is the "the genome is small, therefore generators of human values (that can't be learned from the environment) are no more complex than tens or hundreds of things on the order of a fuzzy face detector" argument compatible with the complexity of value thesis, or does it contradict it?
how is it that we ever accomplish anything in practice, if the search space is vast, and things that both work and look like they work are exponentially rare?
This question needs a whole essay (or several) on its own. If I don't get around to leaving a longer answer in the next few days, ping me.
Meanwhile, if you want to think it through for yourself, the general question is: where the hell do humans get all their bits-of-search from?
How is the "the genome is small, therefore generators of human values (that can't be learned from the environment) are no more complex than tens or hundreds of things on the order of a fuzzy face detector" argument compatible with the complexity of value thesis, or does it contradict it?
The key difference is between "human values" vs "generators of human values". The complexity of value thesis (as articulated on that Arbital page) says that human values are not algorithmically simple, and I do agree with that. But that still allows for simple generators of human values, which (conceptually) take in lots of data from the real world and spit out values. Everything except those generators is learned from the environment.
In principle, if we can figure out those relatively-simple generators, then we can feed an AI data similar to the data from which humans' value-generators generate their values, and the AI should be able to reconstruct human values (up to within ordinary between-humans-within-similar-environments variation).
Meanwhile, if you want to think it through for yourself, the general question is: where the hell do humans get all their bits-of-search from?
Cultural accumulation and google, but that's mimicking someone who's already figured it out. How about the person who first figured out eg crop growth? Could be scientific method, but also just random luck which then caught on.
Additionally, sometimes it's just applying the same hammers to different nails or finding new nails, which means that there are general patterns (hammers) that can be applied to many different situations. There's bits of information in both the patterns themselves and when to apply them, though I feel confused trying to connect these ideas here.
People specifically have inner simulations (i.e. you can imagine what it'd look like to drop a bowling ball off a building even if you've never seen it). Building those simulations from things you have lots of experience with is a way of applying different patterns to new situations.
I think a lot of the values we care about are cultural, not just genetic. A human raised without culture isn't even clearly going to be generally intelligent (in the way humans are), so why assume they'd share our values?
Estimations of the information content of this part are discussed by Eric Baum in What is Thought?, although I do not recall the details.
I find that plausible, a priori. Mostly doesn't affect the stuff in the talk, since that would still come from the environment, and the same principles would apply to culturally-derived values as to environment-derived values more generally. Assuming the hardwired part is figured out, we should still be able to get an estimate of human values within the typical-human-value-distribution-for-a-given-culture from data which is within the typical-human-environment-distribution-for-that-culture.
Thinking through the "vast majority of problem-space for X fails" argument; assume we have a random text generator that we want to run a sorting algorithm:
For programs specifically, if it's simple and passes a relevant distribution of unit tests, we can be highly confident it in fact sorts correctly, but what's the equivalent for "plan that maintains human values"? Let's say John succeeds and finds what we think to be the generators of human values, would it be comprehensible enough to verify it?
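As a rough illustration of the "simple and passes a relevant distribution of unit tests" point for programs, here is a minimal Python sketch. The candidate functions, the helper `passes`, and both test distributions are hypothetical and made up for this example. It shows a broken candidate that looks fine on a narrow distribution (already-sorted inputs) but is caught by a more relevant one.

```python
import random

def insertion_sort(xs):
    """A simple, correct candidate program."""
    out = []
    for x in xs:
        i = len(out)
        # Walk left until the element before position i is <= x.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def identity(xs):
    """A broken candidate: it 'looks like it works' on already-sorted input."""
    return list(xs)

def passes(candidate, test_inputs):
    """True if the candidate matches Python's sorted() on every test input."""
    return all(candidate(xs) == sorted(xs) for xs in test_inputs)

rng = random.Random(0)
# Assumed "relevant" distribution: short random lists, plus one fixed unsorted case.
random_lists = [[rng.randrange(100) for _ in range(rng.randrange(8))]
                for _ in range(200)]
random_lists.append([2, 1])
# A narrow distribution that only ever contains sorted inputs.
narrow_lists = [sorted(xs) for xs in random_lists]

print(passes(insertion_sort, random_lists))
print(passes(identity, narrow_lists))
print(passes(identity, random_lists))
```

The open question in this comment is what plays the role of `random_lists` when the thing being checked is a plan rather than a sorting routine; the sketch says nothing about that.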
Applying the argument again but to John's proposed solution, the vast majority of [AIs trained in human environments with what we think are the simple generators of human values]'s plans & behaviors may look good but not actually be good. Or the weights are incomprehensible, so we use unit tests to verify and it could still fail.
Counter-counterargument: I can imagine these generators being simple enough that we can indeed be confident they do what we want. Since it should be human-value-equivalent, it should also be human-interpretable (under reflection?).
This sounds like a good idea overall, but I wouldn't bet my life on it. It'd be nice to have necessary and sufficient conditions for this possible solution.
Cheers for posting! I've got a question about the claim that optimizers compress by default, due to the entropy maximization-style argument given around 20:00 (apologies if you covered this, it's not easy to check back through a video):
Let's say that we have a neural network of width 100, trained on a dataset which could be fit to perfect accuracy by a network of width only 30. If it compresses the data into only 30 weights, there's a 70-dimensional space of free parameters, and we should expect a randomly selected solution to be of this kind.
I agree that if we randomly sample zero-loss weight configurations, we end up with this kind of compression, but it seems that any kind of learning we know how to do is dependent on the paths that one can take to reach it, and that abstracting this away can give very different results to any high-dimensional optimization that we actually know how to do.
Assuming that the network is parameterized by, say, float16s, maximal compression of the data would result in the output of the network being sensitive to the final bit of the weights in as many cases as possible, thereby leaving the largest number of free bits, so 16 bits of info would be compressed into one weight, rather than spread among 3-4.
My intuition is that these highly compressed arrangements would be very sensitive to perturbations, and render them incredibly difficult to reach in practice (they would also have a big problem with unknown examples, and are therefore screened off by techniques like dropout and regularization). There is therefore a competing incentive towards minima which are easy to land on - probably flat minima surrounded by areas of relatively good performance. Further, I expect that these kinds of minima tend to leverage the whole network for redundancy and flatness (not needing to depend tightly on the final bit of weights).
The resulting property would be not just compression but some combination of compression and smoothness (smoothness being sort of a variant of compression where the final bits don't matter much), which would not result in some subset of the parameters having all the useful information.
If you agree that this is what happens, in what sense is there really compression, if the info is spread among multiple bits? Perhaps given the structure of NNs, we should expect to be able to compress by removing the last bits of weights as these are the easiest to leave free given the structure of training?
If you disagree I'd be curious to know where. I sense that Mingard et al share your conclusion but I don't yet understand the claimed empirical demonstration.
tldr: optimization may compress by default, but learning seems to counteract this by choosing easy-to-find minima.
it seems that any kind of learning we know how to do is dependent on the paths that one can take to reach it, and that abstracting this away can give very different results to any high-dimensional optimization that we actually know how to do.
This is where Mingard et al come in. One of their main results is that SGD training on neural nets does quite well approximate just-randomly-sampling-an-optimal-point. Turns out our methods are not actually very path-dependent in practice!
My intuition is that these highly compressed arrangements would be very sensitive to perturbations, and render them incredibly difficult to reach in practice... There is therefore a competing incentive towards minima which are easy to land on - probably flat minima surrounded by areas of relatively good performance.
There is a mismatch between your intuition and the implications of "flat minima surrounded by areas of relatively good performance".
Remember, the whole point of the "highly compressed arrangements" is that we only need to lock in a few parameter values in order to get optimal behavior; once those few values are locked in, the rest of the parameters can mostly vary however they want without screwing stuff up. "Flat minimum surrounded by areas of relatively good performance" is synonymous with compression: if we can vary the parameters in lots of ways without losing much performance, that implies that all the info needed for optimal performance has been compressed into whatever-we-can't-vary-without-losing-performance.
Now, your intuition is correct in the sense that info may be spread over many parameters; the relevant "ways to vary things" may not just be "adjust one param while holding others constant". For instance, it might be more useful to look at parameter variation along local eigendirections of the Hessian. Then the claim would be something like "flat optimum = performance is flat along lots of eigendirections, therefore we can project the parameter-values onto the non-flat eigendirections and those projections are the 'compressed info'". (Tbc, I still don't know what the best way is to characterize this sort of thing, but eigendirections are an obvious approximation which will probably work.)
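To make the eigendirection picture concrete, here is a minimal NumPy sketch on a toy problem. Everything in it is an assumption chosen for illustration: a linear model with a squared-error loss, a rank-3 design matrix over 10 parameters, and an arbitrary threshold for calling an eigendirection "flat". It is not a claim about real neural nets, only a case where the info is spread over all parameters yet the Hessian has just a few non-flat directions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 10, 3  # samples, parameters, rank of the data (all illustrative)

# Rank-r design matrix: every one of the n parameters touches the output,
# but only r directions in parameter space actually matter.
X = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
w = rng.normal(size=n)   # an optimum: zero loss by construction
y = X @ w

def loss(params):
    """Mean squared error of the linear model."""
    return float(np.mean((X @ params - y) ** 2))

# For this loss the Hessian is constant: 2 * X^T X / m.
hessian = 2.0 * X.T @ X / m
eigvals, eigvecs = np.linalg.eigh(hessian)

# Assumed flatness threshold, relative to the largest eigenvalue.
non_flat = eigvals > 1e-8 * eigvals.max()
V = eigvecs[:, non_flat]

compressed = V.T @ w          # projections onto the non-flat eigendirections
w_reconstructed = V @ compressed

print(int(non_flat.sum()), compressed.shape)
print(loss(w), loss(w_reconstructed))
```

In this toy case the three projections in `compressed` are enough to recover optimal performance, even though no individual parameter is free to vary on its own. Whether something similar holds near the optima that training actually finds is exactly the open question in this thread.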
Turns out our methods are not actually very path-dependent in practice!
Yeah I get that's what Mingard et al are trying to show but the meaning of their empirical results isn't clear to me - but I'll try and properly read the actual paper rather than the blog post before saying any more in that direction.
"Flat minimum surrounded by areas of relatively good performance" is synonymous with compression: if we can vary the parameters in lots of ways without losing much performance, that implies that all the info needed for optimal performance has been compressed into whatever-we-can't-vary-without-losing-performance.
I get that a truly flat area is synonymous with compression - but I think being surrounded by areas of good performance is anti-correlated with compression because it indicates redundancy and less-than-maximal sensitivity.
I agree that viewing it as flat eigendimensions in parameter space is the right way to think about it, but I still worry that the same concerns apply: maximal compression in this space is traded against the ease of finding what would be a flat plain in many dimensions but a maximally steep ravine in all of the other directions. I can imagine this could be investigated with some small experiments, or they may well already exist, but I can't promise I'll follow up. If anyone is interested, let me know.
Thanks a lot for posting this! A minor point about the 2nd intuition pump (100-timesteps, 4 actions: Take $1, Do Nothing, Buy Apple, Buy Banana; the point being that most action sequences take the Take $1 action a lot rather than the Do Nothing action): the "goal" of getting 3 apples seems irrelevant to the point, and may be misleading if you think that that goal is where the push to acquire resources comes from. A more central source seems to me to be the "rule" of not ending with a negative balance: this is what prunes paths through the tree that contain more "do nothing" actions.
Yup! More generally, key pieces for modeling a "resource": amounts of the resource are additive, and more resources open up more actions (operationalized by the need for a positive balance in this case). If there's something roughly like that in the problem space, then the resource-seeking argument kicks in.
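A small brute-force check of this intuition pump, as a sketch under stated assumptions: the horizon is shortened from 100 to 6 steps so every sequence can be enumerated, apples and bananas are each assumed to cost $1 (the price is not given above), the starting balance is assumed to be $0, and the only rule is the one named in the comment, not ending with a negative balance. The 3-apples goal is deliberately left out.

```python
from itertools import product

ACTIONS = ("take", "nothing", "apple", "banana")
PRICE = 1  # assumed price of an apple or a banana, in dollars

def is_valid(sequence):
    """A sequence is valid if the final balance is not negative."""
    balance = 0  # assumed starting balance
    for action in sequence:
        if action == "take":
            balance += 1
        elif action in ("apple", "banana"):
            balance -= PRICE
    return balance >= 0

def action_frequencies(horizon):
    """Enumerate all sequences and average action usage over the valid ones."""
    counts = {action: 0 for action in ACTIONS}
    n_valid = 0
    for sequence in product(ACTIONS, repeat=horizon):
        if is_valid(sequence):
            n_valid += 1
            for action in sequence:
                counts[action] += 1
    total = n_valid * horizon
    return n_valid, {action: c / total for action, c in counts.items()}

n_valid, freqs = action_frequencies(6)
print(n_valid, freqs)
```

Among the valid sequences, "take" shows up more often than "nothing" even though no goal is specified, which matches the point that the balance rule alone does the pruning.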
Regarding generators of human values: say we have the gene information that encodes human cognition, what does that mean? Equivalent of a simulated human? Capabilities secret-sauce algorithm, right? I'm unsure if you can take the body out of a person and still have the same values, because I have felt senses in my body that tell me information about the world and how I relate to it.
Assume it works as a simulated person and ignore mindcrime, how do you algorithmically end up in a good enough subset of human values (because not all human values are meta-good)? Or, how do you use this to create a simulated long reflection? (ie what humans would decide ethics to be if they thought about it for [1000] years)
You could first figure out meta-preferences and bootstrap that in for figuring out preferences. Though, I'm unsure if there is a "correct" set of meta-preferences, with my main confusion being the blank spot in my map where "enlightenment" is.
I recently gave a two-part talk on the big picture of alignment, as I see it. The talk is not-at-all polished, but contains a lot of stuff for which I don't currently know of any good writeup. Major pieces in part one:
Note that I don't talk about timelines or takeoff scenarios; this talk is just about the technical problem of alignment.
Here's the video for part one:
Big thanks to Rob Miles for editing! Also, the video includes some good questions and discussion from Adam Shimi, Alex Flint, and Rob Miles.