See also this post by Quintin Pope: https://www.lesswrong.com/posts/JFibrXBewkSDmixuo/hypothesis-gradient-descent-prefers-general-circuits
I think that post has a lot of good ideas, e.g. the idea that generalizing circuits get reinforced by SGD more than memorizing circuits at least rhymes with what we claim is actually going on (that generalizing circuits are more efficient at producing strong logits with small param norm). We probably should have cited it, I forgot that it existed.
But it is ultimately a different take and one that I think ends up being wrong (e.g. I think it would struggle to explain semi-grokking).
I also think my early explanation, which that post compares to, is basically...
Interesting. This prank seems to be one you could play on a Logical Inductor, I wonder what the outcome would be? One fact that's possibly related is that computable functions are continuous. This would imply that whatever computable function Omega applies to your probability estimate, there exists a fixed point probability you can choose where you'll be correct about the monkey probability. Of course if you're a bounded agent thinking for a finite amount of time, you might as well be outputting rational probability estimates, in which case functions like ...
Counterargument 2 still seems correct to me.
Techniques like Density Functional Theory give pretty-accurate results for molecular systems in an amount of time far less than a full quantum mechanical calculation would take. While in theory quantum computing is a complexity class beyond what classical computers can handle, in practice it seems that it's possible to get quite good results, even on a classical computer. The hardness of simulating atoms and molecules looks like it depends heavily on progress in algorithms, rather than being based on the hardness...
This is really cool, thanks for posting it. I also would not have expected this result. In particular, the fact that the top right vector generalizes across mazes is surprising. (Even generalizing across mouse position but not maze configuration is a little surprising, but not as much.)
Since it helps to have multiple interpretations of the same data, here's an alternative one: The top right vector is modifying the neural network's perception of the world, not its values. Let's say the agent's training process has resulted in it valuing going up and to the ...
Thanks for the link! Looks like they do put optimization effort into choosing the subspace, but it's still interesting that the training process can be factored into 2 pieces like that.
I find the prospect of training on model on just 40 parameters to be very interesting. Almost unbelievable, really, to the point where I'm tempted to say: "I notice that I'm confused". Unfortunately, I don't have access to the paper and it doesn't seem to be on sci-hub, so I haven't been able to resolve my confusion. Basically, my general intuition is that each parameter in a network probably only contributes a few bits of optimization power. It can be set fairly high, fairly low, or in between. So if you just pulled 40 random weigh...
On training AI systems using human feedback: This is way better than nothing, and it's great that OpenAI is doing it, but has the following issues:
I took Nate to be saying that we'd compute the image with highest faceness according to the discriminator, not the generator. The generator would tend to create "thing that is a face that has the highest probability of occurring in the environment", while the discriminator, whose job is to determine whether or not something is actually a face, has a much better claim to be the thing that judges faceness. I predict that this would look at least as weird and nonhuman as those deep dream images if not more so, though I haven't actually tried it. I also predic...
7: Did I forget some important question that someone will ask in the comments?
Yes!
Is there a way to deal with the issue of there being multiple ROSE points in some games? If Alice says "I think we should pick ROSE point A" and Bob says "I think we should pick ROSE point B", then you've still got a bargaining game left to resolve, right?
Anyways, this is an awesome post, thanks for writing it up!
Posting this comment to start some discussion about generalization and instrumental convergence (disagreements #8 and #9).
So my general thoughts here are that ML generalization is almost certainly not good enough for alignment. (At least in the paradigm of deep learning.) I think it's true with high confidence that if we're trying to train a neural net to imitate some value function, and that function takes a high-dimensional input, then it will be possible to find lots of inputs that cause the network to produce a high value when the value function produc...
Yes, sounds right to me. It's also true that one of the big unproven assumptions here is that we could create an AI strong enough to build such a tool, but too weak to hack humans. I find it plausible, personally, but I don't yet have an easy-to-communicate argument for it.
Okay, I will try to name a strong-but-checkable pivotal act.
(Having a strong-but-checkable pivotal act doesn't necessarily translate into having a weak pivotal act. Checkability allows us to tell the difference between a good plan and a trapped plan with high probability, but the AI has no reason to give us a good plan. It will just produce output like "I have insufficient computing power to solve this problem" regardless of whether that's actually true. If we're unusually successful at convincing the AI our checking process is bad when it's actually good,...
Well, I had to think about this for longer than five seconds, so that's already a huge victory.
If I try to compress your idea down to a few sentences:
The humans ask the AI to produce design tools, rather than designs, such that there's a bunch of human cognition that goes into picking out the particular atomic arrangements or synthesis pathways; and we can piecewise verify that the tool is making accurate predictions; and the tool is powerful enough that we can build molecular nanotech and an uploader by using the tool for an amount of time too short for F...
Thanks for writing this. I agree with all of these except for #30, since it seems like checking the output of the AI for correctness/safety should be possible even if the AI is smarter than us, just like checking a mathematical proof can be much easier than coming up with the proof in the first place. It would take a lot of competence, and a dedicated team of computer security / program correctness geniuses, but definitely seems within human abilities. (Obviously the AI would have to be below the level of capability where it can just write down an argument...
We might summarise this counterargument to #30 as "verification is easier than generation". The idea is that the AI comes up with a plan (+ explanation of how it works etc.) that the human systems could not have generated themselves, but that human systems can understand and check in retrospect.
Counterclaim to "verification is easier than generation" is that any pivotal act will involve plans that human systems cannot predict the effects of just by looking at the plan. What about the explanation, though? I think the problem there may be more that we don't ...
The Ultimatum game seems like it has pretty much the same type signature as the prisoner's dilemma: Payoff matrix for different strategies, where the players can roll dice to pick which strategy they use. Does timeless decision theory return the "correct answer" (second player rejects greedy proposals with some probability) when you feed it the Ultimatum game?
Large genomes have (at least) 2 kinds of costs. The first is the energy and other resources required to copy the genome whenever your cells divide. The existence of junk DNA suggests that this cost is not a limiting factor. The other cost is that a larger genome will have more mutations per generation. So maintaining that genome across time uses up more selection pressure. Junk DNA requires no maintenance, so it provides no evidence either way. Selection pressure cost could still be the reason why we don't see more knowledge about the world being translate...
Good point, I wasn't thinking about that mechanism.
However, I don't think this creates an information bottleneck in the sense needed for the original claim in the post, because the marginal cost of storing more information in the genome does not increase via this mechanism as the amount-of-information-passed increases. Each gene just needs to offer a large enough fitness advantage to counter the noise on that gene; the requisite fitness advantage does not change depending on whether the organism currently has a hundred information-passing genes or a hundre...
I don't know if the entry is intended to cover speculative issues, but if so, I think it would be important to include a discussion of what happens when we start building machines that are generally intelligent and have internal subjective experiences. Usually with present day AI ethics concerns, or AI alignment, we're concerned about the AI taking actions that harm humans. But as our AI systems get more sophisticated, we'll also have to start worrying about the well-being of the machines themselves.
Should they have the same rights as humans? Is it ethical...
That's very cool, thanks for making it. At first I was worried that this meant that my model didn't rely on selection effects. Then I tried a few different random seeds, and some, like 1725, didn't show demon-like behaviour. So I think we're still good.
No regularization was used.
I also can't see any periodic oscillations when I zoom in on the graphs. I think the wobbles you are observing in the third phase are just a result of the random noise that is added to the gradient at each step.
Thanks, and your summary is correct. You're also right that this is a pretty contrived model. I don't know exactly how common demons are in real life, and this doesn't really shed much light on that question. I mainly thought that it was interesting to see that demon formation was possible in a simple situation where one can understand everything that is going on.
Thanks. I initially tried putting the code in a comment on this post, but it ended up being deleted as spam. It's now up on github: https://github.com/DaemonicSigil/tessellating-hills It isn't particularly readable, for which I apologize.
The initial vector has all components set to 0, and the charts show the evolution of these components over time. This is just for a particular run, there isn't any averaging. x0 gets its own chart, since it changes much more than the other components. If you want to know how the loss varies with time, you can just flip fig
...Here is the code for people who want to reproduce these results, or just mess around:
import torch
import numpy as np
import matplotlib.pyplot as plt
DIMS = 16 # number of dimensions that xn has
WSUM = 5 # number of waves added together to make a splotch
EPSILON = 0.0025 # rate at which xn controlls splotch strength
TRAIN_TIME = 5000 # number of iterations to train for
LEARN_RATE = 0.2 # learning rate
torch.random.manual_seed(1729)
# knlist and k0list are integers, so the splotch functions are periodic
knlist = torch.randint(-2, 3, (DIMS, WSUM, D
...
Seems like figure 1 from Miller et al is a plot of test performance vs. "out of distribution" test performance. One might expect plots of training performance vs. "out of distribution" test performance to have more spread.