We realized that if we consider an empty board an optimizing system then any finite pattern is an optimizing system (because it's similarly robust to adding non-viable collections of live cells)
Ah. I interpreted the statement about the empty board as being one of:
A small random perturbation will probably be non-viable/collapse back to the empty board. (Whereas patterns that are viable don't (necessarily) have this property.)
I then asked whether the bottle cap example had the same robustness.
An empty board is also an example of an optimizing system that is robust to adding non-viable collections of live cells (e.g., fewer than 3 live cells next to each other).
And the 'bottle cap' example is not (robust to adding cells, or to cells colliding* with it)? But if it were, then it would be an 'optimizing system'?
*spreading out, and interacting with it
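As a quick sanity check of the "non-viable collections" claim, here's a minimal sketch of a Game of Life step (my own, not from the original post; the `step` helper and the coordinates are made up for illustration): a pattern with fewer than three live cells near each other has no cell with two or three live neighbours, so it collapses back to the empty board, while a still life like the block (the analogue of the bottle cap) is unchanged.

```python
# Minimal Game of Life step over a set of live-cell coordinates.
from collections import Counter

def step(live):
    """One Game of Life step; `live` is a set of (x, y) coordinates."""
    # Count, for each cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Births need exactly 3 live neighbours; survival needs 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

print(step({(0, 0), (5, 5)}))    # two isolated cells -> set(): back to empty
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(step(block) == block)      # a 2x2 block -> True: a robust still life
```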
(Weird meta-note: Are you aware of something unusual about how this comment is posted? I saw a notification for it, but I didn't see it in the comments section for the post itself until initially submitting this reply. I'm newish to posting on Lightcone forums...)
Ah. When you say lightcone forums, what site are you on? What does the URL look like?
For this point, I'm not sure how it fits into the argument. Could you say more?
It's probably a tangent. The idea was:
1) Criticism is great.
2) Explaining how that could be improved is marginally better. (I then exp...
The paper makes a slightly odd multi-step argument to try to connect to active debates in the field:
This comment is some quick feedback on those:
Weirdly, this even happens in papers that themselves to show positive results involving NNs.
citations to failures in old systems that we've since improved upon significantly.
Might not be a main point, but this could be padded out with an explanation of how something like that could be marginally better. Like adding:
"As opposed to explaining how that is relevant today, like:
[Old technique] had [problem]. As [...
It's remarkable that googling "thermodynamics of the game of life" turns up zero results.
It's not obvious that thermodynamics generalizes to the game of life, or what the equivalents of energy or order would be: at first glance it has perpetual motion machines ("gliders").
This was a good post. I'd bookmark it, but unfortunately that functionality doesn't exist yet.* (Though if you have any open source bookmark plugins to recommend, that'd be helpful.) I'm mostly responding to say this though:
Designing Recommender Systems to Depolarize
While this wasn't otherwise mentioned in the abstract of the paper (above), it was stated once:
This paper examines algorithmic depolarization interventions with the goal of conflict transformation: not suppressing or eliminating conflict but moving towards more constructive conflict.
I thought th...
How do you try to discourage all "deliberate mistakes"?
1. Make something that has a goal. Does AlphaGo make deliberate mistakes at Go? Or does it try to win, and always make the best move* (with possibly the limitation that it might not be as good at playing from positions it wouldn't play itself into)?
*This may be different from 'maximize score, or wins long term'. If you try to avoid teaching your opponent how to play better, while seeking out wins, there can be a 'try to meta game' approach - though this might r...
The most useful definition of "mesa-optimizer" doesn't require them to perform explicit search, contrary to the current standard.
And presumably, the extent to which search takes place isn't important, isn't a measure of risk, and isn't a measure of optimization. (In other words, it's not part of the definition, and it shouldn't be part of the definition.)
Some of the reasons we expect mesa-search also apply to mesa-control more broadly.
expect mesa-search might be a problem?
Highly knowledge-based strategies, such as calculus, which find solutions "...
If you would be interested in participating conditional on us offering pay or prizes, that's also useful to know.
Do you want this feedback at the same address?
The authors prove that EPIC is a pseudometric, that is, it behaves like a distance function, except that it is possible for EPIC(R1, R2) to be zero even if R1 and R2 are different. This is desirable, since if R1 and R2 differ by a potential shaping function, then their optimal policies are guaranteed to be the same regardless of transition dynamics, and so we should report the “distance” between them to be zero.
If EPIC(R1, R2) is thought of as two functions f(g(R1), g(R2)), where g returns the optimal policy of its input, and f is a distance ...
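(An aside of my own, not the newsletter's: any construction of this f(g(·), g(·)) shape is automatically a pseudometric so long as f is a metric, whatever g is, which is why "distance zero for different inputs" falls out naturally.)

```latex
% If f is a metric and g is any function, then d(R_1, R_2) := f(g(R_1), g(R_2))
% satisfies the pseudometric axioms:
\begin{align*}
  d(R_1, R_1) &= f(g(R_1), g(R_1)) = 0
    && \text{(identity for $f$)} \\
  d(R_1, R_2) &= f(g(R_1), g(R_2)) = f(g(R_2), g(R_1)) = d(R_2, R_1)
    && \text{(symmetry of $f$)} \\
  d(R_1, R_3) &= f(g(R_1), g(R_3))
    \le f(g(R_1), g(R_2)) + f(g(R_2), g(R_3)) = d(R_1, R_2) + d(R_2, R_3)
    && \text{(triangle inequality for $f$)}
\end{align*}
% But d(R_1, R_2) = 0 only implies g(R_1) = g(R_2), not R_1 = R_2.
```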
the exact same answer it would have output without the perturbation.
It always gives the same answer for the last digit?
(The object which is not the object:)
So you just don't do it, even though it feels like a good idea.
More likely, people don't do it because they can't, or for a similar reason. (The point of saying "My life would be better if I were in charge of the world" is not to serve as a hypothesis to be falsified.)
(The object:)
Beliefs intervene on action. (Not success, but choice.)
We are biased and corrupted. By taking the outside view on how our own algorithm performs in a given situation, we can adjust accordingly.
The piece seems biased towards...
What term do people use for the definition of alignment in which A is trying to achieve H's goals
Sounds like it should be called goal alignment, whatever its name happens to be.
The thing about Montezuma's revenge and similar hard exploration tasks is that there's only one trajectory you need to learn; and if you forget any part of it you fail drastically; I would by default expect this to be better than adversarial dynamics / populations at ensuring that the agent doesn't forget things.
But is it easier to remember things if there's more than one way to do them?
Bumping into the human makes them disappear, reducing the agent's control over what the future looks like. This is penalized.
Decreases or increases?
AUP_starting state fails here, but AUP_stepwise does not.
Questions:
1. Is "Model-free AUP" the same as "AUP stepwise"?
2. Why does "Model-free AUP" wait for the pallet to reach the human before moving, while the "Vanilla" agent does not?
There is one weird thing that's been pointed out, where stepwise inaction while driving a car leads to not-crashing being penalized...
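For what it's worth, here is my loose reading of the baseline difference the questions are pointing at, in notation of my own that may not match the posts' exact definitions:

```latex
% Loose sketch (my notation, possibly not the posts' exact formulation):
% both penalties compare attainable utilities Q_{R_i} for auxiliary rewards R_i,
% but against different baselines.
\begin{align*}
  \mathrm{Penalty}_{\mathrm{stepwise}}(s_t, a) &\propto
    \sum_i \bigl| Q_{R_i}(s_t, a) - Q_{R_i}(s_t, \varnothing) \bigr|
    && \text{(acting vs. a no-op from the current state)} \\
  \mathrm{Penalty}_{\mathrm{starting}}(s_t, a) &\propto
    \sum_i \bigl| Q_{R_i}(s_t, a) - Q_{R_i}(s_0, \varnothing) \bigr|
    && \text{(acting vs. the starting state's attainable utilities)}
\end{align*}
% The stepwise baseline resets each step, which is where the
% "not-crashing gets penalized" oddity above comes from.
```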
CCC says (for non-evil goals) "if the optimal policy is catastrophic, then it's because of power-seeking". So its contrapositive is indeed as stated.
That makes sense. One of the things I like about this approach is that it isn't immediately clear what else could be a problem, and that might just be implementation details or parameters: corrigibility from limited power only works if we make sure that power is low enough that we can turn it off, if the agent will acquire power if that's the only way to achieve its goal rather than stoppin...
I liked this post, and look forward to the next one.
More specific and critical commentary (it seems it is easier to notice surprise than agreement):
(With embedded footnotes)
1.
If the CCC is right, then if power gain is disincentivised, the agent isn't incentivised to overfit and disrupt our AU landscape.
(The CCC didn't make reference to overfitting.)
Premise:
If A is true then B will be true.
Conclusion:
If A is false then B will be false.
The conclusion doesn't follow from the premise.
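A minimal counterexample (my own, with arbitrary truth values) in case it helps:

```latex
% Take A false and B true. Then:
\begin{align*}
  A \to B            &\quad \text{is true (a false antecedent makes the implication true),} \\
  \neg A \to \neg B  &\quad \text{is false (} \neg A \text{ holds but } \neg B \text{ does not).}
\end{align*}
% So the premise can hold while the conclusion fails; the valid inference from
% A \to B is the contrapositive \neg B \to \neg A, not \neg A \to \neg B.
```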
2.
Without even knowing who we are or what we want, the agent'...
we can think of Bayes' Law as myopically optimizing per-hypothesis, uncaring of overall harm to predictive accuracy.
Or maybe just bad implementations do this - predict-o-matic as described sounds like a bad idea, and like it doesn't contain hypotheses so much as "players"*. (And the reason there'd be a "side channel" is to understand theories - the point of which is transparency, which, if accomplished, would likely prevent manipulation.)
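As a toy illustration of the "per-hypothesis" point (my own sketch, with made-up hypothesis names and numbers, not anything from the post): the Bayesian update touches each hypothesis's weight only via that hypothesis's own likelihood; nothing in the update looks at how the overall mixture is doing.

```python
# Toy Bayes update: each hypothesis is scored only on its own likelihood.

def bayes_update(priors, likelihoods):
    """priors: {hypothesis: P(h)}; likelihoods: {hypothesis: P(data | h)}."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}

priors = {"h1": 0.5, "h2": 0.5}
likelihoods = {"h1": 0.9, "h2": 0.1}      # per-hypothesis scores only
print(bayes_update(priors, likelihoods))  # {'h1': 0.9, 'h2': 0.1}
```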
We can imagine different parts of the network fighting for control, much like the Bayesia...
Going to the green state means you can't get to the purple state as quickly.
On a deep level, why is the world structured such that this happens? Could you imagine a world without opportunity cost of any kind?
In a complete graph, all nodes are directly connected.
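A toy sketch of that answer (my own, with made-up state names): in a line-shaped graph, going to one state delays reaching another, while in a complete graph every state is always one step away, so in that narrow sense there is no opportunity cost.

```python
# Compare reachability in a line graph vs. a complete graph.
from itertools import permutations

states = ["red", "green", "blue", "purple"]
line_edges = {("red", "green"), ("green", "red"),
              ("red", "blue"), ("blue", "red")}   # green <- red -> blue
complete_edges = set(permutations(states, 2))     # every pair adjacent

def distance(edges, src, dst):
    """Breadth-first search distance from src to dst (None if unreachable)."""
    frontier, seen, d = {src}, {src}, 0
    while frontier:
        if dst in frontier:
            return d
        frontier = {b for (a, b) in edges if a in frontier and b not in seen}
        seen |= frontier
        d += 1
    return None

print(distance(line_edges, "green", "blue"))        # 2: going to green cost us time
print(distance(complete_edges, "green", "purple"))  # 1: everything is one step away
```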
Equivalently, we assumed the agent isn't infinitely farsighted (γ<1); if it were, it would be possible to be in "more than one place at the same time", in a sense (thanks to Rohin Shah for this interpretation).
The opposite of this is that if it were possible for an agen...
So: is it possible to formulate an instrumental version of Occam? Can we justify a simplicity bias in our policies?
Justification has the downside that it a) is wrong if what you are arguing is wrong/false, and b) can be wrong even if what you are arguing is right/true. That being said / epistemic warnings concluded...
1. A more complicated policy:
2. We don't have the right policy.
However, this paper wants the answers to actually be correct. Thus, they claim that for sufficiently complicated questions, since the debate can't reach the right answer, the debate isn't truth-seeking -- but in these cases, the answer is still in expectation more accurate than the answer the judge would come up with by themselves.
Truth-seeking: better than the answer the judge would have come up with by themselves. (How does this work? Making an observation at random instead of choosing the observation that's recommended by the debate?)
Truth-finding: the truth is found.
Neither classical adversarial training nor training on a version of ImageNet designed to reduce the reliance on texture helps a lot, but modifying the network architecture can increase the accuracy on ImageNet-A from around 5% to 15%.
(Section link.)
Wow, 15% sounds really low. How well do people perform on said dataset?
This reminds me of:
David Ha's most recent paper, Weight Agnostic Neural Networks, looks at what happens when you do a...
I feel like this same set of problems gets re-solved a lot. I'm worried that it's a sign of ill health for the field.
Maybe the problem is getting everyone on the same page.
This isn't quite embedded agency, but it requires the base optimizer to be "larger" than the mesa-optimizer, only allowing mesa-suboptimizers, which is unlikely to be guaranteed in general.
Size might be easier to handle if some parts of the design are shared. For example, if the mesa-optimizer's design were the same as the agent's, and the agent understood itself and knew the mesa-optimizer's design, then it seems like them being the same size wouldn't be (as much of) an issue.
Principal optimization failures occur either if the...
And wow, this turned out longer than I thought it would. It's in 6 sections:
1. Starting with models versus learning models.
2. Is the third condition for deceptive alignment necessary?
3. An alternative to, or form of, treacherous turn: Building a successor.
4. Time management: How deceptive alignment might not be a lot more computationally expensive, and why treacherous turns might have a time delay.
5. The model of a distributional shift, and its relationship to the model of training followed by deployment.
6. Miscellaneous
1.
The mesa-optimizer...
How could a (relatively) 'too-strong' epistemic subsystem be a bad thing?
So if we view an epistemic subsystem as a superintelligent agent who has control over the map and has the goal of making the map match the territory, one extreme failure mode is that it takes a hit to short term accuracy by slightly modifying the map in such a way as to trick the things looking at the map into giving the epistemic subsystem more control. Then, once it has more control, it can use it to manipulate the territory to make the territory more predictable. If your goal is to minimize surprise, you should destroy all the surprising things.
Note th...
I think we get enough things referencing quantum mechanics that we should probably explain why that doesn't work (if it doesn't) rather than just downvoting and moving on.