Shmi - AI Alignment Forum

Well written. Do you have a few examples of pivoting when it becomes apparent that the daily grind no longer optimizes for solving the problem?

Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide

Shmi2y8-2

I know very little about this area, but I suspect that a writeup like this classic explanation of Godel Incompleteness might be a step in the right direction: Godel incompleteness.

Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Shmi2y10

I meant this:

Shard Question: How does the human brain ensure alignment with its values, and how can we use that information to ensure the alignment of an AI with its designers' values?

which does indeed beg the question in the standard meaning of it.

My point is that there is very much no alignment between different values! They are independent at best and contradictory in many cases. There is an illusion of coherent values that is a rationalization. The difference in values sometimes leads to catastrophic Fantasia-like outcomes on the margins (e.g. people with addiction don't want to be on drugs but are), but most of the time it results in a mild akrasia (I am writing this instead of doing something that makes me money). This seems like a good analogy: http://max.mmlc.northwestern.edu/mdenner/Demo/texts/swan_pike_crawfish.htm

Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Shmi2y10

That seems like a useful decomposition! Point 2 seems to beg the question, why does it assume that the brain can "ensure alignment with its values", as opposed to, say, synthesizes an illusion of values by aggregating data from various shards?

How I Formed My Own Views About AI Safety

Shmi3y3-2

Just a small remark

Open a blank google doc, set a one hour timer, and start writing out your case for why AI Safety is the most important problem to work on

Not "why", but "whether" is the first step. Otherwise you end up being a clever arguer.

Why 1-boxing doesn't imply backwards causation

Shmi4y10

I'm confused... What you call the "Pure Reality" view seems to work just fine, no? (I think you had a different name for it, pure counterfactuals or something.) What do you need counterfactuals/Augmented Reality for? Presumably making decisions thanks to "having a choice" in this framework, right? In the pure reality framework the "student and the test" example one would dispassionately calculate what kind of a student algorithm passes the test, without talking about making a decision to study or not to study. Same with the Newcomb's, of course, one just looks at what kind of agents end up with a given payoff. So... why pick an AR view over the PR view, what's the benefit?

An Orthodox Case Against Utility Functions

Shmi5y00

First, I really like this shift in thinking, partly because it moves the needle toward an anti-realist position, where you don't even need to postulate an external world (you probably don't see it that way, despite saying "Everything is a subjective preference evaluation").

Second, I wonder if you need an even stronger restriction, not just computable, but efficiently computable, given that it's the agent that is doing the computation, not some theoretical AIXI. This would probably also change "too easily" in "those expectations aren't (too easily) exploitable to Dutch-book." to efficiently. Maybe it should be even more restrictive to avoid diminishing returns trying to squeeze every last bit of utility by spending a lot of compute.

What is the subjective experience of free will for agents?

Shmi5y10

Feel free to let me know either way, even if you find that the posts seem totally wrong or missing the point.

What is the subjective experience of free will for agents?

Answer by ShmiApr 02, 2020*30

My answer is a rather standard compatibilist one, the algorithm in your brain produces the sensation of free will as an artifact of an optimization process.

There is nothing you can do about it (you are executing an algorithm, after all), but your subjective perception of free will may change as you interact with other algorithms, like me or Jessica or whoever. There aren't really any objective intentional "decisions", only our perception of them. Therefore there the decision theories are just byproducts of all these algorithms executing. It doesn't matter though, because you have no choice but to feel that decision theories are important.

So, watch the world unfold before your eyes, and enjoy the illusion of making decisions.

I wrote about this over the last few years:

https://www.lesswrong.com/posts/NptifNqFw4wT4MuY8/agency-is-bugs-and-uncertainty

https://www.lesswrong.com/posts/TQvSZ4n4BuntC22Af/decisions-are-not-about-changing-the-world-they-are-about

https://www.lesswrong.com/posts/436REfuffDacQRbzq/logical-counterfactuals-are-low-res

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments