AI ALIGNMENT FORUM
AF

All of Vanessa Kosoy's Comments + Replies

Thanks for this!

What I was saying up there is not a justification of Hurwicz' decision rule. Rather, it is that if you already accept the Hurwicz rule, it can be reduced to maximin, and for a simplicity prior the reduction is "cheap" (produces another simplicity prior).

Why accept the Hurwicz' decision rule? Well, at least you can't be accused of a pessimism bias there. But if you truly want to dig deeper, we can start instead from an agent making decisions according to an ambidistribution, which is a fairly general (assumption-light) way of making decision... (read more)

Compositional language for hypotheses about computations

Vanessa Kosoy12d20

You now understand correctly. The reason I switch to colored operads is to add even more generality. My key use case is when the operad consists of terms-with-holes in a programming language, in which case the colors are the types of the terms/holes.

Vanessa Kosoy's Shortform

Vanessa Kosoy12d50

The following are my thoughts on the definition of learning in infra-Bayesian physicalism (IBP), which is also a candidate for the ultimate prescriptive agent desideratum.

In general, learning of hypotheses about the physical universe is not possible because of traps. On the other hand, learning of hypotheses about computable mathematics is possible in the limit of ample computing resources, as long as we can ignore side effects of computations. Moreover, learning computable mathematics implies approximating Bayesian planning w.r.t the prior about the physi... (read more)

Compositional language for hypotheses about computations

Vanessa Kosoy12d20

No? The elements of an operad have fixed arity. When defining a free operad you need to specify the arity of every generator.

1Chris van Merwijk12d

I may be confused somehow. Feel free to ignore. But: * At first I thought you meant the input alphabet to be the colors, not the operations. * Instead, am I correct that "the free operad generated by the input alphabet of the tree automaton" is an operad with just one color, and the "operations" are basically all the labeled trees where labels of the nodes are the elements of the alphabet, such that the number of children of a node is always equal to the arity of that label in the input alphabet? * That would make sense, as the algebra would then I guess assign the state space of the tree automaton to the single color of the operad, and each arity n operation would be mapped to the mathematical function from Q^n to Q. * That would make sense I think, but then why do you talk about a "colored" operad in: "we can now define a deterministic automaton over a (colored) operad O to be an O-algebra"?

Vanessa Kosoy's Shortform

Vanessa Kosoy2mo40

Another excellent catch, kudos. I've really been sloppy with this shortform. I corrected it to say that we can approximate the system arbitrarily well by VNM decision-makers. Although, I think it's also possible to argue that a system that selects a non-exposed point is not quite maximally influential, because it's selecting somethings that's very close to delegating some decision power to chance.

Also, maybe this cannot happen when $D$ is the inverse limit of finite sets? (As is the case in sequential decision making with finite action/observation ... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy2mo20

Example: Let $X = {0, 1}$ , and $D$ consist of the probability intervals $Θ_{0} := [0, \frac{2}{3}]$ , $Θ_{1} := [\frac{1}{3}, 1]$ and $Θ_{2} := [\frac{1}{3}, \frac{2}{3}]$ . Then, it is (I think) consistent with the desideratum to have $Θ^{*} = Θ_{2}$ .

Not only that interpreting $Θ^{*} = Θ_{2}$ requires an unusual decision rule (which I will be calling "utility hyperfunction"), but applying any ordinary utility function to this example yields a non-unique maximum. This is another point in favor of the significance of hyperfunctions.

Vanessa Kosoy's Shortform

Vanessa Kosoy2mo30

You're absolutely right, good job! I fixed the OP.

Vanessa Kosoy's Shortform

Vanessa Kosoy2mo*90

TLDR: Systems which locally maximal influence can be described as VNM decision-makers.

There are at least 3 different motivations leading to the concept of "agent" in the context of AI alignment:

The sort of system we are concerned about (i.e. which poses risk)
The sort of system we want to build (in order to defend from dangerous systems)
The sort of systems that humans are (in order to meaningfully talk about "human preferences")

Motivation #1 naturally suggests a descriptive approach, motivation #2 naturally suggests a prescriptive approach, and motivation #... (read more)

harfe2mo92

I think there are some subtleties with the (non-infra) bayesian VNM version, which come down to the difference between "extreme point" and "exposed point" of $D$ . If a point is an extreme point that is not an exposed point, then it cannot be the unique expected utility maximizer under a utility function (but it can be a non-unique maximizer).

For extreme points it might still work with uniqueness, if, instead of a VNM-decision-maker, we require a slightly weaker decision maker whose preferences satisfy the VNM axioms except continuity.

2Vanessa Kosoy2mo

Not only that interpreting Θ∗=Θ2 requires an unusual decision rule (which I will be calling "utility hyperfunction"), but applying any ordinary utility function to this example yields a non-unique maximum. This is another point in favor of the significance of hyperfunctions.

harfe2mo82

For any $Φ, Ψ \in^D$ , if $Θ^{*} = Φ \lor Ψ$ then either $Φ \subseteq Ψ$ or $Ψ \subseteq Φ$ .

I think this condition might be too weak and the conjecture is not true under this definition.

If $Φ_{1} \subseteq Φ_{2}$ , then we have $E_{y \sim ξ} {min}_{μ \in Φ_{2}} E_{x \sim μ} u (x, y) \leq E_{y \sim ξ} {min}_{μ \in Φ_{1}} E_{x \sim μ} u (x, y)$ (because a minimum over a larger set is smaller). Thus, $Φ_{2}$ can only be the unique argmax if $Φ_{1} = Φ_{2}$ .

Consider the example $^D={[0,x]:x∈[0,1]}$ . Then $^D$ is closed. And $Θ^{*} = [0, 1]$ satisfies $Θ^{*} = Φ \lor Ψ ⟹ Φ \subseteq Ψ \lor Ψ \subseteq Φ$ . But per the above it cannot be a unique maximizer.

Maybe the issue can be fixed if we strengthen the condition so that $Φ^{*}$ has to be also minimal with res... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy2mo70

Master post for selection/coherence theorems. Previous relevant shortforms: learnability constraints decision rules, AIT selection for learning.

Vanessa Kosoy2mo*90

TLDR: Systems which locally maximal influence can be described as VNM decision-makers.

There are at least 3 different motivations leading to the concept of "agent" in the context of AI alignment:

The sort of system we are concerned about (i.e. which poses risk)
The sort of system we want to build (in order to defend from dangerous systems)
The sort of systems that humans are (in order to meaningfully talk about "human preferences")

Motivation #1 naturally suggests a descriptive approach, motivation #2 naturally suggests a prescriptive approach, and motivation #... (read more)

What is the most impressive game LLMs can play well?

Vanessa Kosoy3mo20

Do you mean that seeing the opponent make dumb moves makes the AI infer that its own moves are also supposed to be dumb, or something else?

2gwern2mo

Yes.

What is the most impressive game LLMs can play well?

Vanessa Kosoy3mo21

Relevant link

What is the most impressive game LLMs can play well?

Vanessa Kosoy3mo30

Apparently someone let LLMs play against the random policy and for most of them, most games end in a draw. Seems like o1-preview is the best of those tested, managing to win 47% of the time.

4gwern3mo

Given the other reports, like OA's own benchmarking (as well as the extremely large dataset of chess games they mention training on), I am skeptical of this claim, and wonder if this has the same issue as other 'random chess game' tests, where the 'random' part is not neutral but screws up the implied persona.

What is the most impressive game LLMs can play well?

Vanessa Kosoy3mo30

Relevant: Manifold market about LLM chess

1Cole Wyeth3mo

Interesting, the prices seemed reasonable overall though I traded the later dates down a little bit because if LLMs haven't won be 2030 the paradigm is probably limited (IMO they hadn't priced in that update). I suppose that it's a slightly "unfair" comparison because chess engines are very narrow and humans can't beat them either. How do LLMs compare to top human chess players?

Are there cognitive realms?

Vanessa Kosoy3mo30Review for 2023 Review

This post states and speculates on an important question: are there different mind types that are in some sense "fully general" (the author calls it "unbounded") but are nevertheless qualitatively different. The author calls these hypothetical mind taxa "cognitive realms".

This is how I think about this question, from within the LTA:

To operationalize "minds" we should be thinking of learning algorithms. Learning algorithms can be classified according to their "syntax" and "semantics" (my own terminology). Here, semantics refers to questions such as (i) what... (read more)

SolidGoldMagikarp (plus, prompt generation)

Vanessa Kosoy3mo22Review for 2023 Review

This post describes an intriguing empirical phenomenon in particular language models, discovered by the authors. Although AFAIK it was mostly or entirely removed in contemporary versions, there is still an interesting lesson there.

While non-obvious when discovered, we now understand the mechanism. The tokenizer created some tokens which were very rare or absent in the training data. As a result, the trained model mapped those tokens to more or less random features. When a string corresponding to such a token is inserted into the prompt, the resulting reply... (read more)

Learning-theoretic agenda reading list

Vanessa Kosoy3mo40Review for 2023 Review

This is just a self-study list for people who want to understand and/or contribute to the learning-theoretic AI alignment research agenda. I'm not sure why people thought it deserves to be in the Review. FWIW, I keep using it with my MATS scholars, and I keep it more or less up-to-date. A complementary resource that became available more recently is the video lectures.

Shell games

Vanessa Kosoy3mo40Review for 2023 Review

This post suggests an analogy between (some) AI alignment proposals and shell games or perpetuum mobile proposals. Pertuum mobiles are an example how an idea might look sensible to someone with a half-baked understanding of the domain, while remaining very far from anything workable. A clever arguer can (intentionally or not!) hide the error in the design wherever the audience is not looking at any given moment. Similarly, some alignment proposals might seem correct when zooming in on every piece separately, but that's because the error is always hidden aw... (read more)

Why Not Just Outsource Alignment Research To An AI?

Vanessa Kosoy3mo125Review for 2023 Review

This post argues against alignment protocols based on outsourcing alignment research to AI. It makes some good points, but also feels insufficiently charitable to the proposals it's criticizing.

John make his case by an analogy to human experts. If you're hiring an expert in domain X, but you understand little in domain X yourself then you're going to have 3 serious problems:

Illusion of transparency: the expert might say things that you misinterpret due to your own lack of understanding.
The expert might be dumb or malicious, but you will believe them due to

... (read more)

Current AIs Provide Nearly No Data Relevant to AGI Alignment

Vanessa Kosoy3mo104Review for 2023 Review

This post makes an important point: the words "artificial intelligence" don't necessarily carve reality at the joints, the fact something is true about a modern system that we call AI doesn't automatically imply anything about arbitrary future AI systems, no more than conclusions about e.g. Dendral or DeepBlue carry over to Gemini.

That said, IMO the author somewhat overstates their thesis. Specifically, I take issue with all the following claims:

LLMs have no chance of becoming AGI.
LLMs are automatically safe.
There is nearly no empirical evidence from LLMs

... (read more)

When is Goodhart catastrophic?

Vanessa Kosoy3mo52Review for 2023 Review

This post provides a mathematical analysis of a toy model of Goodhart's Law. Namely, it assumes that the optimization proxy $U$ is a sum of the true utility function $V$ and noise $X$ , such that:

$V$ and $X$ are independent random variables w.r.t. some implicit distribution $ζ$ on the solution space. The meaning of this distribution is not discussed, but I guess we might think of it some kind of inductive bias, e.g. a simplicity prior.
The optimization process can be modeled as conditioning $ζ$ on a high value of

Vanessa Kosoy3mo82Review for 2023 Review

This post attempts to describe a key disagreement between Karnofsky and Soares (written by Karnofsky) pertaining to the alignment protocol "train an AI to simulate an AI alignment researcher". The topic is quite important, since this is a fairly popular approach.

Here is how I view this question:

The first unknown is how accurate is the simulation. This is not really discussed in the OP. On the one hand, one might imagine that with more data, compute and other improvements, the AI should ultimately converge on an almost perfect simulation of an AI alignment ... (read more)

Neural networks generalize because of this one weird trick

Vanessa Kosoy3mo82Review for 2023 Review

This post is a solid introduction to the application of Singular Learning Theory to generalization in deep learning. This is a topic that I believe to be quite important.

One nitpick: The OP says that it "seems unimportant" that ReLU networks are not analytic. I'm not so sure. On the one hand, yes, we can apply SLT to (say) GELU networks instead. But GELUs seem mathematically more complicated, which probably translates to extra difficulties in computing the RLCT and hence makes applying SLT harder. Alternatively, we can consider a series of analytical respo... (read more)

Natural Abstractions: Key claims, Theorems, and Critiques

Vanessa Kosoy3mo70Review for 2023 Review

This post is a great review of the Natural Abstractions research agenda, covering both its strengths and weaknesses. It provides a useful breakdown of the key claims, the mathematical results and the applications to alignment. There's also reasonable criticism.

To the weaknesses mentioned in the overview, I would also add that the agenda needs more engagement with learning theory. Since the claim is that all minds learn the same abstractions, it seems necessary to look into the process of learning, and see what kind of abstractions can or cannot be learned ... (read more)

Towards Developmental Interpretability

Vanessa Kosoy3mo1417Review for 2023 Review

This post introduces Timaeus' "Developmental Interpretability" research agenda. The latter is IMO one of the most interesting extant AI alignment research agendas.

The reason DevInterp is interesting is that it is one of the few AI alignment research agendas that is trying to understand deep learning "head on", while wielding a powerful mathematical tool that seems potentially suitable for the purpose (namely, Singular Learning Theory). Relatedly, it is one of the few agendas that maintains a strong balance of theoretical and empirical research. As such, it... (read more)

Acausal normalcy

Vanessa Kosoy3mo50Review for 2023 Review

This post is a collection of claims about acausal trade, some of which I find more compelling and some less. Overall, I think it's a good contribution to the discussion.

Claims that I mostly agree with include:

Acausal trade in practice is usually not accomplished by literal simulation (the latter is mostly important as a convenient toy model) but by abstract reasoning.
It is likely to be useful to think of the "acausal economy" as a whole, rather just about each individual trade separately.

Claims that I have some quibbles with include:

The claim that there is

Vanessa Kosoy3mo82Review for 2023 Review

This post argues that, while it's traditional to call policies trained by RL "agents", there is no good reason for it and the terminology does more harm than good. IMO Turner has a valid point, but he takes it too far.

What is an "agent"? Unfortunately, this question is not discussed in the OP in any detail. There are two closely related informal approaches to defining "agents" that I like, one more axiomatic / black-boxy and the other more algorithmic / white-boxy.

The algorithmic definition is: An agent is a system that can (i) learn models of its environm... (read more)

FixDT

Vanessa Kosoy4mo60Review for 2023 Review

This post proposes an approach to decision theory in which we notion of "actions" is emergent. Instead of having an ontologically fundamental notion of actions, the agent just has beliefs, and some of them are self-fulfilling prophecies. For example, the agent can discover that "whenever I believe my arm will move up/down, my arm truly moves up/down", and then exploit this fact by moving the arm in the right direction to maximize utility. This works by having a "metabelief" (a mapping from beliefs to beliefs; my terminology, not the OP's) and allowing the ... (read more)

There are no coherence theorems

Vanessa Kosoy4mo70

I feel that coherence arguments, broadly construed, are a reason to be skeptical of such proposals, but debating coherence arguments because of this seems backward. Instead, we should just be discussing your proposal directly. Since I haven't read your proposal yet, I don't have an opinion, but some coherence-inspired question I would be asking are:

Can you define an incomplete-preferences AIXI consistent with this proposal?
Is there an incomplete-preferences version of RL regret bound theory consistent with this proposal?
What happens when your agent is constructing a new agent? Does the new agent inherit the same incomplete preferences?

There are no coherence theorems

Vanessa Kosoy4mo1121Review for 2023 Review

This post tries to push back against the role of expected utility theory in AI safety by arguing against various ways to derive expected utility axiomatically. I heard many such arguments before, and IMO they are never especially useful. This post is no exception.

The OP presents the position it argues against as follows (in my paraphrasing): "Sufficiently advanced agents don't play dominated strategies, therefore, because of [theorem], they have to be expected utility maximizers, therefore they have to be goal-directed and [other conclusions]". They then p... (read more)

Elliott Thornley4mo72

Thanks. I agree with your first four bulletpoints. I disagree that the post is quibbling. Weak man or not, the-coherence-argument-as-I-stated-it was prominent on LW for a long time. And figuring out the truth here matters. If the coherence argument doesn't work, we can (try to) use incomplete preferences to keep agents shutdownable. As I write elsewhere:

The List of Lethalities mention of ‘Corrigibility is anti-natural to consequentialist reasoning’ points to Corrigibility (2015) and notes that MIRI failed to find a formula for a shutdownable agent. MIRI fa

Vanessa Kosoy4mo*45Review for 2023 Review

This remains the best overview of the learning-theoretic agenda to-date. As a complementary pedagogic resource, there is now also a series of video lectures.

Since the article was written, there were several new publications:

Gergely Szűcs's article on interpreting quantum mechanics using infra-Bayesian physicalism.
My paper on linear infra-Bayesian bandits.
An article on infra-Bayesian haggling by my MATS scholar Hanna Gabor. This approach to multi-agent systems did not exist when the overview was written, and currently seems like the most promising direction

... (read more)

Some Rules for an Algebra of Bayes Nets

Vanessa Kosoy4mo40

Seems right, but is there a categorical derivation of the Wentworth-Lorell rules? Maybe they can be represented as theorems of the form: given an arbitrary Markov category C, such-and-such identities between string diagrams in C imply (more) identities between string diagrams in C.

Connectomics seems great from an AI x-risk perspective

Vanessa Kosoy4mo62Review for 2023 Review

This article studies a potentially very important question: is improving connectomics technology net harmful or net beneficial from the perspective of existential risk from AI? The author argues that it is net beneficial. Connectomics seems like it would help with understanding the brain's reward/motivation system, but not so much with understanding the brain's learning algorithms. Hence it arguably helps more with AI alignment than AI capability. Moreover, it might also lead to accelerating whole brain emulation (WBE) which is also helpful.

The author ment... (read more)

Some Rules for an Algebra of Bayes Nets

Vanessa Kosoy4mo50Review for 2023 Review

This article studies a natural and interesting mathematical question: which algebraic relations hold between Bayes nets? In other words, if a collection of random variables is consistent with several Bayes nets, what other Bayes nets does it also have to be consistent with? The question is studied both for exact consistency and for approximate consistency: in the latter case, the joint distribution is KL-close to a distribution that's consistent with the net. The article proves several rules of this type, some of them quite non-obvious. The rules have conc... (read more)

5Alexander Gietelink Oldenziel4mo

I've been told a Bayes net is "just" a functor from a free Cartesian category to a category of probability spaces /Markov Kernels.

The 2023 LessWrong Review: The Basic Ask

Vanessa Kosoy4mo20

Tbf, you can fit a quadratic polynomial to any 3 points. But triangular numbers are certainly an aesthetically pleasing choice. (Maybe call it "triangular voting"?)

Complete Feedback

Vanessa Kosoy5mo52

I feel that this post would benefit from having the math spelled out. How is inserting a trader a way to do feedback? Can you phrase classical RL like this?

2Abram Demski5mo

Yeah, I totally agree. This was initially a quick private message to someone, but I thought it was better to post it publicly despite the inadequate explanations. I think the idea deserves a better write-up.

Vanessa Kosoy's Shortform

Vanessa Kosoy5mo20

Two thoughts about the role of quining in IBP:

Quine's are non-unique (there can be multiple fixed points). This means that, viewed as a prescriptive theory, IBP produces multi-valued prescriptions. It might be the case that this multi-valuedness can resolve problems with UDT such as Wei Dai's 3-player Prisoner's Dilemma and the anti-Newcomb problem^[1]. In these cases, a particular UDT/IBP (corresponding to a particular quine) loses to CDT. But, a different UDT/IBP (corresponding to a different quine) might do as well as CDT.
What to do about agents that don

... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy6mo*32

Ambidistributions

I believe that all or most of the claims here are true, but I haven't written all the proofs in detail, so take it with a grain of salt.

Ambidistributions are a mathematical object that simultaneously generalizes infradistributions and ultradistributions. It is useful to represent how much power an agent has over a particular system: which degrees of freedom it can control, which degrees of freedom obey a known probability distribution and which are completely unpredictable.

Definition 1: Let $X$ be a compact Polish space. A (crisp) ... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy7mo192

Here's the sketch of an AIT toy model theorem that in complex environments without traps, applying selection pressure reliably produces learning agents. I view it as an example of Wentworth's "selection theorem" concept.

Consider any environment $μ$ of infinite Kolmogorov complexity (i.e. uncomputable). Fix a computable reward function

r : (A \times O)^{*} \to [0, 1]

Suppose that there exists a policy $π^{*}$ of finite Kolmogorov complexity (i.e. computable) that's optimal for $μ$ in the slow discount limit. That is,

lim γ \to 1 (1 - γ) (max π E_{μ π} [\infty \sum n = 0 γ^{n} r_{n}] - E_{μ π^{*}} [\infty \sum n

... (read more)

AI forecasting bots incoming

Vanessa Kosoy7mo31

Can you explain what's your definition of "accuracy"? (the 87.7% figure)
Does it correspond to some proper scoring rule?

AI forecasting bots incoming

Vanessa Kosoy7mo121

Vanessa Kosoy's Shortform

Vanessa Kosoy8mo50

I can see that research into proof assistants might lead to better techniques for combining foundation models with RL. Is there anything more specific that you imagine? Outside of math there are very different problems because there is no easy to way to synthetically generate a lot of labeled data (as opposed to formally verifiable proofs).

While some AI techniques developed for proof assistants might be transferable to other problems, I can easily imagine a responsible actor^[1] producing a net positive. Don't disclose your techniques (except maybe ver... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy8mo*126

The recent success of AlphaProof updates me in the direction of "working on AI proof assistants is a good way to reduce AI risk". If these assistants become good enough, it will supercharge agent foundations research^[1] and might make the difference between success and failure. It's especially appealing that it leverages AI capability advancement for the purpose of AI alignment in a relatively^[2] safe way, thereby the deeper we go into the danger zone the greater the positive impact^[3].

EDIT: To be clear, I'm not saying that working on proof assis... (read more)

Leon Lang8mo915

I think the main way that proof assistant research feeds into capabilies research is not through the assistants themselves, but by the transfer of the proof assistant research to creating foundation models with better reasoning capabilities. I think researching better proof assistants can shorten timelines.

See also Demis' Hassabis recent tweet. Admittedly, it's unclear whether he refers to AlphaProof itself being accessible from Gemini, or the research into AlphaProof feeding into improvements of Gemini.
See also an important paragraph in the blogpost for A

... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy10mo50

Here is a modification of the IBP framework which removes the monotonicity principle, and seems to be more natural in other ways as well.

First, let our notion of "hypothesis" be $Θ \in □^{c} (Γ \times 2^{Γ})$ . The previous framework can be interpreted in terms of hypotheses of this form satisfying the condition

{p r}_{Γ \times 2^{Γ}} B r (Θ) = Θ

(See Proposition 2.8 in the original article.) In the new framework, we replace it by the weaker condition

B r (Θ) \supseteq ({i d}_{Γ} \times {d i a g}_{2^{Γ}})_{*} Θ

This can be roughly interpreted as requiring that (i) whenever the output of a program P determines whether some other program... (read more)

Linear infra-Bayesian Bandits

Vanessa Kosoy11mo30

Sorry, that footnote is just flat wrong, the order actually doesn't matter here. Good catch!

There is a related thing which might work, namely taking the downwards closure of the affine subspace w.r.t. some cone which is somewhat larger than the cone of measures. For example, if your underlying space has a metric, you might consider the cone of signed measures which have non-negative integral with all positive functions whose logarithm is 1-Lipschitz.

Vanessa Kosoy's Shortform

Vanessa Kosoy1y20

Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to "plain" IBRL regret bounds when we consider the core and the envelope as the "inside" of the agent.

Assume that the action and observation sets factor as $A = A_{0} \times A_{1}$ and $O = O_{0} \times O_{1}$ , where $(A_{0}, O_{0})$ is the interface with the external environment and $(A_{1}, O_{1})$ is the interface with the envelope.

Let $Λ : Π \to □ (Γ \times (A \times O)^{ω})$ be a metalaw. Then, there are two natural ways to reduce it to an ordinary law:

Marginalizing over $Γ$ . That is, le

... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy1y50

Is it possible to replace the maximin decision rule in infra-Bayesianism with a different decision rule? One surprisingly strong desideratum for such decision rules is the learnability of some natural hypothesis classes.

In the following, all infradistributions are crisp.

Fix finite action set $A$ and finite observation set $O$ . For any $k \in N$ and $γ \in (0, 1)$ , let

M_{γ}^{k} : (A \times O)^{ω} \to Δ (A \times O)^{k}

be defined by

M_{γ}^{k} (h | d) := (1 - γ) \infty \sum n = 0 γ^{n} [[h = d_{n : n + k}]]

In other words, this kernel samples a time step $n$ out of the geometric distribution with parameter... (read more)

Vanessa Kosoy's Shortform

Vanessa Kosoy1y20

Formalizing the richness of mathematics

Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Godel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.

Here is my proposal for how to formulate a theorem that would make this idea rigorous.

(Wrong) First Attempt

Fix some natural... (read more)

Where I agree and disagree with Eliezer

Vanessa Kosoy1y60Review for 2022 Review

I wrote a review here. There, I identify the main generators of Christiano's disagreement with Yudkowsky^[1] and add some critical commentary. I also frame it in terms of a broader debate in the AI alignment community.

^{^}
I divide those into "takeoff speeds", "attitude towards prosaic alignment" and "the metadebate" (the last one is about what kind of debate norms should we have about this or what kind of arguments should we listen to.)

The Learning-Theoretic Agenda: Status 2023

Vanessa Kosoy1y30

Yes, this is an important point, of which I am well aware. This is why I expect unbounded-ADAM to only be a toy model. A more realistic ADAM would use a complexity measure that takes computational complexity into account instead of $K$ . For example, you can look at the measure $C$ I defined here. More realistically, this measure should be based on the frugal universal prior.

Critical review of Christiano's disagreements with Yudkowsky

Vanessa Kosoy1y1211

Thank you for the clarification.

How do you expect augmented humanity will solve the problem? Will it be something other than "guessing it with some safe weak lesser tries / clever theory"?

Eliezer Yudkowsky1y615

They can solve it however they like, once they're past the point of expecting things to work that sometimes don't work. I have guesses but any group that still needs my hints should wait and augment harder.