I was intrigued by your claim that FFS is already subsumed by work in academia. I clicked the link you provided, but from a quick skim it doesn't seem to do FFS or anything beyond the usual Pearl causality story, as far as I can tell. Maybe I am missing something - could you point to a specific page where you think FFS is being subsumed?
Great stuff Jeremy!
Two basic comments:
1. Classical learning theory is flawed: it predicts that neural networks should overfit, when in practice they don't.
The correct way to understand this is through the lens of singular learning theory. (A toy demonstration of the puzzle is sketched below.)
2. Quantilizing agents can actually be reflectively stable. There's work by Diffractor (Alex Appel) on this topic that should become public soon. (A minimal sketch of quantilization is below.)
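Re point 1, a toy demonstration of the puzzle (my construction, assuming random Fourier features and a minimum-norm fit; not from the original comment): a model with 25x more parameters than data points interpolates the training set exactly, yet its test error does not blow up.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 data points, 500 parameters: the regime where the classical
# bias-variance story predicts catastrophic overfitting.
n_train, n_features = 20, 500
target = lambda x: np.sin(3 * x)

x_tr = rng.uniform(-1, 1, n_train)
x_te = rng.uniform(-1, 1, 200)

# Random Fourier features; the pseudoinverse picks the minimum-norm
# solution among all exact interpolators of the training data.
W = rng.normal(scale=4.0, size=n_features)
b = rng.uniform(0, 2 * np.pi, n_features)
phi = lambda x: np.cos(np.outer(x, W) + b) / np.sqrt(n_features)

theta = np.linalg.pinv(phi(x_tr)) @ target(x_tr)
print("train MSE:", np.mean((phi(x_tr) @ theta - target(x_tr)) ** 2))  # ~ 0
print("test  MSE:", np.mean((phi(x_te) @ theta - target(x_te)) ** 2))  # stays modest
```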
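Re point 2, for readers who haven't met quantilizers, a minimal sketch (illustrative only; this is the basic construction, not Diffractor's reflective-stability result):

```python
import numpy as np

def quantilize(actions, utility, q=0.1, rng=None):
    """q-quantilizer: given actions sampled from a base distribution, pick
    uniformly at random among the top q-fraction by utility, instead of
    argmaxing - trading a little expected utility for robustness."""
    rng = rng or np.random.default_rng()
    utils = np.array([utility(a) for a in actions])
    cutoff = np.quantile(utils, 1.0 - q)             # utility threshold for top q
    top = [a for a, u in zip(actions, utils) if u >= cutoff]
    return top[rng.integers(len(top))]

# Example: base distribution = uniform on [0, 1]; utility = closeness to 0.7.
base_samples = np.random.default_rng(1).uniform(0, 1, 1000)
print(quantilize(base_samples, lambda a: -abs(a - 0.7)))
```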
Yeah follow-up posts will definitely get into that!
To be clear: (1) the initial posts won't be about Crutchfield's work yet - just introducing some background material and overarching philosophy; (2) the claim isn't that standard measures of information theory are bad. To the contrary! If anything, we hope these posts will be something of an ode to information theory as a tool for interpretability.
Adam wanted to add a lot of academic caveats - I was adamant that we streamline the presentation to make it short and snappy for a general audience but it...
There is a general phenomenon in mathematics [and outside maths as well!] where, within a certain context/theory, we have two equivalent definitions of a concept that become inequivalent when we move to a more general context/theory. In our case we are moving from the concept of a probability distribution to the concept of an imprecise distribution (i.e. a convex set of probability distributions, which in particular could ...
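To make this concrete for the case at hand, a minimal sketch (my gloss, using standard notions from the imprecise-probability literature): for a precise distribution the two textbook definitions of independence coincide, but for a credal set they come apart.

```latex
% An imprecise distribution is a credal set: a convex set \mathcal{K} of
% probability distributions, summarized by lower/upper expectations
\underline{E}[X] = \inf_{P \in \mathcal{K}} E_P[X], \qquad
\overline{E}[X] = \sup_{P \in \mathcal{K}} E_P[X].
% For a precise P, the definitions  P(A \cap B) = P(A)\,P(B)  and
% P(A \mid B) = P(A)  are equivalent. For a credal set, demanding the
% factorization for every extreme point ("strong independence") is strictly
% stronger than demanding that conditioning on B leave the lower probability
% of A unchanged ("epistemic irrelevance").
```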
The point isn't about goal misalignment but capability generalisation. It is surprising, to some degree, that just by selecting on reproductive fitness - through its proxies of being well-fed, social status, etc. - humans have obtained the capability to go to the moon. It points toward a coherent notion, and the existence, of 'general intelligence' as opposed to specific capabilities.
Thank you for writing this post; I had been struggling with these considerations a while back. I investigated going full paranoid mode but in the end mostly decided against it.
I agree that theoretical insights on agency and intelligence have a real chance of leading to capability gains. I agree the government-spy threat model is unlikely. I would like to add, however, that if say MIRI builds a safe AGI prototype - perhaps based on different principles than the systems used by adversaries - it might make sense for an (AI-assisted) adversary to trawl through your...
Daniel Kokotajlo and I agreed on the following bet: I paid Daniel $1000 today. Daniel will pay me $1100, inflation-adjusted, if there is no AGI in 2030.
Ramana Kumar will serve as the arbiter. Under unforeseen events we will renegotiate in good faith.
As a guideline for 'what counts as AGI' I suggested the following, to which Daniel agreed:
..."the Arbiter agrees with the statement "there is convincing evidence that there is an operational Artificial General Intelligence" on 6/7/2030"
Defining an artificial general intelligence is a little hard and has
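As an aside, the implied return on the no-AGI branch of the bet, as a rough back-of-envelope (the horizon is my assumption, not part of the bet's terms):

```python
# $1000 paid now; $1100 inflation-adjusted returned around 6/7/2030 if no AGI.
years = 7.5                                        # assumed horizon to mid-2030
implied_real_rate = (1100 / 1000) ** (1 / years) - 1
print(f"implied real return: {implied_real_rate:.2%}/yr")   # roughly 1.3%/yr
```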
If I may be so bold, the answer should be a guarded yes.
A snag is that the correct theory of what John calls 'distributed systems' or 'Time', and what theoretical CS academics generally call 'concurrency', is as yet not fully constructed. To be sure, there are many quite well-developed theoretical frameworks - e.g. the pi calculus, or the various models of concurrency like Petri nets, transition systems, event structures, etc. They're certainly on my list of 'important things I'd like to understand better'.
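For readers who haven't met these formalisms, a minimal Petri-net sketch (illustrative only; not the actual tooling of any framework mentioned above):

```python
from collections import Counter

# Places hold tokens; a transition is enabled when each input place has a
# token, and firing consumes from inputs and produces to outputs.
marking = Counter({"p1": 1, "p2": 1})        # initial tokens
transitions = {
    "t1": (["p1"], ["p3"]),                  # inputs -> outputs
    "t2": (["p2", "p3"], ["p4"]),
}

def enabled(t):
    ins, _ = transitions[t]
    return all(marking[p] >= n for p, n in Counter(ins).items())

def fire(t):
    assert enabled(t), f"{t} is not enabled"
    ins, outs = transitions[t]
    marking.subtract(Counter(ins))
    marking.update(Counter(outs))

fire("t1")    # p1 -> p3
fire("t2")    # p2 + p3 -> p4; concurrency = independence of enabled firings
print(dict(marking))                         # {'p1': 0, 'p2': 0, 'p3': 0, 'p4': 1}
```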
Our world, and our sensemaking of it, is fun...
Unclear. Some things that might be involved
I might add that I know a number of people interested in AF who feel somewhat adrift / find it difficult to contribute. It feels a bit like a waste of talent.
Agreed. Thank you for writing this post. Some thoughts:
As somebody strongly on the Agent Foundations train, it puzzles me that there is so little activity outside MIRI itself. We are being told there are almost limitless financial resources, yet - as you explain clearly - it is very hard for people to engage with the material outside of LW.
At the last EA Global there was some sort of AI safety breakout session. There were ~12 tables with different topics. I was dismayed to discover that almost every table was full of people excitedly discussing var...
Failure of convergence to social optimum in high frequency trading with technological speed-up
Possible market failures in high-frequency trading are of course a hot topic recently, with various widely publicized Flash Crashes. There have been loud calls to rein in high-frequency trading, and several bodies are moving towards heavier regulation. But it is not immediately clear whether or not high-frequency trading firms are a net cost to society. For instance, it is sometimes argued that high-frequency trading firms act simply as very fast market makers. One wou...
Measuring the information-theoretic optimizing power of evolutionary-like processes
Intelligent-design advocates often argue that the extraordinary complexity we see in the natural world cannot be explained simply by a 'random process' such as natural selection - hence, a designer. Counterarguments include:
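One of the standard counterarguments can be made quantitative: selection is an optimization process whose power can be measured in bits. A sketch of the usual information-theoretic measure (my formalization, in the spirit of Yudkowsky's 'optimization power'; not taken from the original list):

```latex
% Optimization power of a process that realizes outcome s^*, measured against
% a null baseline P_0 over outcomes S (e.g. uniform mutation without
% selection), with preferences encoded by a fitness function U:
\mathrm{OP}(s^\ast) = -\log_2 P_0\bigl( U(S) \ge U(s^\ast) \bigr)
% Each bit is one halving of the outcome space. Iterated selection compounds
% bits over generations, so "a random process couldn't do it" understates
% what cumulative selection can do.
```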
The effort is commendable. I am wondering: why did you start at 2013?
Debatably, it is the things that happened prior to 2013 that are of special interest.
I am thinking of the early speculations by Turing, von Neumann, and Good, continuing on to the founding of SI/MIRI some twenty years ago - and much more in between that I am less familiar with, but would like to know more about!
What about the latent adversarial training papers?
What about the Mechanistically Eliciting Latent Behaviors work?