Donald Hobson

MMath Cambridge. Currently studying postgrad at Edinburgh.

Sequences

Neural Networks, More than you wanted to Show
Logical Counterfactuals and Proposition graphs


Comments


The halting problem is a worst-case result. Most agents aren't maximally ambiguous about whether or not they halt. And for those that are, well, then it depends on what the rules are for agents that don't halt.

There are setups where each agent uses an unphysically large but finite amount of compute. There was a paper I saw a while ago where both agents did a brute-force proof search for the statement "if I cooperate, then they cooperate" and cooperated if they found a proof.

(I.e. searching all proofs containing <10^100 symbols.)
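
As a toy sketch of the shape of such an agent (my own illustration, not the construction from that paper; the names provable_within, proof_based_agent and PROOF_SYMBOL_BOUND are invented here, and the proof search is left as a placeholder because it is finite but astronomically expensive):

    PROOF_SYMBOL_BOUND = 10 ** 100

    def provable_within(statement: str, bound: int) -> bool:
        """Placeholder for a brute-force search over all proofs of at most `bound` symbols."""
        raise NotImplementedError

    def proof_based_agent(my_source: str, their_source: str) -> str:
        # Cooperate iff we can prove "if I cooperate, then they cooperate".
        claim = (f"if {my_source} plays C against {their_source}, "
                 f"then {their_source} plays C against {my_source}")
        return "C" if provable_within(claim, PROOF_SYMBOL_BOUND) else "D"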

There is a model of bounded rationality: logical induction.

Can that be used to handle logical counterfactuals?

I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q;

 

And here the main difficulty pops up again. There is no causal connection between your choice and their choice. Any correlation is a logical one. So imagine I make a copy of you. But the copying machine isn't perfect. A random 0.001% of neurons are deleted. Also, you know you aren't a copy. How would you calculate those probabilities p and q? Even in principle.
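
(To be clear, the hard part is getting p and q at all; once you somehow have them, the expected-utility comparison itself is trivial. A minimal sketch, with standard prisoner's-dilemma payoffs picked purely for illustration:)

    # Standard PD payoffs (illustrative numbers): T > R > P > S.
    T, R, P, S = 5, 3, 1, 0

    def expected_utilities(p, q):
        # p = P(they cooperate | I cooperate), q = P(they defect | I defect)
        eu_cooperate = p * R + (1 - p) * S
        eu_defect = q * P + (1 - q) * T
        return eu_cooperate, eu_defect

    # With a near-perfect copy, p = q = 0.999, cooperation wins comfortably:
    print(expected_utilities(0.999, 0.999))  # (2.997, 1.004)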

If two Logical Decision Theory agents with perfect knowledge of each other's source code play the prisoner's dilemma, then theoretically they should cooperate.

LDT uses logical counterfactuals in the decision making.

If the agents are CDT, then logical counterfactuals are not involved.

We can discuss anything that exists, that might exist, that did exist, that could exist, and that could not exist. So no matter what form your predict-the-next-token language model takes, if it is trained over the entire corpus of the written word, the representations it forms will be pretty hard to understand, because the representations encode an entire understanding of the entire world.

Perhaps. 

Imagine a huge number of very skilled programmers tried to manually hard-code a ChatGPT in Python.

Ask this pyGPT to play chess, and it will play chess. Look under the hood, and you see a chess engine programmed in. Ask it to solve algebra problems, and a symbolic algebra package is in there. All in neat, well-commented code.

Ask it to compose poetry, and you find some algorithm that checks if two words rhyme. Some syllable counter. Etc.

Rot13 is done with a hardcoded rot13 algorithm. 
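
(A generic rot13 routine of the sort such a codebase might contain; this is just an illustration, not code recovered from any actual model:)

    def rot13(text: str) -> str:
        # Shift each letter 13 places around the alphabet, leaving everything else untouched.
        out = []
        for c in text:
            if c.isalpha():
                base = ord('a') if c.islower() else ord('A')
                out.append(chr((ord(c) - base + 13) % 26 + base))
            else:
                out.append(c)
        return "".join(out)

    print(rot13("Penguins live in Antarctica"))  # "Crathvaf yvir va Nagnepgvpn"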

Somewhere in the algorithm is a giant list of facts, containing "Penguins live in Antarctica". And if you change this fact to say "Penguins live in Canada", then the AI will believe this. (Or spot its inconsistency with other facts?)

And with one simple change, the AI believes this consistently. Penguins appear when this AI is asked for poems about Canada, and don't appear in poems about Antarctica.

When asked about the native Canadian diet, it will speculate that this likely included penguin, but say that it doesn't know of any documented examples of this.
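
A sketch of the kind of structure being described (all names here are invented for this illustration):

    # Invented example: a single hard-coded fact table feeding downstream behaviour.
    FACTS = {("penguin", "lives_in"): "Antarctica"}

    def animals_in(region):
        return [subj for (subj, rel), obj in FACTS.items()
                if rel == "lives_in" and obj == region]

    def poem_line_about(region):
        animals = animals_in(region)
        if animals:
            return f"In {region}, the {animals[0]} waddles by."
        return f"In {region}, the cold wind hurries by."

    # One simple change...
    FACTS[("penguin", "lives_in")] = "Canada"
    # ...and every downstream use updates consistently:
    print(poem_line_about("Canada"))      # now mentions penguins
    print(poem_line_about("Antarctica"))  # no longer does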

Can you build something with ChatGPT-level performance entirely out of human-comprehensible programmatic parts?

Obviously having humans program these parts directly would be slow. (We are still talking about a lot of code.) But what if some algorithm could generate that code?

But if the universal failure of nature and man to find non-connectionist forms of general intelligence does not move you

 

Firstly, AIXI exists, and we agree that it would be very smart if we had the compute to run it. 

 

Secondly, I think there is some sort of sleight of hand here.

ChatGPT isn't yet fully general. Neither is a 3-SAT solver. 3-SAT looks somewhat like what you might expect a non-connectionist approach to intelligence to look like. There is a huge range of maths problems that are all theoretically equivalent to 3-SAT.
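
For a sense of what that looks like at its most bare-bones, here is a brute-force 3-SAT solver (my own minimal example, nothing clever):

    # Minimal brute-force 3-SAT solver. A clause like (x1 OR NOT x2 OR x3)
    # is written as [1, -2, 3].
    from itertools import product

    def solve_3sat(clauses, n_vars):
        for bits in product([False, True], repeat=n_vars):
            assignment = {i + 1: bits[i] for i in range(n_vars)}
            if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in clauses):
                return assignment
        return None  # unsatisfiable

    # (x1 or x2 or not x3) and (not x1 or x2 or x3)
    print(solve_3sat([[1, 2, -3], [-1, 2, 3]], n_vars=3))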

In the infinite limit, both types of intelligence can simulate the other at huge overhead. In practice, they can't.

 

Also, non-connectionist forms of intelligence are hard to evolve, because evolution works in small changes. 

Physics Myths vs reality.

Myth: Ball bearings are perfect spheres. 

Reality: The ball bearings have slight lumps and imperfections due to manufacturing processes.

Myth: Gravity pulls things straight down at 9.8 m/s/s.

Reality: Gravitational force varies depending on local geology.

 

You can do this for any topic. Everything is approximations. The only question is whether they are good approximations.
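
For instance, roughly how good is the 9.8 m/s/s figure? Ignoring geology and rotation entirely and only varying altitude (approximate constants, my own back-of-envelope numbers):

    # Back-of-envelope: g = GM / r^2, using approximate constants.
    G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
    M_EARTH = 5.972e24   # kg
    R_EARTH = 6.371e6    # mean radius, m

    def g_at_altitude(h_metres):
        return G * M_EARTH / (R_EARTH + h_metres) ** 2

    print(g_at_altitude(0))     # ~9.82 m/s^2 at mean sea level
    print(g_at_altitude(8848))  # ~9.79 m/s^2 at the top of Everest

Latitude and local geology shift it by similar amounts; measured surface values run from roughly 9.78 m/s^2 at the equator to about 9.83 m/s^2 at the poles. A good approximation, but still an approximation.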

If AI labs are slamming on recursive self-improvement ASAP, it may be that Autonomous Replicating Agents (ARA) are irrelevant. But that's an "ARA can't destroy the world if AI labs do it first" argument.

An ARA may well have more compute than the AI labs, especially if the labs are trying to stay within the law and the ARA is stealing any money/compute that it can hack its way into. (Which could be >90% of the internet, if it's good at hacking.)

there will be millions of other (potentially misaligned) models being deployed deliberately by humans, including on very sensitive tasks (like recursive self-improvement).

Ok. That's a world model in which humans are being INCREDIBLY stupid. 

If we want to actually win, we need to both be careful about deploying those other misaligned models, and stop ARA.

Alice: That snake bite looks pretty nasty; it could kill you if you don't get it treated.

Bob: That snake bite won't kill me, this hand grenade will. (Pulls out pin.)

   If you can put uploaded human-level agents with evolved-organism preferences in your simulations, you can just win outright (eg by having them spend subjective millennia doing FAI research for you). If you can’t, that will be a very obvious difference between your simulations and the real world.

 

I disagree. If your simulation is perfectly realistic, the simulated humans might screw up at alignment and create an unfriendly superintelligence, for much the same reason real humans might.

Also, if the space of goals that evolution + culture can produce is large, then you may be handing control to a mind with rather different goals. Rerolling the same dice won't give the same answer.

These problems may be solvable, depending on what the capabilities here are, but they aren't trivial.

Taking IID samples can actually be hard. Suppose you train an LLM on news articles, and each important real-world event has 10 basically identical news articles written about it. Then a random split of the articles will leave the network being tested mostly on the same newsworthy events that were in the training data.

This leaves it passing the test, even if it's hopeless at predicting new events and can only generate new articles about the same events. 

When data duplication is extensive, making a meaningful train/test split is hard. 

If the data were perfect copy-and-paste duplicates, they could be filtered out. But often things are rephrased a bit.
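
One partial mitigation is to split by event rather than by article, assuming you can cluster the near-duplicates at all. A sketch under that assumption (the names below are invented; the clustering step that assigns event_id is the genuinely hard part when articles are rephrased rather than copied):

    # Group-aware split: keep all articles about the same event on one side of
    # the train/test boundary. Assumes each article dict carries an "event_id"
    # from some upstream dedup/clustering step (the hard part).
    import random

    def split_by_event(articles, test_fraction=0.1, seed=0):
        events = sorted({a["event_id"] for a in articles})
        random.Random(seed).shuffle(events)
        n_test = max(1, int(len(events) * test_fraction))
        test_events = set(events[:n_test])
        train = [a for a in articles if a["event_id"] not in test_events]
        test = [a for a in articles if a["event_id"] in test_events]
        return train, test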
