Leela Zero uses MCTS, it doesnt play superhuman in one forward pass (like gpt-4 can do in some subdomains) (i think, didnt find any evaluations of Leela Zero at 1 forward pass), and i'd guess that the network itself doesnt contain any more generalized game playing circuitry than an llm, it just has good intuitions for Go.
Nit:
Subjectively there is clear improvement between 7b vs. 70b vs. GPT-4, each step 1.5-2 OOMs of training compute.
1.5 to 2 OOMs? 7b to 70b is 1 OOM of compute, adding in chinchilla efficiency would make it like 1.5 OOMs of effective...
If automating alignment research worked, but took 1000+ tokens per researcher-second, it would be much more difficult to develop, because you'd need to run the system for 50k-1M tokens between each "reward signal or such". Once it's 10 or less tokens per researcher second, it'll be easy to develop and improve quickly.
Externalized reasoning models suffer from the "legibility penalty" - the fact that many decisions are easier to make than to justify or explain. I think this is a significant barrier for authentic train of thought competitiveness, although not for particularly legible domains, such as math proofs and programming (Illegible knowledge goes into math proofs, but you trust the result regardless so it's fine).
Another problem is that standard training procedures only incentivize the model to use reasoning steps produced by a single human. This means, for instanc...
interesting, this actually changed my mind, to the extent i had any beliefs about this already. I can see why you would want to update your prior, but the iterated mugging doesn't seem like the right type of thing that should cause you to update. My intuition is to pay all the single coinflip muggings. For the digit of pi muggings, i want to consider how different this universe would be if the digit of pi was different. Even though both options are subjectively equally likely to me, one would be inconsistent with other observations or less likely or have something wrong with it, so i lean toward never paying it