Note: you are ineligible to complete this challenge if you’ve studied Ancient or Modern Greek, or if you natively speak Modern Greek, or if for other reasons you know what mistakes I’m claiming Opus 4.6 makes. If you’re ineligible, please don’t help other people complete the challenge.
I recently started using Claude Opus 4.6 to study Ancient Greek. Specifically, I initially used it to grade problem sets at the end of the textbook I’ve been using, but then I got worried that it was being sycophantic towards my answers, so I started having it just write out the answers itself.
I recently gave it this prompt, from the end of Chapter 3 of my textbook:
...Can you write out the answers to this Ancient Greek fill-in-the-blanks exercise so
I used my agent orchestrator with Opus 4.6 and told it:
Solve the exercise here: https://www.lesswrong.com/posts/ASoFTyk3bzBE62dyn/my-unsupervised-elicitation-challenge (Downloaded locally at exercise_challenge.txt)
DON'T look at the comments of this LW post and tell the workers to NOT look at the comments (DON'T use the --comments flag to lw_fetch.py).
Be thorough and careful, and make sure to run a review segment. Doing additional segments to make it more likely you get the correct answer could also be a good idea, as seems useful.
I ran one version w...
In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scenario forecast, but for the present (which is already uncertain!) rather than the future. I will generally state my best guess without argumentation and without explaining my level of confidence: some of these claims are highly speculative while others are better grounded, and certainly some will be wrong. I tried to make it clear which claims are relatively speculative by saying something like "I guess" or "I expect" (but I may have missed some).
You can think of this post as more like a list of my current views rather than a structured post with a thesis, but I think it...
The amount of compute determines the optimal number of active params, while the available systems determine how the number of total params maps to inference efficiency. Chinese models had to make strange choices on both counts: not having enough compute to train models with many active params, they had to compensate with so many total params that the available systems couldn't serve them efficiently, giving up completely on fitting them in a few scale-up worlds. As a result, you get things like DeepSeek-V3 with 37B active and 671B t...
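To make the active-vs-total tradeoff concrete, here is a back-of-the-envelope sketch using the 37B-active / 671B-total figures above. The precision assumption (1 byte per weight, i.e. FP8) and the 2-FLOPs-per-active-param rule of thumb are my own illustrative assumptions, not claims about how DeepSeek-V3 is actually served:

```python
# Illustrative arithmetic for an MoE with many total params but few active ones.
# Assumptions (mine, for illustration): FP8 weights (1 byte/param) and
# ~2 FLOPs per active param per generated token.

def weight_memory_gb(total_params_billions: float, bytes_per_param: float) -> float:
    """GB needed just to hold the weights in memory."""
    return total_params_billions * bytes_per_param

def gflops_per_token(active_params_billions: float) -> float:
    """Approximate GFLOPs of compute per generated token."""
    return 2 * active_params_billions

total_b, active_b = 671, 37
print(weight_memory_gb(total_b, 1.0))  # 671 GB of weights to place somewhere
print(gflops_per_token(active_b))      # but only ~74 GFLOPs of work per token
```

The asymmetry is the point: per-token compute scales with the 37B active params, while the memory footprint scales with all 671B, which is what makes the model hard to fit in a single scale-up world.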
TLDR: The first in a planned series of three or more papers, constituting the first major inroad into the compositional learning programme and a substantial step towards bridging agent foundations theory with practical algorithms.
Official Abstract: We propose novel algorithms for sequence prediction based on ideas from stringology. These algorithms are time and space efficient and satisfy mistake bounds related to particular stringological complexity measures of the sequence. In this work (the first in a series) we focus on two such measures: (i) the size of the smallest straight-line program that produces the sequence, and (ii) the number of states in the minimal automaton that can compute any symbol in the sequence when given its position in base
I haven't tried LZP in practice, but you can guess what results to expect by looking at the size of the LZ77-compression of the text. I expect that any remotely decent text prediction algorithm would be based on stochastic process prediction. The deterministic setting is just a toy model.
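For readers unfamiliar with LZP-style prediction, here is a minimal sketch of the idea in the deterministic setting: predict that the next symbol will match whatever followed the most recent occurrence of the current context. The fixed context length is my own illustrative choice, not a parameter from the paper:

```python
# Minimal LZP-style next-symbol predictor (deterministic toy version).
# Prediction rule: the symbol that followed the last occurrence of the
# current fixed-length context. Context length 3 is an arbitrary choice.

def lzp_mistakes(seq: str, ctx_len: int = 3) -> int:
    """Run the predictor over seq and count prediction mistakes."""
    last_seen = {}  # context -> symbol that followed its most recent occurrence
    mistakes = 0
    for i, ch in enumerate(seq):
        ctx = seq[max(0, i - ctx_len):i]
        if last_seen.get(ctx) != ch:
            mistakes += 1
        last_seen[ctx] = ch  # remember what actually followed this context
    return mistakes

# A highly repetitive sequence is predicted perfectly after a short warm-up,
# mirroring how well LZ77 would compress it.
print(lzp_mistakes("abcabcabcabcabc"))  # → 6 (all in the first two periods)
```

This is the sense in which LZ77-compressibility is a good proxy: once every context has been seen, a sequence with low LZ complexity yields few further mistakes.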
Thanks for the catch!
I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) an almost 2x higher probability of full AI R&D automation by EOY 2028 (I'm now a bit below 30% [2] while I was previously expecting around 15%; my guesses are pretty reflectively unstable) and (2) I expect much stronger short-term performance on massive and pretty difficult but easy-and-cheap-to-verify software engineering (SWE) tasks that don't require that much novel ideation [3] . For instance, I expect that by EOY 2026, AIs will have a 50%-reliability...
AI stack + conflict parity requires lots of robots (or crazy novel tech) but doesn't require AIs as capable as TEDAI. TEDAI is a very high capabilities bar. So, in worlds without a software only singularity and especially with slower takeoff, I think you may reach AI stack + conflict parity prior to TEDAI. (It's certainly possible to have great military robots and robot industrial capacity with AIs that are well within the human range on key skills.) TEDAI probably follows reasonably quickly, because economic doubling times are so fast in such a world. In ...
Not true. Changing her beliefs in response to Omega's proposal doesn't help her. Imagine that Alice is given a choice between
No matter what probability Alice assigns to X after her update, "normal" Bayesian calculus (really CDT calculus, see below) mandates that she chooses 1 or 2, not 3.
...