On the other hand, frontier math (pun intended) is much worse funded than biomedicine, because most PhD-level math has barely any practical applications worth the many man-hours of high-IQ mathematicians (which is often why they switch careers). So, I would argue, if future LLMs raise the productivity of math postdocs by, say, an order of magnitude, they will be able to attack more laborious problems.
Not that I expect it to make much difference to the general populace, or even to the scientific community at large.
MAGMA also has the model check its own work, but the model notices that the work it is checking is its own and doesn’t flag it.
Why would anyone give such a responsibility to an untrusted model without oversight? Already last December, Greenblatt et al. demonstrated techniques alignment researchers could use to control a highly capable untrusted model (and Robert Miles recently made a good video on it).
It doesn't currently look plausible that any model (or any human for that matter) would be able to distinguish between its own work it c...
Why do you think the frontier models still retain GPT-4's sparsity level (roughly 1:8 active-to-total parameters) when open-weight models have gone much sparser, with Kimi K2 at ~1:30 and most others hovering around 1:20?
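For concreteness, the active-to-total ratio falls out of the expert count and the routing top-k, plus whatever parameters are always active (attention, embeddings, shared experts). Here is a minimal sketch; the parameter counts below are made-up illustrative numbers, not any real model's configuration:

```python
# Illustrative MoE "sparsity" calculation: active vs. total parameters.
# All numbers are hypothetical, chosen only to show the arithmetic.

def moe_ratio(dense_params: float, expert_params_each: float,
              n_experts: int, k_active: int) -> tuple[float, float]:
    """Return (active, total) parameter counts for a simple MoE model.

    dense_params: parameters active for every token (attention, embeddings,
                  shared experts); expert_params_each: size of one routed
                  expert; k_active: experts routed per token.
    """
    total = dense_params + n_experts * expert_params_each
    active = dense_params + k_active * expert_params_each
    return active, total

# Hypothetical model: 30B always-active params, 256 experts of 4B each,
# 8 experts routed per token.
active, total = moe_ratio(30e9, 4e9, 256, 8)
print(f"active:total ≈ 1:{total / active:.0f}")  # → active:total ≈ 1:17
```

Note that the dense share puts a floor under the ratio: even with very aggressive routing (small k, many experts), the always-active parameters keep the model from getting arbitrarily sparse.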
P. S.
After posting the comment above, I remembered that Jensen Huang discussed a 2T-total-parameter "GPT-MoE" with either a 128k- or 400k-token context window in his NVIDIA GTC 2026 presentation last month: https://2slides.com/gallery/nvidia-gtc-2026-keynote-deck-jensen-huang-ai-factory-vision (slide 32 onward).[1] This correspon...