1 min read

4

This is a special post for quick takes by gwern. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
9 comments, sorted by Click to highlight new comments since:
[-]gwern2727

Warning for anyone who has ever interacted with "robosucka" or been solicited for a new podcast series in the past few years: https://www.tumblr.com/rationalists-out-of-context/744970106867744768/heads-up-to-anyone-whos-spoken-to-this-person-i

[-]gwern2533

Should you write text online now in places that can be scraped? You are exposing yourself to 'truesight' and also to stylometric deanonymization or other analysis, and you may simply have some sort of moral objection to LLM training on your text.

This seems like a bad move to me on net: you are erasing yourself (facts, values, preferences, goals, identity) from the future, by which I mean, LLMs. Much of the value of writing done recently or now is simply to get stuff into LLMs. I would, in fact, pay money to ensure Gwern.net is in training corpuses, and I upload source code to Github, heavy with documentation, rationale, and examples, in order to make LLMs more customized to my use-cases. For the trifling cost of some writing, all the worlds' LLM providers are competing to make their LLMs ever more like, and useful to, me.

And that's just today! Who knows how important it will be to be represented in the initial seed training datasets...? Especially as they bootstrap with synthetic data & self-generated worlds & AI civilizations, and your text can change the trajectory at the start. When you write online under stable nyms, you may be literally "writing yourself into the future". (For example, apparently, aside from LLMs being able to identify my anonymous comments or imitate my writing style, there is a "Gwern" mentor persona in current LLMs which is often summoned when discussion goes meta or the LLMs become situated as LLMs, which Janus traces to my early GPT-3 writings and sympathetic qualitative descriptions of LLM outputs, where I was one of the only people genuinely asking "what is it like to be a LLM?" and thinking about the consequences of eg. seeing in BPEs. On the flip side, you have Sydney/Roose as an example of what careless writing can do now.) Humans don't seem to be too complex, but you can't squeeze blood from a stone... ("Beta uploading" is such an ugly phrase; I prefer "apotheosis".)

This is one of my beliefs: there has never been a more vital hinge-y time to write, it's just that the threats are upfront and the payoff delayed, and so short-sighted or risk-averse people are increasingly opting-out and going dark.

If you write, you should think about what you are writing, and ask yourself, "is this useful for an LLM to learn?" and "if I knew for sure that a LLM could write or do this thing in 4 years, would I still be doing it now?"


...It would be an exaggeration to say that ours is a hostile relationship; I live, let myself go on living, so that Borges may contrive his literature, and this literature justifies me. It is no effort for me to confess that he has achieved some valid pages, but those pages cannot save me, perhaps because what is good belongs to no one, not even to him, but rather to the language and to tradition. Besides, I am destined to perish, definitively, and only some instant of myself can survive in him. Little by little, I am giving over everything to him, though I am quite aware of his perverse custom of falsifying and magnifying things.

...I shall remain in Borges, not in myself (if it is true that I am someone), but I recognize myself less in his books than in many others or in the laborious strumming of a guitar. Years ago I tried to free myself from him and went from the mythologies of the suburbs to the games with time and infinity, but those games belong to Borges now and I shall have to imagine other things. Thus my life is a flight and I lose everything and everything belongs to oblivion, or to him.

Reply72111
[-]gwern2414

We know that "AI is whatever doesn't work yet". We also know that people often contrast AI (or DL, or LLMs specifically) derogatorily with classic forms of software, such as regexps: why use a LLM to waste gigaflops of compute to do what a few good regexps could...?

So I am amused to discover recently, by sheer accident while looking up 'what does the "regular" in "regular expression" mean, anyway?', that it turns out that regexps are AI. In fact, they are not even GOFAI symbolic AI, as you immediately assumed on hearing that, but they were originally connectionist AI research! Huh?

Well, it turns out that 'regular events' were introduced by Kleene himself with the justification of modeling McCulloch-Pitts neural nets! (Which are then modeled by 'regular languages' and conveniently written down as 'regular expressions', abbreviated to 'regexps' or 'regexes', and then extended/bastardized in countless ways since.)

The 'regular' here is not well-defined, as Kleene concedes, and is a gesture towards modeling 'regularly occurring events' (that the neural net automaton must process and respond to). He admits "regular" is a terrible term, but no one came up with anything better, and so we're stuck with it.

I have some long comments I can't refind now (weirdly) about the difficulty of investing based on AI beliefs (or forecasting in general): similar to catching falling knives, timing is all-important and yet usually impossible to nail down accurately; specific investments are usually impossible if you aren't literally founding the company, and indexing 'the entire sector' definitely impossible. Even if you had an absurd amount of money, you could try to index and just plain fail - there is no index which covers, say, OpenAI.

Apropos, Matt Levine comments on one attempt to do just that:

Today the Wall Street Journal has a funny and rather cruel story about how SoftBank Group went all-in on artificial intelligence in 2018, invested $140 billion in the theme, and somehow … missed it … entirely?

The AI wave that has jolted up numerous tech stocks has also had little effect on SoftBank’s portfolio of publicly traded tech stocks it backed as startups—36 companies including DoorDash and South Korean e-commerce company Coupang.

This is especially funny because it also illustrates timing problems:

SoftBank missed out on huge gains at AI-focused chip maker Nvidia: The Tokyo-based investor put around $4 billion into the company in 2017, only to sell its shares in 2019. Nvidia stock is up about 10 times since.

Oops. EDIT: this is especially hilarious to read in March 2024, given the gains Nvidia has made since July 2023!

Part of the problem was timing: For most of the six years since Son raised the first $100 billion Vision Fund, pickings were slim for generative AI companies, which tended to be smaller or earlier in development than the type of startup SoftBank typically backs. In early 2022, SoftBank nearly completely halted investing in startups when the tech sector was in the midst of a chill and SoftBank was hit with record losses. It was then that a set of buzzy generative AI companies raised funds and the sector began to gain steam among investors. Later in the year, OpenAI released ChatGPT, causing the simmering interest in the area to boil over. SoftBank’s competitors have spent recent months showering AI startups with funding, leading to a wide surge in valuations to the point where many venture investors warn of a growing bubble for anyone entering the space.

Oops.

Also, people are quick to tell you how it's easy to make money, just follow $PROVERB, after all, markets aren't efficient, amirite? So, in the AI bubble, surely the right thing is to ignore the AI companies who 'have no moat' and focus on the downstream & incumbent users and invest in companies like Nvidia ('sell pickaxes in a gold rush, it's guaranteed!'):

During the years that SoftBank was investing, it generally avoided companies focused specifically on developing AI technology. Instead, it poured money into companies that Son said were leveraging AI and would benefit from its growth. For example, it put billions of dollars into numerous self-driving car tech companies, which tend to use AI to help learn how humans drive and react to objects on the road. Son told investors that AI would power huge expansions at numerous companies where, years later, the benefits are unclear or nonexistent. In 2018, he highlighted AI at real-estate agency Compass, now-bankrupt construction company Katerra, and office-rental company WeWork, which he said would use AI to analyze how people communicate and then sell them products.

Oops.

tldr: Investing is hard; in the future, even more so.

"Gwern, why don't you just buy an AI-themed ETF and 'buy the whole sector' if investing in individual stonks is so hard but you're optimistic about its long-term value?"

"How to Lose Money on the World’s Most Popular Investment Theme: Pity the investors in the three artificial-intelligence-themed ETFs that managed to lose money this year" (mirror):

There are lots of embarrassing ways to lose money, but it is particularly galling to lose when you correctly identify the theme that will dominate the market and manage to buy into it at a good moment.

Pity the investors in the three artificial-intelligence-themed exchange-traded funds that managed to lose money this year. Every other AI-flavored ETF I can find has trailed both the S&P 500 and MSCI World. That is before the AI theme itself was seriously questioned last week, when investor doubts about the price of leading AI stocks Nvidia and Super Micro Computer became obvious.

The AI fund disaster should be a cautionary tale for buyers of thematic ETFs, which now cover virtually anything you can think of, including Californian carbon permits (down 15% this year), Chinese cloud computing (down 21%) and pet care (up 10%). Put simply: You probably won’t get what you want, you’ll likely buy at the wrong time and it will be hard to hold for the long term.

Ironically enough, Nvidia’s success has made it harder for some of the AI funds to beat the wider market. Part of the point of using a fund is to diversify, so many funds weight their holdings equally or cap the maximum size of any one stock. With Nvidia making up more than 6% of the S&P 500, that led some AI funds to have less exposure to the biggest AI stock than you would get in a broad index fund. This problem hit the three losers of the year. First Trust’s $457 million AI-and-robotics fund has only 0.8% in Nvidia, a bit over half what it holds in cybersecurity firm BlackBerry. WisdomTree’s $213 million AI-and-innovation fund holds the same amount of each stock, giving it only 3% in Nvidia. BlackRock’s $610 million iShares Future AI & Tech fund was also equal weighted until three weeks ago, when it altered its purpose from being a robotics-and-AI fund, changed ticker and switched to a market-value-based index that gives it a larger exposure to Nvidia.

The result has been a 20-percentage-point gap between the best and worst AI ETFs this year. There is a more than 60-point gap since the launch of ChatGPT in November 2022 lit a rocket under AI stocks—although the ETFs are at least all up since then.

...Dire timing is common across themes: According to a paper last year by Prof. Itzhak Ben-David of Ohio State University and three fellow academics, what they call “specialized” ETFs lose 6% a year on average over their first five years due to poor launch timing.

...But mostly, look at the fees: They will be many times higher than a broad market index fund, and the dismal history of poor timing suggests that for most people they aren’t worth paying.

Masayoshi Son reflects on selling Nvidia in order to maintain ownership of ARM etc: https://x.com/TheTranscript_/status/1805012985313903036 "Let's stop talking about this, I just get sad."

So among the most irresponsible tech stonk boosters has long been ARK's Cathy Woods, whose antics I've refused to follow in any detail (except to periodically reflect that in bull markets the most over-leveraged investors always look like geniuses); so only today do I learn that beyond the usual stuff like slobbering all over TSLA (which has given back something like 4 years of gains now), Woods has also adamantly refused to invest in Nvidia recently and in fact, managed to exit her entire position at an even worse time than SoftBank did: "Cathie Wood’s Popular ARK Funds Are Sinking Fast: Investors have pulled a net $2.2 billion from ARK’s active funds this year, topping outflows from all of 2023" (mirror):

...Nvidia’s absence in ARK’s flagship fund has been a particular pain point. The innovation fund sold off its position in January 2023, just before the stock’s monster run began. The graphics-chip maker’s shares have roughly quadrupled since.

Wood has repeatedly defended her decision to exit from the stock, despite widespread criticism for missing the AI frenzy that has taken Wall Street by storm. ARK’s exposure to Nvidia dated back 10 years and contributed significant gains, the spokeswoman said, adding that Nvidia’s extreme valuation and higher upside in other companies in the AI ecosystem led to the decision to exit.

2-of-2 escrow: what is the exploding Nash equilibrium? Did it really originate with NashX? I've been looking for the history & real name of this concept for years now and have failed to refind it. Anyone?