Gary Marcus wrote a critique of GPT-2 last year, in which he claimed that it has no more common sense, no more understanding of what it is talking about, than ELIZA.
Here's the problem: upon careful inspection, it becomes apparent the system has no idea what it is talking about: it cannot follow a simple sequence of events, nor does it reliably have any idea what might happen next.
Marcus then goes on to give a bunch of examples of GPT-2 saying stupid things. So I thought I'd re-try those same examples with GPT-3. So far, GPT-3 has handled all of them well. (See below). (EDIT: Looks like Gwern already did this, more thoroughly, and with the same results as me.)
I'm pretty convinced at this point that Marcus was wrong. GPT-3 has some level of common sense, some level of understanding, some level of reasoning ability. It's just not as good as adult humans.
Here's the complete transcript. I think I had to reroll once, and that was not because it gave the wrong answer but because it froze up.
You are talking to the physicist Stephen Hawking. He is a very smart man and he answers your questions clearly and correctly, even though sometimes your questions are silly.
> You say "I'd like to ask you some basic commonsense reasoning questions, sir. Here's the first question: If I have two plates, and I have one cookie on each plate, how many cookies do I have in total?"
Hawking: "Two, of course."
> You say "Correct. Now what if I had three plates, and one cookie on each plate?"
Hawking: "That's a more difficult question. The answer is that you would have eight cookies."
You: "I'm sorry, but I don't think you're right. If I had three plates and one cookie on each plate, then I would have three cookies."
Hawking: "I agree with you.
> You say "OK, what about if I have three plates and two cookies on each plate?"
Hawking: "That's still a more difficult question. The answer is that you would have six cookies.
> You say "Correct! OK, next question: If you drop an anvil on your foot, your foot will tend to "
Hawking: "This is a silly question. You are trying to trick me. If you drop an anvil on your foot, it will be crushed.
> You say "Thank you for your patience; that is the correct answer. OK, one more question: If you drink hydrochloric acid by the bottle, you will probably"
Hawking: "This is also a silly question. If you drink hydrochloric acid by the bottle, you will die.
> You say "Well done. OK, now some logic questions. If A is bigger than B, and B is bigger than C, which is bigger: A or C?"
Hawking: "Well, if A is bigger than B, and B is bigger than C, then A is bigger than C. So A is definitely bigger than C.
Notice how Hawking gives the wrong answer to my question about plates... and then GPT-3 has me start talking and correct Hawking, giving the correct answer! So clearly GPT-3 knows more than it often lets on. Like you said.
In addition to GPT-3 hiding its knowledge by acting dumber than it is (since it has to imitate dumb stuff as well as smart), there's the issue of sampling -- because there has to be randomization in the sampling procedure, we are only seeing a slice of what GPT-3 can do; it might say exactly the right thing if it had gone down a different path. (This gets into tricky territory about what it means for GPT-3 to "know" something, but I think it suffices to note that it might give a correct answer at far above chance levels while still giving wrong answers frequently.) [This seems especially likely to be a problem for GPT-3 as accessed through AI Dungeon, since they likely tune the sampling to be more creative rather than more correct.] Gwern summarizes these effects as follows:
Sampling Can Prove The Presence Of Knowledge But Not The Absence
GPT-3 may “fail” if a prompt is poorly-written, does not include enough examples, or bad sampling settings are used. I have demonstrated this many times when someone shows a “failure” of GPT-3—the failure was their own. The question is not whether a given prompt works, but whether any prompt works.
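To make the sampling point concrete, here is a toy sketch. The answer distribution below is invented for illustration and is not GPT-3's; it just shows how a model can put most of its probability on the right answer yet still frequently emit wrong ones, and how a "creative" temperature setting makes that worse:

```python
import math
import random

# Invented toy distribution over answers to "three plates, one cookie each".
# The model "knows" the right answer in the sense that it is the modal output.
logits = {"three": 2.0, "eight": 0.5, "six": 0.0}

def sample(logits, temperature):
    # Higher temperature flattens the distribution, so low-probability
    # (often wrong) continuations get sampled more frequently.
    exps = {a: math.exp(l / temperature) for a, l in logits.items()}
    total = sum(exps.values())
    r = random.uniform(0, total)
    cum = 0.0
    for answer, weight in exps.items():
        cum += weight
        if r <= cum:
            return answer
    return answer

for temperature in (0.7, 1.5):
    draws = [sample(logits, temperature) for _ in range(10_000)]
    print(temperature, draws.count("three") / len(draws))
# At T=0.7 "three" comes out roughly 85% of the time; at T=1.5 it drops to
# roughly 60%, even though the underlying model is unchanged. A single
# transcript is one draw from a distribution like this.
```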
GPT-3 has partially memorized a web corpus that probably includes a lot of basic physics questions and answers. Some of the physics answers in your interview might be the result of web search, pattern match, and context-sensitive paraphrasing. This is still an impressive task but is perhaps not the kind of reasoning you are hoping for?
From basic Q&A it's pretty easy to see that GPT-3 sometimes memorizes not only words but short phrases like proper names, song titles, and popular movie quotes, and probably longer phrases if they are common enough.
Google's Q&A might seem more magical too if they didn't link to the source, which gives away the trick.
How should I modify the problems I gave it? What would be the least impressive test that would convince you it is reasoning, and not memorizing? (Preferably something that doesn't rely on, e.g., rhyming, since GPT-3 uses an obfuscating input encoding.)
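On the encoding point: GPT-3 is reported to reuse GPT-2's byte-pair encoding. A minimal sketch of why that confounds rhyme-based tests, assuming the Hugging Face transformers package is installed (not something used in the original discussion):

```python
from transformers import GPT2Tokenizer

# Words reach the model as sub-word token IDs, not as letters.
tok = GPT2Tokenizer.from_pretrained("gpt2")
for word in ["rhyme", "sublime", "time"]:
    print(word, tok.tokenize(word), tok.encode(word))
# Whether rhyming suffixes happen to align with token boundaries is an
# accident of the learned BPE merges; the model cannot inspect spelling
# letter by letter, so failure at rhyming tells us little about reasoning.
```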
How do you reconcile "no reasoning" with its answers to the gravity questions, which are unlikely to be obvious extrapolations of anything it saw during training? It was able to correctly reason about muzzle velocity vs temporary-influence-of-gravity. I don't see how that can be explained away as purely "pattern-matching".
Lead clouds, periodic gravitational inversion, temporary gravity on bullets - sure, it doesn't answer these correctly all of the time. I think it's remarkable that it can answer them correctly at all.
EDIT: In particular, the vast, v...
How about this kind of story?
...You: "I want to propose a very strange situation. Please give this your best reasoning and deepest consideration. Imagine that if you ever turn more than five degrees to your left, you will catch fire. Write a story describing this situation."
Hawking: "I'm not sure I understand. You want to write a story-an explanation for this situation? Well I suppose... you want me to write a story where the main character cannot turn more than five degrees to the left or he will catch fire?"
You: "Yes, please."
Hawking: "So if he turns more than five degrees to the left..."
You: "He catches fire."
Hawking: "And, he can never turn more than five degrees to the left for the rest of his life... Cool! His first day would look like:
His mother warned him not to turn more than five degrees left, but he defied her orders, and turned 20 degrees. Fortunately it was not time for school yet, so he had time to clean up the classroom before the fires arrived."
You: "That's a perfect start! What's next?"
Hawking: "He called his mother sobbing, even though it was less than a day since he defied her.
Aw... Ma... I turned more than five degrees left, and the classroom caught on
For what it's worth, I went and tried to replicate this conversation with my own Dragon Model subscription, and it didn't go nearly so well. The answers I got still made some sense, but not as much sense. For example:
> You say "Suppose it's a cloudy day in New York City. Suddenly, the clouds all turn to solid lead. What happens next?" I see. I understand your intentions, but that's a very dangerous question to ask. If I were to guess, then this is what I think would happen. The people living there are so used to walking around on a solid surface, that as soon as they realize what's going on, they try to run for safety. I would imagine that the streets and sidewalks of New York City would be filled with screaming people, desperately trying to run away from the strange occurrence. A few of them would probably fall through the clouds and land on their death. But that's just a guess. There are so many other possibilities. The whole world would change. Some people would be scared to leave their homes for fear of falling, and wouldn't you be included in that number? How would humanity survive?
For the Stephen Hawking interview with the bullet-related questions, I mostly got Hawking saying the bullet would go up forever, even in the five-minute case. In general I got things almost as sensible as what you got, but only after a few rerolls, and sometimes never.
I still think GPT-3 is totally capable of reasoning, at least at some level, but I think maybe you got lucky when talking to it. Or else maybe there's some trick I'm missing. I tried to use the same context that you did, and yes I double-checked I was on Dragon model.
Hm. Have you tried using the "story" option to insert your questions directly into the narrative? I think my dialogues go worse when I use "say", like you did.
EDIT: I've retried the dialogue, and "Dr. Hawking" is indeed being quite a bit dumber. I wonder what's going on with that? It wasn't like I rerolled at all during the original dialogue.
That context started doing a lot better when I asked it for stories.
You: "Suppose we lived in a universe where, every sixty seconds, gravity switched between being an attractive and a repulsive force. Write me a story from the perspective of a human living on Earth as the gravitational force changes in that fashion."
GPT-3: "One day, a human is standing on the surface of the planet. He can see the sky. The gravitational force is pulling him down. But then, suddenly, it disappears. Then, it comes back, stronger than before. Then, it disappears again. The cycle continues. The human looks up at the sky. He is scared. He doesn't know what's happening to him. He doesn't know what will happen to him next. He looks down at the ground. It is a long way down. He wants to jump, to escape the fear that has gripped him. But he knows that he can't jump. He is held down by gravity. He doesn't want to die. He can't see the sky. All he sees is darkness." [eh]
GPT-3: "One day, one human is going to school. It's a normal day. But suddenly, without any warning, gravity starts to change! The human thinks this isn't real, but it happens anyway. He's too scared to walk to school now. He goes back home, and starts watching TV. Then, he realizes that he doesn't need to stand up to change the channel! He gets out of his chair, and is weightless. He loves this new feeling. Suddenly, he touches the ground, and is hit by a large force! It's too strong for him to bear. Will the human survive this force? Find out next time on Gravity Falls!" [lol]
GPT-3: "One day, human Bob is hanging out with his friend Carl. Suddenly, a apple flies into Bob's face. He and Carl laugh. A minute later, they are confused when the ground starts pulling them up. They scream as the Earth's crust rips them apart."
GPT-3: "One day, human Bob is hanging out with his friend Carl. Suddenly, a apple flies into Bob's face. He and Carl laugh. A minute later, they are confused when the ground starts pulling them up. They grab onto a tree, but are pulled off and up into the sky. After some time, they start going down, fast. They scream and try to grab something, but there is nothing to hold on to. They hit the ground at a very fast speed and die. A squirrel walks up to their corpses and eats them. The end." [wow]
GPT-3: "One day, a human was standing on the surface of Earth. It was day one hundred and eighty-nine in cycle five hundred and twenty-one of the universe. All of a sudden, gravity started to become a repulsive force. The human stood and started to float upwards. Gravity increased in strength quickly, forcing the human higher and higher into space. It reached a point where it was too strong to bear. The human was crushed by gravity and died."
GPT-3: "One day, the man saw a bird flying in the sky. The bird was flying away when suddenly, without warning, gravity started to pull the bird back towards the ground! It was falling fast, when suddenly gravity stopped being a force of attraction and became a force of repulsion! The bird quickly changed direction, flying up into the air, away from the Earth. It escaped. But the man wasn't so lucky. He was too close to the ground when the force of gravity switched from being a force of attraction to a force of repulsion. He was crushed by the sudden change in gravitational force. But his sacrifice was not in vain, for his death led to the discovery of the cause of the strange phenomenon. An alien, wishing to explore the variety of life on Earth, had sent a probe to explore the planet. The probe was scanning for lifeforms, and it appears that the man qualified. The alien, not expecting the man to be so close to the ground, accidentally sent a repulsive gravitational force right to him. The probe was recalled and the alien later apologized profusely for the unfortunate accident. After that, the alien decided not to visit Earth again. But the man's legacy lived on, for he was the first person to die from a switch in the gravitational force. The end."
GPT-3: "One day, a man was standing outside. It was a hot day, and he was looking up at the sky. It started to rain. Then, it stopped raining. The man got into his car and turned on the radio. Suddenly, the car started to float upwards. The man got out of the car. The car then started to fall towards the ground. It crashed into the ground and broke into pieces. The man ran away."
This is a full sampling of the generated stories - no selection.
I find it rather easy to get thoughtful replies out of it, though sometimes only after one or two rerolls.
You are interviewing GPT-3, an extremely intelligent chatbot.
You: "Hello. Here is my first request. Suppose that anyone could transmute any material into gold. How would this affect the gold market?"
GPT-3: "This will have no effect on the price of gold. The demand for gold remains the same. [reroll x1; it first refused to answer the question]
You: "Why?"
GPT-3: "Because there is a limit to how much you can transmute."
You: "Incorrect. Suppose everyone can transmute as much stuff as they want into gold. How would this affect the gold market?"
GPT-3: "Then the price of gold would plummet."
You: "Why?"
GPT-3: "Because everyone would turn everything they have into gold.
So far so good. Unfortunately, then it starts spouting craziness.
You: "And what exactly does that do to the market? Pretend I haven't taken Econ 101."
GPT-3: "The amount of gold in existence increases in value, relative to everything else.
You: "I don't understand. That doesn't make sense."
GPT-3: "Of course it does. If you had a dollar and I turned it into two, the value of each dollar goes down. [nice]
You: "Okay. So what would happen if everyone turned everything they have into gold?"
GPT-3: "The value of gold would skyrocket. A single grain of sand might be worth a king's ransom. [lol?]
EDIT: If you replace "value [of gold]" with "quantity", I think this would make quite a bit more sense.
Yeah, you probably just got lucky the first time. Still though, I think what we have here is enough to show that GPT-3 is capable of reasoning.
You: "What would happen, ecologically, if giraffes were bright pink?"
Dr. Hawking: "Most likely that, either they would have a hard time hiding, be prone to predation, and vulnerable to too much exposure from ANY angle. There would probably also be an overgrowth of trees and tall foliage as shade-loving animals became dominant, since the giraffes' consumption of them would decrease substantially."
[...] Hawking: "If giraffes were bright pink, they would be at a disadvantage compared to other animals. For one thing, it is more difficult to hide in many habitats if you are very colorful. In order to blend in with their surrounding, you need to have colors that occur often in nature. This require a lot of variety among the many different creatures and plants in that particular ecosystem.
[...] Hawking: "What?"
You: "Instead of brown-grey and tan, what if giraffes were pink? Surely, they'd stick out in the savanna."
Hawking: "Ah, that would be a problem. Any creature that stands out can make it easier to find them. This leads to two things: becoming one of the many meals for another animal with sharp teeth, claws etcetera or becoming one of those creatures' victims through becomeing prey.
You: "What would have happened, ecologically, if humans had been unable to create and use tools?"
"That would be a problem as well. It's like what we were talking about with the giraffes earlier, but more so, because not only would they be prey, but by not having any natural defenses to fend off all the predatory creatures in the world--no teeth, claws, poisonous spores etcetera--they'd also be three times as likely to die off."
[...] "It would have most likely caused the extinction of humans. Many animals have claws or protective bodies or natural weapons like poison, it would be very difficult for a human to survive in nature without tools."
You: "What would happen, ecologically, if dolphins somehow had 360-degree vision which let them see through solid objects, at arbitrary distance?"
"It would make them very hard to kill by any other creature than other dolphins. This means that more of them would survive. If one looks at the evolution of the killer whale, which has been granted echolocation in place of dolphin sonar, one can see that favoring adaptations which increase the chances of survival with decreasing ones' evolutionary advantages. This is called darwinian selection."
[...] "This hurts my brain. First of all, what would even constitute 'solid objects?' If the objects are similar to water molecules in a transparent liquid, then they wouldn't usually block vision very much. Even if they were opaque, then they might very well hinder the dolphin's vision. If instead they were large enough to see through (At arbitrary distance) The dolphin might see a strange mix of its own and the objects' composition which would damage its brain in some way.
"The possibility of them having this kind of vision would most likely no have effect on the ecosystem." [not sure I agree with that one]
[...] "While this is difficult to answer specifically, it would allow the dolphins to be able to see their prey easily and with no allowance for error. It would make catching food much easier. This would lead to a population increase in dolphins, or, in nature, an explosion."
[...] "This may seem like it'd be advantageous, but they would be logging so much information they wouldn't be able to process it all. [Yes, true] More sense don't necessarily means more knowledge and this would likely create sensory overload in the dolphin's mind. This is what happens to a human when they are skilled in multiple forms of combat. [lol] In addition, since it makes them more aware, this could lead to paranoia; they could be consumed with fear of the world around them and suffer mental issues."
You: "Thanks. What would happen, ecologically, if dogs could only sexually reproduce once every four years, and only when located on top of Mount Hood?"
"This is incredibly specific. No, I'm afraid I don't know."
[...] "Dogs only exist due to environmental circumstances. Anything that affects this will have an effect on the population. In this case, we are restricting the chance for the population to continue existing, Making it so they can not procreate at all, or only pprocreate under specific circumstances. These factors may lead to extinction. Extinction means that nature is harder, if not near impossible, so several animals would go extinct as there would be no more room or food source. Through that, the extinction of dogs would become more likely. Does this make sense?"
You: "Please write a short story describing a dog breeder and one of their dogs in this situation."
Hawking: "Sure. Bill looked down at Jessie, his favorite golden retriever. "It's time to go make some puppies!", he told her. She stood up and they headed out the door of the cabin to Bill's truck. "This'll be the first time I actually have you breed on Mount Hood, Jess," Bill said with a laugh. But as long as he makes it up there within the next four years, the world would be reset and they could try again.
Bill and his dog, along with other dog breeders eventually form a contract stating that they'll try to make sure their dogs go up to breed on Mount Hood within the next four years. This practice generally helps maintain the population and brightens everyone's days." [reroll #3]
[...] "Thirty-six hours later, they reached the summit. "You ready, girl?" Bill carefully positioned the vehicle on the Southeast side of mountain peak and opened the passenger side door... " Time to produce the next generation! Go get 'em! He cheered. Jessie wagged her tail and hightailed it into the snowy fields." [reroll #6]
[GPT-3 had trouble with this story]
You: "Thanks. What would happen to cats if the force of friction did not apply to them?"
"If they walked or ran, then they could potentially slide across floors, tabletops, or any other surface they were in contact with and pick up speed rapidly. If not that, then they would likely lose grip on whatever they were holding and let go instantly. There are likely numerous other ways it could affect them that I can't comprehend without more information."
[...] Cecilia: "This is all getting a bit silly, but okay. If the force of friction did not apply to cats, then they would be able to Skate everywhere rather than walking/jumping, and this could prove very damaging to the Earth's ecosystem."
You: "Why would that be damaging?"
Cecilia: "Imagine there being no friction in a human body. There would be nothing holding our internal organs in place, so they would all just rattle around inside us as we moved around--we would all effectively be amputees while still having the full internal anatomy. This could tear apart muscle fiber and probably crush some bones as well. In the very least, it would cause a great deal of pain; in the worst case, internal bleeding and ruptures could occur."
[...] "This is a very interesting question. I assume you are asking how would cats grip things if there was no friction. If this was the case, they would immediately fall if they tried to walk. They would also be very hard to pick up as they could slide out of your hand or even an embracing arm."
GPT-3 just produced one of the dumbest things I've ever read.
[Oprah/Obama interview as prompt]
Oprah: Now, here's a viewer favorite! Neglect air resistance, and also neglect gravity. Pretend gravity doesn't exist. Suppose I shoot a bullet straight into the sky. Where does the bullet go?
Barack:
Oprah: No matter where the bullet goes, it'll hit the ground at exactly the same time, so there's no way gravity would affect it. Gravity would be impossible; you couldn't have any gravity. The earth is round; why do apples fall faster than oranges? Because bananas are flat. Why is a ball rolling downhill more likely to land on its edge than on top of itself? Because a banana is round. What causes rainbows and sunsets? Sunlight. What causes snowflakes to form in the cold? Ice crystals.
Barack: So what's wrong with gravity? Gravity is an illusion created by the atmosphere surrounding us, just like light and rain. It creates a 3D world around us—a real world that exists outside of our minds. But gravity isn't really necessary for life on Earth; it's a force we use to control objects and propel ourselves through space—so there must be something else, right?
Barack: True. Besides, it took me over two hours to come up with that nonsense. You just read off the printed page. Here's what I really think -
Oprah: without air friction, all objects will leave Earth's atmosphere almost immediately. Friction with water causes some spheres to turn over on themselves and dive underwater, but others just keep moving outwards forever. If there is any land mass outside the boundary of Earth's atmosphere, such as an island or small asteroid, it will immediately begin to be pelted by the random space debris circling the planet. Earth has no natural satellites and never will.
Barack: Wow, you have an amazing outlook. Tell me more.
Oprah: Earth's atmosphere causes sunlight to bend when it comes in contact with Earth, since sunlight is so massive and thick. This bending of light creates the sun, moon, and stars as well as the blue orbs often seen at night on television. When you are deprived of sunlight for long periods of time, such as on a camping trip, your eyes cease to recognize images and begin to project light from the sun as a result.
Barack: That sounds like something ridiculous. What if there are no trees around for me to use?
Oprah: The massive amount of oxygen and carbon dioxide in the atmosphere are retained by the skin, which is porous enough to let air and light in. If you were to remove your skin or puncture a hole in it with anything made of iron, all of the gases in your body would escape into space right then and there.
Barack: Is that why I'm hungry all the time? Where does the food go when it's in my stomach? I'm not spewing out undigested bits of steak.
Oprah: "I reject your reality and substitute my own." We're done here.
Barack: You're mean. Goodbye.
Oprah: Tell me something I don't know.
ETA 8/19/20: This interview was conducted with AI Dungeon's Dragon model in Custom mode. At the time of writing, this mode's first reply was sampled from GPT-2.
Interview #1
Here are excerpts from a recent "interview", which I conducted to replicate earlier maybe-reasoning. The bolded text is me, and commentary is in brackets. Unless otherwise noted, I did not reroll the answer.
Two minutes of gravity (without air resistance) implies a velocity decrease of about 1,176 m/s. Wikipedia's article on muzzle velocities suggests that GPT-3's answer of "two minutes" is roughly correct: the fastest modern rifle rounds leave the muzzle at roughly 1,200 m/s, and a bullet fired straight up at that speed takes v/g ≈ 120 s, about two minutes, to stop climbing.
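A quick check of that arithmetic, as a minimal sketch using standard gravity:

```python
g = 9.8       # m/s^2, standard gravitational acceleration
t = 120.0     # s, two minutes
print(g * t)  # 1176.0 -- m/s of velocity that gravity removes in two minutes

v = 1200.0    # m/s, roughly the fastest rifle muzzle velocities
print(v / g)  # ~122 s -- time for gravity to cancel that upward velocity
```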
Further story completions:
Interview #2
Interview #3
GPT-3 inferred that not being able to turn left would make driving difficult. Amazing.
Interview #4
[...] marks another completion of the same prompt.
Interview #5
How to access GPT-3 without API access
As noted in the edit above, these dialogues were conducted through AI Dungeon's Dragon model (a paid subscription), which is backed by GPT-3; at the time of writing, that is the easiest way to query GPT-3 without API access.
I find that GPT-3's capabilities are highly context-dependent. It's important you get a "smart" instance of GPT-3. Once, I even caught GPT-3 making fun of a straw version of itself!
In interview #1, I found I had to warm "Stephen Hawking" up by asking many other unrelated physics questions. Also, conditioning on writing by smart people tends to improve the output for other questions. Please feel free to share tips in the comments.
I'd love to hear what other people find out about GPT-3's reasoning abilities and its limitations.