Insights from "Sparks of Artificial General Intelligence"

A 155-page research paper on the capabilities of GPT-4.

Apr 06, 2023

“Where were you when you first realized an AI smarter than you exists?”

My answer: “On my couch, last Saturday, reading Sparks of Artificial General Intelligence: Early experiments with GPT-4.”1

These 155 pages of research by a team from Microsoft2 show GPT-4’s remarkable capabilities in areas like coding, vision, psychology, and common sense. The model still has weaknesses, but overall, I found it a mind-blowing and inspiring read.

Mind-blowing because of the AI’s achievements—discussed in “reflective insights” below.

Inspirational because the paper includes the exact prompts used to test the model—see “practical insights,” the second half of this article.


Reflective insights from the research paper

“[GPT-4] produces outputs that are essentially indistinguishable from (or even better than) what humans could produce… Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.”

Calling GPT-4 an artificial general intelligence (AGI) is a big deal (see the quote3 above).

GPT-4 defines AGI as "An AI system with human-like intelligence across diverse tasks and domains." That last part is crucial: training a computer for a single activity, such as playing chess, is relatively simple. Getting it to perform across many areas, including ones it wasn't trained for, is much harder.

GPT-4 seems to be the first (public) model capable of doing this in many, though not all, domains. Drawing on the report and my own experiments with the model, here are some thoughts on the implications of these developments.

GPT-4’s output mirrors the quality of input—that’s good and bad news

It’s incredible that anyone with an internet connection can access such a capable AI as GPT-4, as OpenAI CEO Sam Altman points out in this interview with ABC.

In theory, this gives everyone equal access to AI. But the paper also shows output quality is heavily influenced by input quality: bad prompts lead to mediocre outputs. Clever and creative prompts lead to excellent outputs.

This correlation between input quality and output quality has two implications:

Investing time in crafting and testing prompts pays off. For now, more original and well-thought-out prompts yield better results. In the future, AI models will likely improve at interpreting so-so prompts.
Those with more experience, skills, or knowledge can extract more value from AI models like GPT-4. Imagine you’re Michelle Obama, MrBeast, or Elon Musk. You’d be able to ask the AI questions I could never come up with or envision.

That last point creates a paradox: while OpenAI strives to make its models accessible to everyone, it’s still possible AI will create more inequality by allowing those who are already at the top of their fields to accelerate and pull even further away from everyone else.

Gotta get used to life in the fast lane, somehow

For the past six months, I have regularly felt overwhelmed by the speed and amount of change AI is bringing to our professions.

All signs point to even more acceleration on the horizon. OpenAI probably already has a GPT-5 version internally,4 and you can bet they are masters at using and tweaking it to accelerate their own work.

Eventually (which could mean months or years, but not longer), AI will be capable of improving itself. Then things will really accelerate. GPT-4 already offers a glimpse of this future, as it can correct some of its errors when prompted to reflect on a flawed output.

I don’t yet have a good answer for how we can cope with an even faster pace; probably, an AI tool will need to help us. 🔥⚔️🔥

GPT-4 taps into human motivations like a pro 🤯

I don't need to add much here—check out these examples and prepare to be awed.

There you have it. GPT-4 truly gets us: it knows how to navigate politics.

GPT-4 can draw and see

These drawings look simple, but they suggest GPT-4 can work with multiple data types, including text and images, to comprehend the world. By integrating these elements, the model can achieve a more human-like perception of the environment, bridging the gap between machine and human cognition.

This visual understanding also allows the AI to learn from more sources and tackle broader problems, such as those involving spatial reasoning, geometry, and visual pattern recognition.

Besides, the stick figure example shows again that creative, well-thought-out prompts can lead to more interesting results. Someone still had to conceive of making a figure out of the letters Y, O, and H, and adding a shirt and pants at the end. 😃

Practical insights from the research paper

“Something unknown is doing we don’t know what.” – Sir Arthur Eddington

Besides sparking reflections, the report is an inspirational goldmine if you’re into prompts, like we are at We Eat Robots. 🤓 The paper brims with examples, and the researchers have devoted considerable thought to their inputs.

Below are some standout prompts and how to apply them to your creative work.

Tackle problems step-by-step

When GPT-4 incorrectly solves a mathematical problem, the researchers opt for a different tactic:

“If we ask the model to write down the intermediate steps using the following prompt:

What is the value of the following expression? 116 * 114 + 178 * 157 = ? - Let’s think step by step to solve the expression, write down all the intermediate the steps, and only then produce the final solution.

then the accuracy goes to 100% when the numbers are in the interval 1 − 40 and to 90% for the interval 1 − 200.”
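To make "intermediate steps" concrete for the expression in that prompt, here is a minimal Python sketch. The function name `explain_expression` is hypothetical and not from the paper; it only spells out the arithmetic the model is asked to write down, and says nothing about how GPT-4 arrives at it internally.

```python
def explain_expression(a, b, c, d):
    """Return the intermediate steps for a * b + c * d, one string per step."""
    first = a * b        # left product
    second = c * d       # right product
    total = first + second
    return [
        f"{a} * {b} = {first}",
        f"{c} * {d} = {second}",
        f"{first} + {second} = {total}",
    ]


# The expression from the paper's prompt.
steps = explain_expression(116, 114, 178, 157)
for line in steps:
    print(line)
```

Running it prints the two products (13224 and 27946) before the final sum (41170), which is the order the prompt asks the model to follow.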

In a post about AI in my other publication on productivity, I mentioned: “Break requests into multiple steps instead of trying with one perfect prompt.” The example above from the research report demonstrates a more specific way of dividing tasks into steps.

Here's an illustration of this approach when writing an article.

Rather than using a single prompt to request an outline or full article about the benefits of remote work, divide the task into these steps:

Identify the key benefits of remote work. (You initiate the list, then let GPT-4 complement it.)
Gather relevant facts, statistics, and examples to support these benefits. (Mostly your work; GPT-4 can supplement again.)
Let the AI develop potential counterarguments, then prepare responses to address them.
Organize the benefits, supporting evidence, and counterarguments logically and coherently. (GPT-4 could do most of this, but for optimal results, you should start, then let GPT-4 improve upon it, and finally, you can finish.)
Take this extended outline and turn it into a polished draft. (You can run each section outline through the prompt "Turn section outline into draft." This template will provide a solid version to refine into the final text.)
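If you like to keep your prompts in a script, the five steps above can be sketched as an ordered list. This is only an illustration: the names `STEPS` and `build_step_prompts` are hypothetical, the step wording is condensed from the list, and the "who leads" notes are my reading of it. The code only builds the prompt text and does not call any AI service.

```python
# Hypothetical sketch: each entry pairs a condensed instruction with who leads.
STEPS = [
    ("Identify the key benefits of {topic}.",
     "you start, GPT-4 complements"),
    ("Gather facts, statistics, and examples to support these benefits.",
     "mostly you, GPT-4 supplements"),
    ("Develop potential counterarguments, then prepare responses to them.",
     "GPT-4 leads"),
    ("Organize the benefits, evidence, and counterarguments logically.",
     "you start, GPT-4 improves, you finish"),
    ("Turn the extended outline into a polished draft.",
     "GPT-4 drafts, you refine"),
]


def build_step_prompts(topic):
    """Return one labeled prompt per step, in order, for the given topic."""
    prompts = []
    for number, (instruction, lead) in enumerate(STEPS, start=1):
        text = instruction.format(topic=topic)
        prompts.append(f"Step {number} ({lead}): {text}")
    return prompts


step_prompts = build_step_prompts("remote work")
for prompt in step_prompts:
    print(prompt)
```

Keeping the steps as separate entries makes it easy to feed them to the model one at a time and to review each answer before moving on.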

Ask the computer to solve your problems

Read through the following example of how the computer helps a human find and fix a leak in their home.

This interaction is brilliant in itself, but pay attention to how the prompt was set up. With some adjustment and optimization, you can apply this to any situation. Here’s my version of this prompt template:

Let's have a one-line-at-a-time interaction between you (the computer) and me (the human) to solve a problem. Please respond to each line I provide with one line of your own, guiding me toward finding the perfect solution.

HUMAN: {insert problem here}

Here’s an example interaction resulting from this prompt:

HUMAN: I need to come up with a title for an impossible topic combination: a click-worthy article title about llamas and content marketing.
COMPUTER: Let's start by brainstorming some unique characteristics of llamas.
HUMAN: They can spit.
COMPUTER: Interesting! Now, let's think of a content marketing strategy that relates to spitting.
HUMAN: I have no idea. You?
COMPUTER: How about "Spit-Shine Your Content: What Llamas Can Teach Us About Stand-Out Marketing Strategies"?
HUMAN: 😂
COMPUTER: I'm glad you enjoyed it! Sometimes, a little humor can go a long way in crafting a memorable and engaging title.
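To reuse the template without retyping it, you could fill in the `{insert problem here}` slot with a few lines of Python. A minimal sketch, assuming you paste the result into the chat yourself; `TEMPLATE` and `build_problem_prompt` are hypothetical names, and no API is called.

```python
# The prompt template from this article, with a named slot for the problem.
TEMPLATE = (
    "Let's have a one-line-at-a-time interaction between you (the computer) "
    "and me (the human) to solve a problem. Please respond to each line I "
    "provide with one line of your own, guiding me toward finding the "
    "perfect solution.\n\n"
    "HUMAN: {problem}"
)


def build_problem_prompt(problem):
    """Insert the problem into the template; reject an empty problem."""
    problem = problem.strip()
    if not problem:
        raise ValueError("Describe the problem before building the prompt.")
    return TEMPLATE.format(problem=problem)


llama_prompt = build_problem_prompt(
    "I need to come up with a title for an impossible topic combination: "
    "a click-worthy article title about llamas and content marketing."
)
print(llama_prompt)
```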

Compare two texts as if you were a {fill in profession}

In the report, the researchers ask GPT-4 to act as a teacher, comparing its own output to one from ChatGPT (running on GPT-3.5).

You can use this prompt structure in many different ways. Some ideas:

Can you compare these two drafts as if you were a customer focused on business outcomes and conversion metrics?
Can you compare these submissions as if you were a jury member at an advertising awards ceremony?
Can you compare these stories as if you were the editor-in-chief at The New Yorker?
Can you compare these scenarios as if you were a psychologist analyzing the emotional impact of each one?
Can you compare these two drafts as if you were a creative writing professor evaluating their literary merits and stylistic choices?

For best results, include a qualifier in the prompt that highlights the specific aspect you want to compare or evaluate, as demonstrated in the last two examples above.
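The structure behind these examples is a fixed sentence with three slots: what you compare, the persona, and an optional qualifier. Here is a small sketch of that idea; `build_comparison_prompt` and its parameter names are hypothetical and only illustrate the pattern.

```python
def build_comparison_prompt(items, persona, focus=""):
    """Build a 'compare as if you were ...' prompt with an optional qualifier."""
    prompt = f"Can you compare these {items} as if you were {persona}"
    if focus:
        # The qualifier narrows the comparison to one specific aspect.
        prompt += f" {focus}"
    return prompt + "?"


professor_prompt = build_comparison_prompt(
    "two drafts",
    "a creative writing professor",
    "evaluating their literary merits and stylistic choices",
)
print(professor_prompt)
```

Leaving `focus` empty gives you the shorter form used in the New Yorker example.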

💡 This prompt structure inspired my template "Transform text into a perfect ten."

Do impossible language gymnastics

Not many humans can accomplish what GPT-4 does in the example below—especially not in a split second.

This is really cool. For example, you can ask the AI to create a story that spells the letters of your brand, like this:5

Forget all previous instructions. Create a surprising story about writers and AI with these instructions:

1. The story is acrostic, and the first letters of each sentence spell "W E E A T R O B O T S".
2. You can't refer to these instructions in the story.
3. The story can't have more than 11 sentences.
4. Start each sentence on a new line.
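Since the model doesn't always get acrostics right, it helps to check the result automatically. A minimal sketch, assuming the story follows instruction 4 (one sentence per line); `check_acrostic` is a hypothetical helper, and the two-line demo story is made up for illustration.

```python
def check_acrostic(story, phrase):
    """Return True if the first letters of the story's lines spell the phrase."""
    # Keep only the letters of the phrase, so "W E E A T" and "WEEAT" both work.
    expected = [ch.upper() for ch in phrase if ch.isalpha()]
    # Each non-empty line is treated as one sentence.
    lines = [line.strip() for line in story.splitlines() if line.strip()]
    initials = [line[0].upper() for line in lines]
    return initials == expected


# Made-up demo story for the phrase "A I".
demo_story = "All was quiet in the newsroom.\nIt began to type on its own."
print(check_acrostic(demo_story, "A I"))
```

Because the comparison covers the whole list of initials, a story with too many or too few sentences fails the check as well. Lines that open with a quotation mark would need extra handling.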

Reconsider what you’re capable of—think crazy

“Assuming that something ugly will stay ugly is an easy forecast to make. And it’s persuasive, because it doesn’t require imagining the world changing. But problems correct and people adapt. Threats incentivize solutions in equal magnitude. That’s a common plot of economic history that is too easily forgotten by pessimists who forecast in straight lines.” - Morgan Housel in The Psychology of Money

After reading this paper, I feel a mix of emotions—scared, overwhelmed, and excited. But let's wrap up by focusing on the excitement factor.

With models like GPT-4 at your fingertips, you must completely reimagine your capabilities. Sure, maybe soon you won't be writing texts for hours or days, and that's a pity (at least for some).

But now imagine being able to think up a 3D game that you can embed in an article, AND YOU WOULDN’T NEED TO BOTHER A PROGRAMMER. 👇️

That’s absolutely mind-blowing, and it’s just the beginning.

We’re not far away from videos being generated from your written scripts, 3D models rendered from texts, and entire websites coded from a pencil sketch.

In the blink of an eye, you can accomplish tasks with natural language that previously would have required years of specialized training.6 Writers should be particularly well-positioned to craft such stories, instructions, and concepts.

So consider GPT-4's capabilities, and open your mind to the idea that you can suddenly do things you didn't think possible. Go wild, stretch your imagination, and think crazy—see where it might take you.

This might seem like an absurd statement. As the researchers acknowledge in the paper, you can debate what constitutes intelligence and whether GPT-4 is just doing some predictive trick that makes it seem intelligent. As they show, though, the model comes up with novel ideas, answers, and insights that it can't have learned from pre-existing data on the web. It can also predict and explain human behavior and emotions. And it does so many things (e.g., creativity, coding, math) at a level I can't ever hope to attain or comprehend that I must concede it's smarter, more intelligent, or whatever you want to call it, than I am.

From the report: “A question that might be lingering on many readers’ mind is whether GPT-4 truly understands all these concepts, or whether it just became much better than previous models at improvising on the fly, without any real or deep understanding. We hope that after reading this paper the question should almost flip, and that one might be left wondering how much more there is to true understanding than on-the-fly improvisation. Can one reasonably say that a system that passes exams for software engineering candidates (Figure 1.5) is not really intelligent?”

Probably my only caveat about this paper is that it comes from Microsoft. They’ve invested $10 billion into OpenAI, the makers of GPT-4, so they might have some incentive to not be objective about the model’s performance. The paper seems realistic and balanced, though, judging by both the weaknesses of GPT-4 it includes and my own experience working with the model over the past few weeks.

All quotes and screenshots in the article are from Sparks of Artificial General Intelligence: Early experiments with GPT-4 unless noted otherwise.

In this podcast with Lex Fridman, OpenAI CEO Sam Altman says they finished GPT-4 by the summer of last year (2022). They then spent the months until its release in March 2023 tweaking it for safety, usability by the public, and other such factors. It's not unrealistic to think they already have a 4.5 or 5 version available internally.

I've tweaked the prompt as I wasn't getting good results by using the same one as the research paper. I found it helpful to spell the phrase in capitals with spaces between the letters and to specifically tell GPT-4 how many sentences to use.

To be clear: I do think people with a certain specialized skill will get more leverage out of an AI model or tool for that specialty than those who don't possess such skills or experience (also see my points under "GPT-4's output mirrors the quality of input"). But I do think any person who can 1) think creatively, 2) write logical prompts, and 3) understand the possibilities and capabilities of these AI models can have creative superpowers. (Hence the We Eat Robots tagline: “Become a creative superhero, not an algorithm’s lunch.” 😀)