According to Ilya Sutskever, a co-founder of OpenAI, given a sufficiently powerful language model that statistically predicts the next token, a prompt like the one below could potentially conjure an AGI (Artificial General Intelligence) agent that surpasses the limitations of human intelligence.
Does Sutskever's claim about AGI suggest surpassing the intelligence of the smartest individual ever to have existed, or does it encompass the collective intelligence of all people, past and present? The term AGI is enigmatic, devoid of a universally agreed-upon definition. Yet differing perspectives bring about a clear division: those apprehensive of its potential for destruction, and those who view it as a catalyst for social equality.
In this article, my aim is to explore the nature of next token prediction. I intend to elucidate how, while it possesses immense power, it is simultaneously bounded by the inherent complexity of the very expectations placed upon it.
Next-token prediction
Language models work by predicting the next word given a set of previous words as input. We call the input text the prompt and the generated text the inference. In its simplest terms, the model finishes off what you started. And it is able to do so by learning to predict the next word in all the books, articles, posts and discussions that are available online, generated by humans over centuries.
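As a minimal illustration of "finishing off what you started", here is a toy bigram model: it learns, from a tiny made-up corpus, which word tends to follow which, and extends a prompt one word at a time. This is a sketch of the counting intuition only; real language models replace the counting with a neural network over far longer contexts.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for the books and posts a real model trains on.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word follows each word (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

def complete(prompt, n=3):
    """'Finish off what you started': extend the prompt by n predicted words."""
    words = prompt.split()
    for _ in range(n):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(complete("the cat"))
```

Even this toy version captures the essential loop: condition on what came before, predict, append, repeat.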
On the surface, it might seem counterintuitive to think that a model learning to imitate or copy human actions could lead to AGI. However, as Ilya points out in an interview, to understand this, we need to delve into what it means for a model to learn and predict the next token effectively.
Human models and learning
This concept finds a parallel in a more straightforward human learning task. Take, for example, how a child learns multiplication. Initially introduced to a multiplication table, the child starts by memorizing examples. Then the child's understanding is further enhanced with the rules of multiplication, which build upon another concept they know: addition. Over time, the child learns to generalize the concept and can perform multiplications with combinations that were not included in the example set. The key to learning lies not in memorizing each instance, but in grasping the essence of multiplication.
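The child's rule can be written down directly: multiplication as repeated addition, a definition that generalizes to pairs never seen in the memorized table. A trivial sketch:

```python
def multiply(a, b):
    """Multiplication via the rule the child learns: add `a` to itself `b` times."""
    total = 0
    for _ in range(b):
        total += a
    return total

# Generalizes beyond any memorized table entry:
print(multiply(13, 7))  # 91
```

The rule, not the table, is what carries the learner to unseen cases.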
We build upon the foundations of arithmetic and learn higher concepts in math in much the same way: through a set of principles, rules, and numerous examples. The true test of understanding is answering questions that were not part of the original training examples. While most of us are exposed to a similar set of examples, each of us internalizes the rules and principles in different ways, often reaching a limit in our understanding as the concepts become more advanced.
Language models and learning
In contrast to human learning, large language models, such as those predicting the next token, operate in an unsupervised environment. They are not explicitly taught principles or rules but instead learn by processing an immense volume of text—far exceeding what a human would encounter over a lifetime. This difference in learning approach is pivotal: where humans rely on a structured curriculum and quality-controlled data, language models depend heavily on the sheer quantity of data. They engage in an autonomous journey, constructing and navigating the complexities of concepts in mathematics and other domains from the ground up, guided solely by foundational principles and unsupervised exploration.
Like humans, these models also face their own unique limitations. Their capacity is not bound by the extent of data exposure but is instead dictated by their architectural design and size. As they venture into advanced concepts, their constraints become evident, echoing the human experience of grappling with increasingly complex ideas.
What it really means to predict the next token
Given a complex chessboard position, a language model (LM) may be asked to find the best next move. The model has to consider various possible moves and predict outcomes based on its understanding of chess strategy and tactics derived from its training data. In this chessboard scenario, the task of the LM goes beyond just calculating possible moves. It involves grasping a strategic depth similar to that of an experienced human player. The model must not only look at immediate gains or losses but also understand the broader strategy of the game. This involves knowledge of chess theories, recollection of historical games, and predicting the opponent's moves, all learned from extensive data on past games.
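"Learned from extensive data on past games" can be made concrete with a toy sketch. Below, a handful of hypothetical opening transcripts stand in for the training data, and "prediction" is just picking the most common continuation of what has been played so far, the same intuition as next-token prediction (a real model would, of course, learn far richer patterns than raw frequency):

```python
from collections import Counter

# Hypothetical "historical games" as move transcripts (illustrative data only).
games = [
    "e4 e5 Nf3 Nc6 Bb5",
    "e4 e5 Nf3 Nc6 Bc4",
    "e4 c5 Nf3 d6",
    "d4 d5 c4 e6",
    "e4 e5 Nf3 Nc6 Bb5",
]

def predict_move(prefix):
    """Most common continuation of `prefix` (note trailing space) across the games."""
    counts = Counter(
        g[len(prefix):].split()[0]
        for g in games
        if g.startswith(prefix) and len(g) > len(prefix)
    )
    return counts.most_common(1)[0][0] if counts else None

print(predict_move("e4 e5 Nf3 Nc6 "))  # the continuation most players chose
```

The gap between this counting trick and genuine strategic depth is precisely the gap the article is probing: frequency alone cannot evaluate a position it has never seen.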
The LM's capability in chess mirrors its approach to solving more complex problems in various fields. Just as it evaluates a chessboard, it analyzes intricate patterns in any domain it is applied to. Predicting the next move in chess symbolizes the model's potential in providing insights into complex, real-world issues. For instance, when addressing any significant global challenge, the model would sift through a wide array of factors—technological, environmental, economic, and social—akin to navigating through numerous possibilities on a chessboard.
Beyond the realm of strategy and analytical intelligence, humor and sarcasm are other important aspects of human communication that language models are increasingly able to grasp. In this context, predicting the next token means the model must understand the subtleties and contextual meanings that are intrinsic to human culture. This involves recognizing idiomatic expressions, cultural references, and the varied implications of words.
Even in basic capabilities like summarization and classification, a profound understanding of language and its underlying meanings is essential. The model must navigate through context, tone, and intent to accurately condense or categorize text. As we aspire for language models to impersonate the most insightful minds tackling complex real-world problems like sustainable energy – a challenge far more intricate than a game of chess – the model is prompted to not only emulate wisdom but also to comprehend and adapt to intellectual limitations. It's about synthesizing the collective wisdom of history's greatest thinkers, acknowledging the limits of their knowledge, and responding in a way that aligns with a deep, comprehensive grasp of their insights and perspectives. These tasks highlight the challenges language models face in truly capturing the depth and nuances of human intelligence and wisdom.
The fundamental limitation on prediction
Understanding the rules and principles behind a concept does not necessarily equip a model to immediately predict a future outcome. A classic example is the three-body problem, which involves the intricate dance of three celestial bodies under mutual gravitational influence. The complexity here is so profound that to accurately predict their future positions, one must undertake a laborious, step-by-step computation with no room for shortcuts. Stephen Wolfram's concept of computational irreducibility encapsulates this idea, proposing that certain complex systems necessitate exhaustive, sequential calculations for accurate prediction. Even the most advanced models are bound by this limitation, confined by the sheer volume of computation required.
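A minimal sketch of what computational irreducibility means in practice, using a planar three-body simulation with illustrative masses and starting positions (G = 1, unit masses, all values chosen for demonstration): there is no closed-form formula to jump straight to the state at a future time; the only route is to integrate every intermediate step.

```python
# Planar three-body problem under mutual gravity. No general closed-form
# solution exists, so the state at time T can only be reached by stepping
# through every intermediate instant: computational irreducibility in action.

def step(pos, vel, dt=1e-3, G=1.0, m=1.0):
    """Advance all three bodies by one time step (semi-implicit Euler)."""
    acc = []
    for i in range(3):
        ax = ay = 0.0
        for j in range(3):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r3 = (dx * dx + dy * dy) ** 1.5
            ax += G * m * dx / r3
            ay += G * m * dy / r3
        acc.append((ax, ay))
    vel = [(vx + ax * dt, vy + ay * dt) for (vx, vy), (ax, ay) in zip(vel, acc)]
    pos = [(x + vx * dt, y + vy * dt) for (x, y), (vx, vy) in zip(pos, vel)]
    return pos, vel

# Illustrative initial conditions: roughly an equilateral triangle, at rest.
pos = [(0.0, 1.0), (-0.87, -0.5), (0.87, -0.5)]
vel = [(0.0, 0.0)] * 3

for _ in range(1000):  # 1000 sequential steps; no shortcut available
    pos, vel = step(pos, vel)
print(pos)
```

Halving the step size doubles the work; there is no way to trade the sequential computation for a formula.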
Yet, when we turn our gaze from the celestial to the terrestrial, particularly to systems involving human agency and population dynamics, the predictability challenge intensifies. These systems are not like the orderly universe of celestial bodies, governed by well-defined physical laws. Human systems are marked by a level of unpredictability and nonlinearity far exceeding that of astronomical phenomena. The behavioral patterns and interactions within human societies don't adhere to rigid, deterministic rules. Consequently, increased computational power, which might suffice in domains like protein folding, where extensive computation can decode a protein's 3D structure from its amino acid sequence, falls short here. Human-centric systems react to a multitude of variables, often in unforeseen ways, so that minor initial changes lead to vastly different outcomes.
This unpredictable nature of systems involving human behavior exemplifies the complexity that goes beyond mere computational challenges. It’s a realm where the butterfly effect reigns supreme, where a small alteration can cascade into vastly different outcomes. Therefore, accurately predicting such systems might demand computational resources on an impractical scale, perhaps as vast as the universe itself, highlighting the stark contrast between the predictability of rule-governed domains and the chaotic dance of human-influenced systems.
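The butterfly effect can be made concrete with the logistic map, a textbook chaotic system: two trajectories whose starting points differ by one part in a billion soon bear no resemblance to each other.

```python
# Logistic map x -> r*x*(1 - x) in its chaotic regime (r = 4).
# Two starting points differing by one part in a billion diverge
# completely within a few dozen iterations: the butterfly effect.

def trajectory(x, r=4.0, steps=50):
    out = [x]
    for _ in range(steps):
        x = r * x * (1 - x)
        out.append(x)
    return out

a = trajectory(0.2)
b = trajectory(0.2 + 1e-9)

for n in (0, 10, 30, 50):
    print(n, abs(a[n] - b[n]))
```

The initial gap is invisible at machine precision, yet it roughly doubles each iteration until the two futures are entirely unrelated; more computation cannot buy back the lost predictability.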
The limits of understanding the essence: emergence
The concept of emergence, where the whole is more than just the sum of its parts, presents a significant challenge in predictive modeling. In the realm of language models and AI, this concept translates into the unpredictability inherent in human language and interaction. Just as the unique pattern of each snowflake cannot be deduced solely from the properties of water molecules, the nuances of human language and communication often elude the predictive capabilities of even the most advanced models.
Language, much like a snowflake's design, is shaped by a myriad of factors - cultural context, personal experiences, and even the unspoken subtleties of human emotion. These elements come together in complex and often unpredictable ways, giving rise to meanings and interpretations that are not always evident from the words themselves. For a language model, understanding the literal meaning of words is just the beginning. The real challenge lies in grasping the emergent properties of language - humor, sarcasm, metaphor, and emotion - elements that are deeply rooted in human experience and are not easily quantifiable.
Furthermore, the concept of emergence in human behavior extends to social dynamics and collective actions. The way individuals interact in a group, the emergence of social norms, and the evolution of cultural trends are all examples of emergent phenomena that are difficult to predict or model accurately. For AI and language models, navigating this terrain requires an understanding that goes beyond algorithms and computations.
In essence, the phenomenon of emergence underscores the inherent limitations of predictive models when faced with the intricacy and unpredictability of human behavior and language. It highlights the vast gap between computational processes and the fluid, dynamic nature of human interaction. As we continue to advance in the field of AI, the challenge remains not just in improving computational abilities, but in bridging this gap, in capturing the essence of what makes us uniquely human.
Overcoming computational irreducibility and emergence
I think humans have a special ability to navigate the inherent challenges of nature. Intelligence emerges from our collective experiences, developing through continuous interaction between individual insights and collective wisdom. This makes it impossible for intelligence to exist in isolation.
Every individual plays a role in shaping this collective intelligence. By engaging with their surroundings, people act like unique processing units, each offering different perspectives. The ideas and solutions they generate contribute to the collective wisdom of humanity, leading to new emergent properties. A prime example of this is in the realm of scientific discovery. Each scientific breakthrough builds upon the foundations laid by previous scientists. Every new discovery or hypothesis adds another layer to our collective understanding, much in the way that a complex, evolving network grows with each individual's input.
This dynamic resembles a vast, evolving network, where each contribution is a step towards understanding the complex and emergent aspects of the natural world. Just as the greatest scientific achievements have been the result of cumulative efforts over generations, the collective intelligence of humanity continues to grow, shaped by the diverse perspectives and knowledge of each individual.
My thoughts on AGI
As we ponder the evolving landscape of AI, it's crucial to revisit the inherent limitations we've uncovered - the hurdles of computational irreducibility and the enigma of emergence. These concepts remind us of the profound challenges that even the most powerful predictive models face. Computational irreducibility teaches us that some problems require a methodical, step-by-step analysis, defying any attempts at computational shortcuts. Emergence, on the other hand, reveals the unpredictability in systems, especially those influenced by human behavior, where the whole becomes inexplicably more than just the sum of its parts.
These limitations draw a clear line in the sand, underscoring that no matter the leaps in predictive prowess, some puzzles in nature remain stubbornly resistant. Models are constrained not just by the limits of our current technology but by the very fabric of natural laws and complex systems.
Interestingly, where AI encounters these boundaries, humans often find a way to transcend them through collective intelligence. Our history is a testament to this unique human ability to pool knowledge and wisdom, navigating through the complexities of nature and society. It's this collective endeavor that has allowed us to overcome challenges that seemed insurmountable to the isolated intellect.
Yet, as we gaze into the future of AI, the question looms: can AI harness a similar power? Can it mirror this collective intelligence that has been the hallmark of human progress? Currently, the real strength of AI lies more in its capacity for automation than for true intelligence. In the short term, this automation brings tangible benefits in efficiency and convenience to individuals. But as we look ahead, the long-term view is shrouded in uncertainty. Will AI reshape the individual in ways that ripple through our collective societal fabric? Could these automated entities form their own collective, with emergent properties that might even clash with human interests?
These questions are not just speculative; they reflect the principle of computational irreducibility in action. Each step into this new era of AI is a step into the unknown, a journey through a narrative that intertwines immense potential with cautionary tales. As AI evolves, perhaps even striving to one day match or surpass human intellect, it invites us to engage in a profound reflection. What does it mean to be truly intelligent? How does our human intelligence, shaped by collective wisdom and experience, compare to the artificial minds we are creating?
In this exploration of AGI and language models, we find more than just technological marvels. We find a mirror reflecting our quest to understand the very essence of intelligence and wisdom, both human and artificial. It's a journey that is as much about the capabilities of our creations as it is about understanding ourselves and our place in the universe.