On a bright, sunny day off the coast of Florida, researchers witnessed an intriguing sight: a young dolphin using a sponge to shield its beak while foraging along the ocean floor. This clever behavior, initially taught by the dolphin's mother, had spread through social learning within the local dolphin community, giving rise to a unique foraging method. In a similar display of resourcefulness, a group of New Caledonian crows showcased their remarkable problem-solving abilities by fashioning hooks from twigs to extract insects from tree bark. Both instances exemplify the cultural ratchet effect: the capacity of a group to learn, innovate, and transmit knowledge, cumulatively shaping the behavior of a species over time.
The cultural ratchet has propelled the rapid advancement of human civilization, as we consistently build upon the knowledge and discoveries of previous generations. Presently, we stand on the verge of a new era in cultural evolution, spurred by the symbiotic relationship between human intellect and artificial intelligence (AI). At the core of this phenomenon is an unexpected catalyst: data contamination.
GPT-4, a state-of-the-art AI language model, has demonstrated an extraordinary ability to tackle complex coding challenges. However, when tested on problems from the popular competitive programming platform Codeforces, it produced surprisingly mixed results: it solved all ten of the easiest pre-2021 problems but none of the recent ones. Since GPT-4's training data ends in 2021, this pattern strongly suggests data contamination. In the context of challenge problems and benchmarks, data contamination refers to the unintentional incorporation of information from these tasks into the training data, which can artificially inflate a model's performance when it is evaluated on the same tasks. Such contamination leads to misleading results: the improved performance may reflect the model's exposure to the problems during training rather than its true capabilities. Consequently, the issue has become a growing concern among researchers who strive for accurate evaluation and understanding of their AI models' abilities.
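The pre-2021 versus post-2021 comparison amounts to a simple contamination probe: split benchmark problems by release date relative to the model's training cutoff and compare solve rates. A minimal sketch of that check, using entirely hypothetical problem IDs, dates, and results rather than real Codeforces data:

```python
from datetime import date

# Hypothetical evaluation records: (problem_id, release_date, solved).
# These values are illustrative only, not actual benchmark results.
results = [
    ("A1", date(2020, 3, 1), True),
    ("A2", date(2020, 7, 15), True),
    ("A3", date(2020, 11, 2), True),
    ("B1", date(2022, 1, 10), False),
    ("B2", date(2022, 5, 20), False),
    ("B3", date(2022, 9, 3), False),
]

CUTOFF = date(2021, 9, 1)  # assumed training-data cutoff

def solve_rate(records):
    """Fraction of records marked solved."""
    return sum(solved for _, _, solved in records) / len(records)

pre = [r for r in results if r[1] < CUTOFF]
post = [r for r in results if r[1] >= CUTOFF]

# A large gap is consistent with contamination, though it could also
# reflect a shift in problem difficulty after the cutoff.
gap = solve_rate(pre) - solve_rate(post)
print(f"pre-cutoff: {solve_rate(pre):.0%}, "
      f"post-cutoff: {solve_rate(post):.0%}, gap: {gap:.0%}")
```

The probe is only suggestive: a performance gap across the cutoff is evidence of contamination, not proof, since problem style and difficulty also drift over time.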
Yet, data contamination might be less of a hindrance and more akin to the hooked twigs employed by New Caledonian crows. The seemingly unlikely connection between the cultural ratchet and data contamination becomes evident when we consider the role of human intellect in shaping AI systems. Leading minds in AI research devise cognitive tasks and benchmarks that drive the creation of more advanced models. As these tasks gain popularity, they are widely discussed and analyzed on the internet, leading to the dissemination of ideas and the evolution of concepts. For instance, a popular programming challenge may inspire numerous blog posts, forum discussions, and video explanations, each offering different perspectives and approaches.
Unintentionally, these conversations infiltrate the vast repositories of text used to train AI models like GPT-4. The models therefore learn not only from explicit knowledge but also from the diverse discussions, examples, and explanations available online. As a result, AI systems become more proficient at solving tasks, not just through algorithmic enhancements, but also through the incorporation of collective human wisdom. Importantly, leakage from the key cognitive challenges identified by the research community may provide the most insightful data for the next generation of AIs.
The convergence of human ingenuity and AI synthesis, facilitated by data contamination on the internet, signals the start of a new stage in our cultural evolution. This process generates a feedback loop, as AI systems inspire further human discourse and analysis, which in turn refines the AI models. This reciprocal interaction between human-generated ideas and AI adaptation accelerates the pace of progress, giving rise to an internet-scale brain.
The emergence of data contamination as a driving force behind a novel form of cultural ratchet has far-reaching implications for the future of artificial intelligence and human society. It transcends traditional boundaries, promoting knowledge exchange across various domains, and raises questions about the nature of intelligence and the evaluation of AI capabilities. As we embark on this journey, we must strive to understand and harness the immense potential of this new cultural ratchet, while remaining vigilant in our assessment of the AI systems that will shape our world.
If I understand your point, the LLM becomes the cultural repository for these innovations. A key question would be whether the model fitting/tuning process allows it to improve or generalize the human insights or whether it is just recording them for subsequent playback. If it's the latter, it isn't very interesting.
In human technology development, innovations are often communicated with improvements as each person gains experience and learns tweaks. I think this has been well-documented in the literature on "learning curves" in the semiconductor industry. Is there any evidence that the crows' skill is improving?