According to the State of AI report, large language models still struggle with planning and simulation tasks; on novel task types they cannot rely on memorization and retrieval, so performance often degrades. This suggests that, without external help, it is difficult for large language models to generalize what they learn beyond familiar patterns. Even advanced models such as GPT-4 struggle to reliably simulate state transitions in text-based games, especially environment-driven changes. They fail to consistently grasp causality, physical laws, and object permanence, which makes even these relatively simple tasks difficult for them.
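To make concrete what "simulating a state transition" means here, a minimal sketch of a ground-truth simulator that an LLM's predictions could be checked against (the toy environment, state fields, and rules are invented for illustration, not taken from the report):

```python
# Toy text-game world state. An accurate simulator must track
# environment-driven changes (the candle burns down each turn)
# in addition to the direct effects of the player's action.
state = {"location": "cellar", "candle_lit": True, "candle_turns_left": 2}

def step(state, action):
    """Ground-truth transition: apply the action, then the environment rule."""
    new = dict(state)
    if action == "go north" and new["location"] == "cellar":
        new["location"] = "hallway"
    # Environment-driven change, independent of the chosen action:
    if new["candle_lit"]:
        new["candle_turns_left"] -= 1
        if new["candle_turns_left"] == 0:
            new["candle_lit"] = False
    return new

# Evaluating a model means comparing its predicted next state to this one:
next_state = step(step(state, "go north"), "wait")
print(next_state)  # after two turns the candle has burned out
```

The environment rule fires regardless of what the player does, which is exactly the kind of change the report notes models tend to miss.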
The challenges large language models face in planning and simulation tasks have led the AI industry to refocus on artificial general intelligence (AGI). AGI refers to an AI system with broad intelligence that can learn and adapt across many different domains and tasks, as humans do. Unlike today's popular specialized AI, such as deep learning models that are often good only at specific tasks, the goal of AGI is to create agents that can understand, learn, and apply knowledge across a wide range of domains. Achieving AGI is an extremely complex and challenging undertaking that draws on multiple disciplines, including computer science, cognitive science, neuroscience, and psychology. At present, AGI remains a long-term research goal: although significant progress has been made in specific areas (e.g., image recognition and natural language processing), there is still a long way to go to achieve true AGI.
Figure: Large language models still face challenges in planning and simulation tasks
The development of AGI has also sparked wide-ranging ethical and social discussions, including the possibility of machine intelligence surpassing human intelligence, implications for employment, the distribution of decision-making power, and privacy protection. The study of AGI is therefore not just a technical matter, but one that raises deep social and philosophical questions.
According to the State of AI report, large language models have historically performed poorly on the ARC-AGI benchmark, with top scores around 34%; the current best score is 46 points (the target is 85). This result was achieved by the MindsAI team, which used a large-language-model-based approach, improving performance through active inference, fine-tuning LLMs on examples of the test tasks, and augmenting the data with synthetic examples. To promote progress toward general artificial intelligence, some organizations have also introduced prize incentives: for example, François Chollet, creator of Keras, and Mike Knoop, co-founder of Zapier, jointly launched the ARC Prize, a $1 million prize pool rewarding teams that make significant progress on the ARC-AGI benchmark. ARC-AGI, created in 2019 to measure a model's generalization ability, focuses on tasks that are easy for humans but difficult for AI.
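To give a flavor of the kind of task ARC-AGI poses, a toy sketch in the same spirit (this "flip" task and its grids are invented for illustration; real ARC tasks are far more varied):

```python
# ARC-style tasks give a few input -> output grid pairs demonstrating a
# hidden rule; the solver must infer the rule and apply it to a new input.
train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 0, 0], [0, 4, 0]], [[0, 0, 3], [0, 4, 0]]),
]

def flip_horizontal(grid):
    """Candidate rule: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# Verify that the candidate rule explains every demonstration pair...
assert all(flip_horizontal(inp) == out for inp, out in train_pairs)

# ...then generalize it to an unseen test input.
test_input = [[5, 0, 6], [0, 7, 0]]
print(flip_horizontal(test_input))  # [[6, 0, 5], [0, 7, 0]]
```

The hard part, which this sketch deliberately skips, is inferring the rule itself from only a handful of examples; that few-shot induction step is what makes the benchmark difficult for AI.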
Figure: The challenges faced by large language models in planning and simulating tasks have led the AI industry to refocus on artificial general intelligence
The generalization ability of an AI model refers to its ability to make accurate predictions on unseen data. A model with good generalization performs well on data outside the training set, not merely on the training set itself. Strongly generalizing models capture the universal regularities behind the data rather than just memorizing the features and labels in the training examples, so they can make accurate predictions or decisions even when facing new, unknown inputs. Generalization is a key measure of how AI models perform in real-world applications; by understanding and improving it, we can build more robust and reliable AI systems for volatile, complex real-world problems. It is especially important in advanced AI research and applications such as GPT models, which are pre-trained on large amounts of text to acquire broad linguistic knowledge and then fine-tuned for specific tasks. GPT models excel across a variety of natural language processing tasks precisely because of this generalization ability, applying what was learned during pre-training to seemingly unrelated tasks. The design of ARC-AGI emphasizes generalization: it requires AI systems to adapt to environments they have never seen before and to solve problems beyond their training data.
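The distinction between memorizing training examples and grasping the underlying regularity can be sketched with a deliberately trivial toy task (the task y = 2x and both "models" are invented for illustration):

```python
# Toy task: the hidden rule is y = 2x. Two "models" fit the training
# set equally well, but only one generalizes beyond it.
train = {1: 2, 2: 4, 3: 6}

def memorizer(x):
    # Perfect on the training set, but has no answer for new inputs.
    return train.get(x)

def rule_learner(x):
    # Captures the underlying regularity, so it extrapolates.
    return 2 * x

unseen = 10
print(memorizer(unseen))     # None: fails to generalize
print(rule_learner(unseen))  # 20: generalizes beyond the training set
```

Both functions agree on every training input; only their behavior on unseen data reveals which one actually learned the rule, which is why generalization must be measured on held-out data.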