Recently, a research team from the Institute for Basic Science (IBS) in Korea, Yonsei University, and the Max Planck Institute in Germany unveiled a breakthrough artificial intelligence technique called Lp-Convolution. Inspired by the workings of the brain's visual cortex, the approach achieves, for the first time, simultaneous improvements in accuracy, efficiency, and biological realism for machine vision, opening a new way for computers to "see" the world.
1. The bottleneck of traditional AI vision: the dilemma from CNN to Transformer
In the field of computer vision, convolutional neural networks (CNNs), long the mainstream models, rely on fixed-size square filters (such as 3×3 or 5×5 convolution kernels) to extract image features. This "mechanical scanning" identifies local details well, but it struggles to connect fragmented local cues into a coherent whole, such as when pinpointing key targets in complex scenes.
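For contrast, a conventional convolution applies one fixed-shape window everywhere. The toy PyTorch snippet below (a generic illustration, not taken from the research) shows that rigidity: every output location is computed from the same 3×3 square neighborhood.

```python
import torch
import torch.nn.functional as F

# A standard convolution slides a single fixed-shape kernel over the image:
# the 3x3 window sees the same square neighborhood at every location,
# whether the relevant feature there is wide, tall, or diffuse.
image = torch.randn(1, 1, 32, 32)   # (batch, channels, height, width)
kernel = torch.randn(1, 1, 3, 3)    # fixed 3x3 filter
features = F.conv2d(image, kernel, padding=1)
print(features.shape)               # torch.Size([1, 1, 32, 32])
```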
Vision Transformers (ViTs), which have emerged in recent years, can analyze entire images through global attention mechanisms, but their dependence on large-scale compute and datasets makes them hard to deploy in latency-sensitive scenarios such as autonomous driving and medical imaging. How to balance local detail capture with global semantic understanding has become a core problem for the field.
2. Lp-Convolution: A dynamic feature extractor that simulates the brain
The research team drew inspiration from the workings of the brain's visual cortex: the human visual system focuses selectively on key information through sparse, roughly circular connection patterns (as in the receptive-field structure of retinal ganglion cells). Building on this, Lp-Convolution introduces the multivariate p-generalized normal distribution (MPND), which lets the AI model dynamically adjust the shape of its convolution kernels based on the input image: "stretching" horizontally to capture broad features (such as road outlines) when needed, or "compressing" vertically to focus on details (such as license-plate characters).
This design removes traditional CNNs' fixed-kernel-size limitation and tackles the large-kernel problem that has frustrated the field for years: simply enlarging the convolution kernel (to 7×7, say) causes the parameter count to explode without improving performance. By contrast, Lp-Convolution's biologically inspired, dynamically shaped connectivity strengthens feature representation while keeping computation efficient; the sketch below illustrates the idea.
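To make the mechanism concrete, here is a minimal PyTorch sketch of an Lp-mask-modulated convolution. This is our own illustration, not the team's released code: the class name `LpMaskedConv2d`, the initialization values, and the exact mask parameterization are assumptions; only the core idea (a large kernel reweighted by an MPND-shaped envelope with a learnable exponent and per-axis widths) comes from the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LpMaskedConv2d(nn.Module):
    """Illustrative sketch (not the authors' code) of an Lp-masked convolution.

    A large base kernel is reweighted elementwise by an envelope shaped like a
    multivariate p-generalized normal distribution (MPND). The learnable
    exponent p and per-axis scales let the effective receptive field stretch
    horizontally or compress vertically as training demands.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 7, p_init: float = 2.0):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.log_p = nn.Parameter(torch.tensor(p_init).log())   # exponent p > 0 via exp()
        self.log_scale = nn.Parameter(torch.zeros(2))           # per-axis widths (sy, sx)
        # Kernel coordinates centered on 0, e.g. [-3, ..., 3] for a 7x7 kernel.
        coords = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
        self.register_buffer("ys", coords.view(-1, 1))
        self.register_buffer("xs", coords.view(1, -1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.log_p.exp()
        sy, sx = self.log_scale.exp()
        # MPND-style envelope: exp(-(|y/sy|^p + |x/sx|^p)); p = 2 gives a Gaussian.
        mask = torch.exp(-((self.ys / sy).abs() ** p + (self.xs / sx).abs() ** p))
        return F.conv2d(x, self.weight * mask, self.bias, padding="same")

# Quick check on a dummy batch.
layer = LpMaskedConv2d(3, 16)
out = layer(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```

Because the mask multiplies the weights rather than replacing them, gradients flow to both the base kernel and the shape parameters, so the receptive-field geometry itself is learned end to end.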
Figure: A brain-science-inspired AI breakthrough, enabling human-like visual perception in computers
3. Performance verification: a visual revolution with higher precision and stronger robustness
In tests on standard image-classification datasets such as CIFAR-100 and TinyImageNet, Lp-Convolution raised the accuracy of the classic AlexNet model by 8.2% and of the modern RepLKNet model by 5.6%. More importantly, when the input was corrupted by noise or partial occlusion, the method was 15% to 20% more robust than traditional CNNs, significantly improving AI's reliability in real-world scenarios.
The neuroscience-level validation is equally revealing: when Lp-Convolution's mask pattern approaches a Gaussian distribution, the model's internal neuronal activation patterns correlate strongly with neural activity recorded in the mouse visual cortex. The technique therefore not only improves algorithmic performance but also approximates, at the level of neural mechanism, how the brain's visual system works.
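The Gaussian link follows directly from the mask's functional form. Using the notation of the sketch above (our assumed parameterization, not necessarily the paper's exact formula), setting the exponent p to 2 collapses the MPND envelope into a Gaussian:

$$
m(x, y) = \exp\!\left(-\left|\frac{x}{\sigma_x}\right|^{p} - \left|\frac{y}{\sigma_y}\right|^{p}\right)
\;\xrightarrow{\;p\,=\,2\;}\;
\exp\!\left(-\frac{x^{2}}{\sigma_x^{2}} - \frac{y^{2}}{\sigma_y^{2}}\right),
$$

which is the smooth, center-weighted receptive-field shape that the comparison with mouse visual cortex refers to.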
4. Application scenarios: the leap from the laboratory to the real world
Lp-Convolution's lightweight, efficient design gives it transformative potential in several key areas:
Autonomous driving: dynamically shifting perceptual focus to quickly identify sudden obstacles (such as pedestrians or objects scattered on the road), shortening decision latency;
Medical imaging: accurately capturing subtle lesions (such as early lung-cancer nodules) in X-rays and CT scans, reducing the missed-diagnosis rate;
Service robots: flexibly identifying target objects in complex home environments (such as distinguishing similar pieces of tableware), improving manipulation adaptability.
"Humans can instantly lock on to critical information in crowded scenarios, and Lp-Convolution simulates this ability." C. Justin Lee, Ph.D., director of the Center for Cognition and Sociality at IBS, who is the research leader, noted, "We're giving AI the brains to 'think' about visual problems, which breathes new life into convolutional neural networks.”
5. Future outlook: a key step towards artificial general intelligence
The research team has already open-sourced the code and models and plans to present the full results at the ICLR 2025 international conference. Next, they will explore applying the technique to complex reasoning tasks, such as real-time semantic segmentation of images and logic-puzzle solving, to push AI from "perceptual intelligence" toward "cognitive intelligence".
This breakthrough is not only a milestone for computer vision; it also highlights an important trend: learning from the brain is becoming the key to cracking AI's core challenges. As neuroscience and machine learning grow more deeply intertwined, future intelligent systems may truly gain human-like flexibility in perception and decision-making, opening a new era of artificial general intelligence.