
How T-MAC Outperforms NPU Performance on CPUs

With the rapid development of artificial intelligence, the inference efficiency of large language models (LLMs) has become an important measure of technological progress. These models place extremely high demands on computing resources, particularly for inference, and running LLM inference efficiently on limited hardware has become a focus of attention across the industry. Traditionally, neural processing units (NPUs) have often been considered the best choice for LLM inference tasks because of their specialized design. However, T-MAC, a technology recently developed jointly by researchers from Microsoft Research Asia, the University of Science and Technology of China, and the University of Chinese Academy of Sciences, challenges this conventional wisdom and demonstrates that NPU-level performance can be surpassed on CPUs.

The core advantages of T-MAC technology

The breakthrough of T-MAC lies in its optimization for low-bit large language models. It uses a lookup table (LUT)-based approach that directly supports mixed-precision matrix multiplication (mpGEMM). Low-bit LLM inference normally requires a costly weight dequantization step before the matrix multiplication, and this step often becomes a performance bottleneck. By eliminating it, T-MAC significantly reduces computational complexity and directly improves inference speed.
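To make the idea concrete, the sketch below illustrates the general principle of LUT-based low-bit matrix multiplication. It is a simplified NumPy illustration, not T-MAC's actual implementation: for 1-bit (sign) weights grouped four at a time, a 16-entry table of partial dot products is precomputed for each activation group, so every group of weights is handled by a single table lookup instead of a dequantize-multiply-add sequence. The group size G, the function name, and the packing layout are all hypothetical choices made for clarity.

```python
import numpy as np

# Illustrative sketch only (not T-MAC's actual kernels): a LUT-based GEMV for
# 1-bit (sign) weights. Weights are grouped G at a time; for each activation
# group we precompute the 2**G possible partial dot products once, and every
# weight group is then handled by a single table lookup instead of a
# dequantize-multiply-add sequence.

G = 4  # weights per lookup group (a hypothetical choice for clarity)

def lut_gemv(packed_w, x):
    """packed_w: (rows, K//G) uint8, each entry packs G sign bits (1 -> +1, 0 -> -1).
    x: (K,) float activations. Returns the (rows,) output vector."""
    K = x.shape[0]
    out = np.zeros(packed_w.shape[0], dtype=np.float32)
    for g in range(K // G):
        xg = x[g * G:(g + 1) * G]
        # Build the 2**G-entry table of partial sums for this activation group.
        lut = np.empty(1 << G, dtype=np.float32)
        for pattern in range(1 << G):
            signs = np.array([1.0 if (pattern >> b) & 1 else -1.0 for b in range(G)])
            lut[pattern] = float(signs @ xg)
        # One lookup per row replaces G dequantize-multiply-adds.
        out += lut[packed_w[:, g]]
    return out

# Tiny usage check against an explicit floating-point reference.
rng = np.random.default_rng(0)
rows, K = 8, 16
w_signs = rng.choice([-1.0, 1.0], size=(rows, K))   # the "dequantized" weights
bits = (w_signs > 0).astype(np.uint8).reshape(rows, K // G, G)
packed = np.zeros((rows, K // G), dtype=np.uint8)
for b in range(G):
    packed += bits[:, :, b] << b                     # pack G sign bits per entry
x = rng.standard_normal(K).astype(np.float32)
assert np.allclose(lut_gemv(packed, x), w_signs @ x, atol=1e-4)
```

The real T-MAC kernels are far more heavily optimized, but the central idea, replacing weight dequantization and multiplication with table lookups, is the same.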

What's more, T-MAC is designed to be highly scalable and adaptable to different hardware platforms, especially resource-constrained edge devices. This provides strong support for deploying low-bit LLMs in real-world applications and ensures the technology's applicability across a wide range of scenarios. Traditionally, CPUs have been considered far inferior to NPUs in inference performance, but the advent of T-MAC challenges this notion.

Figure: How T-MAC technology outperforms NPU performance on CPUs

Comparison of experimental data and performance

To verify the benefits of T-MAC, the research team ran detailed tests on a Surface AI PC equipped with the latest Qualcomm Snapdragon X Elite chipset. The results show that, with T-MAC, the 3B BitNet-b1.58 model generates 48 tokens per second, the 2-bit 7B Llama model generates 30 tokens per second, and the 4-bit 7B Llama model generates 20 tokens per second.

These figures show that T-MAC not only achieves efficient inference on the CPU but also adapts well across different bit widths. This matters for use cases that require deploying low-bit LLMs on edge devices. Compared with the previous practice of relying on an NPU, T-MAC offers a more flexible and cost-effective option.

Comparison of Performance between T-MAC and NPU

NPUs are often considered the best choice for LLM inference, especially when a large amount of parallel computation is involved. However, in several comparative tests, T-MAC has demonstrated the potential of the CPU to surpass the NPU. Taking the Llama-2-7B-4bit model as an example, the NPU generates 10.4 tokens per second, while the CPU reaches 12.6 tokens per second with T-MAC using only two cores, and peak performance can even reach 22 tokens per second.
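For reference, a quick calculation on the numbers quoted above gives the speedups over the NPU baseline:

```python
# Speedup of T-MAC on the CPU over the NPU baseline, using the reported token rates.
npu_rate = 10.4          # Llama-2-7B-4bit on the NPU, tokens/s
cpu_two_core = 12.6      # T-MAC on two CPU cores, tokens/s
cpu_peak = 22.0          # T-MAC peak on the CPU, tokens/s

print(f"two-core speedup: {cpu_two_core / npu_rate:.2f}x")  # ~1.21x
print(f"peak speedup:     {cpu_peak / npu_rate:.2f}x")      # ~2.12x
```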

This means that T-MAC running on a CPU can deliver better inference performance in certain tasks and scenarios, and may even replace the NPU as a more cost-effective solution. This improvement is not only a technical breakthrough but also prompts a rethinking of existing hardware architectures: the boundary between CPU and NPU is not insurmountable, and with the right optimizations, traditional computing platforms can also shine in the field of artificial intelligence.

Open-source contributions to T-MAC technology

T-MAC's innovation is reflected not only in its performance but also in its open-source practice. The research team has made the T-MAC codebase available on GitHub, free for developers and researchers around the world to use. This has greatly accelerated the adoption and application of T-MAC, providing a valuable resource for developers of edge computing and smart devices.

Through open source, T-MAC not only broadens the technology's range of applications but also spurs the development of related techniques. Developers can optimize and extend T-MAC to suit their own needs, helping the technology mature. At the same time, the open-source release provides a collaborative platform for the wider technical community and promotes global exchange and cooperation.

The far-reaching impact of T-MAC technology

The advent of T-MAC has had a profound impact on both academia and industry. In academic research, T-MAC provides a new computational path and opens up a new direction for research on low-bit computing and LLM inference. Building on the T-MAC architecture, researchers can explore more efficient inference methods and further advance artificial intelligence.

In industrial applications, the spread of T-MAC can greatly reduce the cost of smart devices and raise their level of intelligence, which is of real practical significance for enterprises working on the Internet of Things, autonomous driving, and smart homes. As more low-bit models are introduced and applied, T-MAC is expected to make an even greater contribution to improving inference efficiency and reducing energy consumption and cost.

Conclusions and future prospects

The successful development and open-source release of T-MAC marks a major breakthrough in on-device LLM deployment. It not only demonstrates the great potential of CPUs in specific scenarios but also paves the way for the widespread adoption of low-bit models. As T-MAC continues to improve, there is good reason to believe it will play an increasingly important role in edge computing and smart devices.

As artificial intelligence develops worldwide, T-MAC's influence will continue to grow. Its success is not merely a complement to existing technologies but a strong signal of where future smart devices and edge computing hardware are headed. As more enterprises and research institutions join in, T-MAC's application scenarios will expand further, promoting the broader adoption of artificial intelligence.
