Recently, it was reported that SEMIFIVE has signed a large-scale production contract with HyperAccel to manufacture artificial intelligence chips designed for transformer-based large language models (LLMs). The chip, known as the LLM Processing Unit (LPU), is billed as the world's first semiconductor LPU tailored for LLM inference, and is expected to replace existing high-cost, low-efficiency graphics processing units (GPUs) with a low-cost, low-latency, domain-specific alternative. Compared with a typical supercomputer, the chip is reported to deliver 2x better performance and 19x better price/performance.
HyperAccel's LPU is designed for large language models (LLMs), while SEMIFIVE focuses on SoC platforms and ASIC design solutions and, together with industry-leading partners, is developing specialized SoC design platforms to address customer demand for custom AI silicon.
In addition, Rebellions, an AI fabless startup, is preparing to mass-produce artificial intelligence (AI) chips dedicated to data centers using Samsung Electronics' 5nm extreme ultraviolet (EUV) process. Rebellions has signed a large-scale production contract with SEMIFIVE, one of Samsung Electronics' Design Solution Partners (DSPs), for its 5nm AI semiconductor, ATOM, with mass production expected to begin early next year. ATOM is said to offer performance on par with industry-leading GPUs and 3.4x better energy efficiency than comparable neural processing units (NPUs).
Figure: SEMIFIVE partners with HyperAccel to mass-produce AI chips (Source: SEMIFIVE)
Compared with traditional GPUs, LPUs have the following advantages in terms of performance and energy efficiency:
1. Specially optimized architecture: The LPU is optimized for the computational density and memory bandwidth demands of LLMs, reducing the time required to compute each token so that text sequences are generated faster. This specialized architecture allows LPUs to deliver higher efficiency and lower latency than traditional GPUs on language-based tasks.
2. High inference performance: The LPU performs well on inference tasks and can run inference for large language models at very high speed. For example, Groq's LPU can output 500 tokens per second, roughly 10 times faster than traditional GPU setups.
3. Balance between memory bandwidth and computing logic: The LPU keeps processing power and data availability in balance by optimizing its memory access patterns and compute resource management, significantly improving performance on NLP tasks; a rough back-of-the-envelope throughput model is sketched after this list.
4. Low latency and high scalability: The LPU is designed with a dedicated Expandable Synchronization Link (ESL) that hides the data synchronization delay between multiple LPUs, achieving near-perfect scalability: as the number of LPUs increases, performance gains stay efficient and latency stays low (see the toy scaling model after this list).
5. Energy efficiency: The LPU also shows advantages in energy efficiency. By reducing the overhead of managing many threads and avoiding inefficient use of cores, the LPU completes more computing work per unit of energy. Implemented in a 4nm process with an area of 0.824 mm² and power consumption of 284.31 mW, HyperAccel's LPU outperforms NVIDIA H100 and L4 GPU server solutions in energy efficiency.
6. Cost-effectiveness: Although some analyses note that actual deployment costs may be higher than expected because more LPUs are needed to run a model of the same size, the LPU still offers a favorable performance-to-cost ratio.
7. Software framework support: The LPU ships with a dedicated software framework, HyperDex, which provides a runtime environment based on the widely used HuggingFace API so that various LLM applications can run seamlessly on LPU hardware (a minimal usage sketch in this style follows the list).
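To make the memory-bandwidth argument in point 3 concrete, here is a minimal back-of-the-envelope sketch in Python. Batch-1 LLM decoding is typically memory-bound: every generated token requires streaming the full set of model weights from memory once, so memory bandwidth sets a hard ceiling on token throughput. All numbers below (model size, precision, bandwidth) are illustrative assumptions, not specifications of any particular LPU or GPU.

```python
# Back-of-the-envelope model of batch-1 LLM decoding throughput.
# Decoding is typically memory-bandwidth-bound: each new token requires
# reading all model weights from memory once, so bandwidth caps tokens/s.

def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          mem_bandwidth_gbs: float) -> float:
    """Upper bound on decode throughput when weight streaming dominates."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return mem_bandwidth_gbs * 1e9 / model_bytes

# Example: a 7B-parameter model in FP16 (2 bytes/param) on a device with
# an assumed 1 TB/s of memory bandwidth (illustrative, not a product spec).
print(f"{max_tokens_per_second(7, 2, 1000):.1f} tokens/s upper bound")
# -> ~71.4 tokens/s. A domain-specific design that sustains a higher
# fraction of peak bandwidth gets closer to this ceiling.
```

The point of the model is that, in this regime, architectural work that raises sustained bandwidth utilization translates almost directly into tokens per second.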
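Point 4's latency-hiding claim can likewise be illustrated with a toy model: if inter-device synchronization is serialized after compute, it eventually dominates per-token latency as devices are added, whereas overlapping it with compute keeps scaling close to ideal. The constants below are arbitrary assumptions chosen to show the shape of the curves; they are not ESL measurements.

```python
# Toy latency model: exposed vs. hidden synchronization when splitting
# one decode step across n devices. Constants are illustrative only.

def per_token_latency_ms(n_devices: int, hidden_sync: bool) -> float:
    compute_ms = 20.0 / n_devices    # compute parallelizes across devices
    sync_ms = 0.5 * (n_devices - 1)  # assumed per-step synchronization cost
    if hidden_sync:
        # Sync overlapped with compute: only the longer of the two shows up.
        return max(compute_ms, sync_ms)
    return compute_ms + sync_ms      # sync serialized after compute

for n in (1, 2, 4, 8):
    print(f"n={n}: exposed={per_token_latency_ms(n, False):.2f} ms, "
          f"hidden={per_token_latency_ms(n, True):.2f} ms")
```

With the sync cost hidden, latency keeps falling as devices are added (20.00, 10.00, 5.00, 3.50 ms in this toy run), while the exposed variant flattens out (20.00, 10.50, 6.50, 6.00 ms); that gap is the scalability loss a dedicated synchronization link is meant to close.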
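Finally, point 7's claim is that HyperDex layers an LPU runtime under the familiar HuggingFace-style interface. The sketch below is ordinary HuggingFace transformers code running on stock hardware; the model name is just an example, and nothing here invokes HyperDex's actual API, which this article does not document. The premise is that a script written in this style could target LPU hardware with minimal change.

```python
# Standard HuggingFace-style text generation. The article's claim is that
# a HyperDex runtime exposes this same API surface, so code like this
# could be retargeted to LPU hardware. This sketch uses only the stock
# transformers library; it contains no actual HyperDex calls.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # example model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("LPUs accelerate LLM inference by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```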
In summary, LPUs offer higher performance, lower latency, and better energy efficiency than traditional GPUs for large language model workloads, which makes them an attractive option in the AI chip market. As the technology matures and market demand grows, LPUs are expected to play an increasingly important role in AI computing.