
New IBM Processor Innovations to Accelerate AI on Next Generation IBM Z Mainframe Systems

At Hot Chips 2024, IBM unveiled architectural details for its upcoming IBM Telum II processors and IBM Spyre accelerators. This new technology is designed to dramatically increase the processing power of the next generation of IBM Z mainframe systems, especially in AI integration, and will accelerate the synergistic application of traditional AI models with large language models (LLMs).

As more generative AI projects move from proof-of-concept to production, energy-efficient, secure, and scalable solutions are quickly becoming a key priority. A Morgan Stanley research report released in August predicted that electricity demand from generative AI will grow by 75% annually over the next few years, and that by 2026 AI energy consumption could match the electricity Spain consumed in all of 2022. Many IBM clients have made it clear that a hybrid design approach supporting large-scale foundation models and AI workloads is becoming increasingly important in their architectural decisions.

Key innovations in this release include:

IBM Telum II processor: Designed for the next generation of IBM Z systems, the new processor offers higher frequency and memory capacity than the first-generation Telum chip, a 40 percent increase in cache capacity, an integrated AI accelerator core, and a connected data processing unit (DPU). The processor will power LLM applications in enterprise computing to address complex industry transaction needs.

IO Acceleration Unit: The new data processing unit (DPU) on the Telum II processor is designed to accelerate complex I/O protocols for mainframe networking and storage. The DPU simplifies system operations and improves the performance of critical components.

IBM Spyre Accelerator: This accelerator provides additional AI computing power to complement the Telum II processor. Telum II works in tandem with Spyre chips to form a scalable architecture that supports an ensemble AI approach, combining multiple machine learning or deep learning models with encoder LLMs. By leveraging the strengths of different model architectures, ensemble AI can deliver more accurate and robust results than any single model. The IBM Spyre Accelerator chip, previewed at the Hot Chips 2024 conference, will be available as an add-on option. Each accelerator chip is connected via a 75-watt PCIe adapter and is based on technology developed in collaboration with IBM Research. Like other PCIe cards, the Spyre Accelerator can be scaled out to meet customer needs.
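The ensemble idea described above — blending a traditional model's output with an encoder LLM's output — can be sketched in a few lines. The models below are hypothetical stand-ins for illustration only, not IBM's actual implementation; the weighting, feature rules, and phrase list are all assumptions.

```python
# Minimal sketch of ensemble AI scoring: combine a traditional model's
# fraud score with an encoder-LLM-style score over the claim text.
# Both "models" are toy stand-ins, not IBM's actual implementation.

def traditional_model_score(claim: dict) -> float:
    """Stand-in for a classical ML model (e.g., gradient-boosted trees)."""
    # Toy rules: large claims filed soon after policy start look riskier.
    risk = 0.0
    if claim["amount"] > 10_000:
        risk += 0.4
    if claim["days_since_policy_start"] < 30:
        risk += 0.3
    return min(risk, 1.0)

def llm_encoder_score(claim: dict) -> float:
    """Stand-in for an encoder LLM scoring the free-text claim narrative."""
    # Toy heuristic: fraction of suspicious phrases found in the text.
    suspicious = ("total loss", "no witnesses", "receipts lost")
    hits = sum(p in claim["description"].lower() for p in suspicious)
    return hits / len(suspicious)

def ensemble_score(claim: dict, w_traditional: float = 0.6) -> float:
    """Weighted average of the two model scores (weight is an assumption)."""
    return (w_traditional * traditional_model_score(claim)
            + (1 - w_traditional) * llm_encoder_score(claim))

claim = {
    "amount": 15_000,
    "days_since_policy_start": 12,
    "description": "Total loss of property; receipts lost in the incident.",
}
print(f"ensemble fraud score: {ensemble_score(claim):.2f}")
```

The point of the ensemble is that each model sees evidence the other misses: the structured-feature model catches numeric patterns, while the text model catches narrative signals, and a weighted combination outperforms either alone.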

"Our robust multi-generational roadmap allows us to stay ahead of technology trends, especially as the demand for AI continues to grow," said Tina Tarquinio, vice president of product management for IBM Z and LinuxONE. "Telum II processors and Spyre accelerators are designed to deliver high-performance, secure, and more energy-efficient enterprise computing solutions. After years of research and development, these innovations will be introduced to our next-generation IBM Z platform, enabling customers to leverage LLMs and generative AI at scale."

Figure: IBM unveils new processors

The Telum II processor and IBM Spyre Accelerator will be manufactured by Samsung, IBM's long-time manufacturing partner, on its high-performance, energy-efficient 5nm process node. Working together, the two will support a range of advanced AI-driven use cases designed to unlock business value and create new competitive advantages. With the ensemble AI approach, customers can obtain faster, more accurate predictions. The combined processing power announced here will provide tangible support for generative AI use cases. Some examples include:

Insurance claims fraud detection: By combining LLMs with traditional neural networks, ensemble AI enhances fraud detection in home insurance claims, improving both performance and accuracy.

Advanced anti-money laundering: Improved detection of suspicious financial activity, supporting regulatory compliance and reducing the risk of financial crime.

AI assistants: Accelerating the application lifecycle, the transfer of knowledge and expertise, and code explanation and transformation.

Specifications and Performance Indicators:

Telum II processor: Equipped with eight high-performance cores running at 5.5GHz, each with 36MB of L2 cache, and 40% more on-chip cache capacity for a total of 360MB. Each processor drawer has a virtual L4 cache capacity of 2.88GB, a 40% increase over the previous generation. The integrated AI accelerator enables low-latency, high-throughput in-transaction AI inference, such as enhanced fraud detection in financial transactions, enabling a four-fold increase in computing power per chip compared to the previous generation.

The new I/O acceleration unit DPU is integrated within the Telum II chip. It is designed to increase I/O density by up to 50% for improved data processing. This advancement improves the overall efficiency and scalability of IBM Z, making it ideal for handling the large-scale AI workloads and data-intensive applications of today's enterprises.

Industry insiders describe the Spyre Accelerator as built for enterprises, with scalable capabilities for complex AI models and generative AI use cases. Currently in the demonstration phase, it offers up to 1TB of memory, and eight cards mounted in a standard I/O drawer can work together to support AI model workloads across the mainframe. Each card consumes no more than 75W. Each chip carries 32 compute cores and supports the int4, int8, fp8, and fp16 data types for low-latency, high-throughput AI applications.
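Low-precision data types like int8 trade a small amount of numeric precision for much higher throughput and lower memory traffic — the reason accelerators advertise int4/int8/fp8/fp16 support. The sketch below shows a generic symmetric int8 quantization scheme for illustration; it is an assumption-laden example, not IBM Spyre's actual quantization method.

```python
# Minimal sketch of symmetric int8 quantization, the kind of low-precision
# representation that int8 support on AI accelerators enables.
# Illustrative only; not IBM Spyre's actual scheme.

def quantize_int8(values):
    """Map floats to int8 range [-127, 127] with one shared scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(qvalues, scale):
    """Recover approximate floats from the int8 values."""
    return [q * scale for q in qvalues]

weights = [0.51, -1.23, 0.07, 0.96]          # example fp32 weights
q, scale = quantize_int8(weights)             # 8-bit integers + scale
approx = dequantize(q, scale)                 # lossy reconstruction
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q, f"max reconstruction error: {max_err:.4f}")
```

Each quantized value needs only one byte instead of four, so matrix multiplies move a quarter of the data and can use cheap integer arithmetic; the reconstruction error stays bounded by half the scale factor.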
