CXL 3.0 and the Future of AI Data Centers
Introduction
Artificial intelligence (AI) is changing every aspect of modern technology, including how we design and build the data centers that underpin AI itself. Traditional data center servers cannot keep up with the data processing demands of AI and machine learning (ML) while staying within tight power consumption limits.
Compute Express Link (CXL), a data protocol created by the CXL Consortium, provides an open-standard cache-coherent link between processors, memory buffers, and accelerators in a data center server, disaggregating the components for more efficient processing. CXL enables data center fabrics composed of interoperable elements to share resources and tackle tough computational problems. As disaggregated, resource-sharing server architectures mature and the protocol becomes mainstream, CXL will provide the infrastructure for processing the petabytes of data necessary for AI, ML, edge computing, and other data-intensive technologies.
Learn about Keysight and the CXL Consortium
What is CXL?
CXL uses the same physical electrical layer as Peripheral Component Interconnect Express (PCIe®), but it features its own link-layer and transaction-layer protocols. CXL links facilitate resource pooling between a CPU and specialized endpoint hardware (such as hardware accelerators or memory buffers) for process-specific workloads.
CXL has three main protocols. CXL.io, required for all CXL devices, handles discovery, configuration, and interrupts, much like the PCIe transaction layer. CXL.cache lets a CXL accelerator coherently cache host (CPU) memory, which is necessary for two devices to share computational resources, as shown in Figure 1. CXL.memory lets a host access memory on an attached expansion device (buffer), increasing available capacity, including persistent memory that operates at near-DRAM speeds with NAND-like non-volatility, as shown in Figure 2. CXL devices come in three flavors, modeled in the sketch after this list:
- Type 1 devices are hardware accelerators featuring CXL.cache only.
- Type 2 devices are accelerators with memory onboard featuring CXL.memory and CXL.cache.
- Type 3 devices are memory expansions featuring CXL.memory only.
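To make the protocol-to-type mapping concrete, here is a minimal Python sketch; the class and field names are hypothetical illustrations, not part of the CXL specification or of any real CXL software stack.

```python
from dataclasses import dataclass

# Hypothetical model of the CXL device taxonomy; names are illustrative.
# Every CXL device implements CXL.io; its type is determined by which of
# the optional protocols (CXL.cache, CXL.memory) it adds on top.
@dataclass(frozen=True)
class CXLDevice:
    name: str
    uses_cache: bool   # CXL.cache: device coherently caches host memory
    uses_memory: bool  # CXL.memory: host accesses device-attached memory

    @property
    def device_type(self) -> int:
        if self.uses_cache and self.uses_memory:
            return 2  # accelerator with onboard memory
        if self.uses_cache:
            return 1  # caching hardware accelerator
        if self.uses_memory:
            return 3  # memory expansion buffer
        raise ValueError("a CXL device must use CXL.cache, CXL.memory, or both")

for device in (
    CXLDevice("caching accelerator", uses_cache=True, uses_memory=False),
    CXLDevice("accelerator with HBM", uses_cache=True, uses_memory=True),
    CXLDevice("DRAM expander", uses_cache=False, uses_memory=True),
):
    print(f"{device.name}: Type {device.device_type}")
```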
Figure 1. CXL.cache allows cache-sharing between a host and an acceleration device
Figure 2. CXL.memory allows a host to access memory on an attached memory buffer device
What are the benefits of CXL?
CXL's main goal is to expand data center capacity to handle the increasing workload demands of emerging technologies. By sharing memory and processing resources while keeping them coherent at low latency, CXL makes disaggregating complex computing tasks more feasible and efficient.
CXL benefits from existing physical layer infrastructure, building on decades of PCI-SIG® innovation and industry familiarity, but reduces latency by streamlining communication between devices. Each PCIe transaction carries a variable-length payload, so the host and endpoint must exchange header overhead describing the payload length. CXL eliminates this per-transaction overhead by using a fixed 528-bit flow control unit (flit).
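To see the framing difference in numbers, the sketch below lays out the 528-bit flit (four 16-byte slots plus a 2-byte CRC, per the CXL 1.1/2.0 flit format) next to a representative variable-length PCIe transaction-layer packet. It is a simplified illustration of fixed-versus-variable framing, not a latency model.

```python
# Simplified framing arithmetic: a fixed-size CXL flit versus a
# variable-length PCIe transaction-layer packet (TLP).

FLIT_BITS = 528
FLIT_BYTES = FLIT_BITS // 8                    # 66 bytes total
SLOTS, SLOT_BYTES = 4, 16                      # 4 x 16-byte message slots
CRC_BYTES = FLIT_BYTES - SLOTS * SLOT_BYTES    # 2-byte CRC
assert CRC_BYTES == 2

def pcie_tlp_bytes(payload_bytes: int, header_bytes: int = 16) -> int:
    """Representative PCIe TLP size: header plus variable payload.

    The header encodes the payload length, so the receiver must parse it
    before it knows where the packet ends; the flit needs no such field.
    """
    return header_bytes + payload_bytes

print(f"CXL flit: always {FLIT_BYTES} bytes "
      f"({SLOTS} x {SLOT_BYTES}-byte slots + {CRC_BYTES}-byte CRC)")
for payload in (0, 64, 256):
    print(f"PCIe TLP, {payload}-byte payload: {pcie_tlp_bytes(payload)} bytes")
```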
Browse Keysight PCIe and CXL solutions
What is new in CXL 3.0?
Since its introduction in 2019, CXL has progressed steadily toward its goal of enabling full computational fabrics and disaggregated computing. CXL 1.1 supported only one device-to-host relationship at a time. CXL 2.0 introduced switching and memory pooling, allowing up to 16 hosts to simultaneously access different portions of a device's memory. CXL 3.0 adds peer-to-peer memory access and multi-tiered switching, further expanding the scope of disaggregated computing.
CXL 3.0 also matches PCIe 6.0 speeds (64 GT/s) over PCIe 6.0 hardware and remains backward compatible with previous CXL generations and PCIe hardware. Most importantly, CXL 3.0 introduces fabric capabilities, freeing the standard from the traditional tree topology. A select list of CXL 3.0 features appears in Figure 3.
Figure 3. CXL features across generations (Source: CXL Consortium)
What does CXL mean for AI data centers?
CXL’s development into a highly flexible link network has enabled composable, scalable computational fabrics. Fabrics are networks of interconnected nodes in which any node can engage and interact with any other to finish a job faster and more efficiently, rather than remaining constrained by traditional tree-based architectures.
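As a toy illustration of why escaping the tree matters (the graph and hop counting here are hypothetical, not the specification's port-based routing), compare how many hops device-to-device traffic takes in a tree versus a fabric with a peer-to-peer link:

```python
# Toy hop-count comparison: tree topology vs. a CXL 3.0-style fabric
# that adds a direct peer-to-peer edge between accelerators.
from collections import deque

def hops(graph: dict[str, list[str]], src: str, dst: str) -> int:
    """Shortest path length via breadth-first search."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    raise ValueError("unreachable")

# Tree: peer traffic between accelerators must climb through the host.
tree = {"host": ["accel-1", "accel-2"],
        "accel-1": ["host"],
        "accel-2": ["host"]}

# Fabric: same nodes plus a direct accelerator-to-accelerator edge.
fabric = {"host": ["accel-1", "accel-2"],
          "accel-1": ["host", "accel-2"],
          "accel-2": ["host", "accel-1"]}

print("tree:  ", hops(tree, "accel-1", "accel-2"), "hops")   # 2, via host
print("fabric:", hops(fabric, "accel-1", "accel-2"), "hop")  # 1, direct
```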
Data centers have trended toward disaggregation, moving processing away from single-server systems and onto switched networks that allow pooling of resources. Now that AI and machine learning are placing an unprecedented load on data centers, everyone from chip designers to systems integrators has had to rethink how data gets transmitted, communicated, and processed.
The most important capability that CXL brings to the data center is resource pooling. Allowing CPUs to access other specialized resources to complete complex computations is key to an efficient, decentralized design philosophy. CXL 3.0 includes new features such as multi-level switching, multi-headed and fabric-attached devices, enhanced fabric management, and composable disaggregated infrastructure (Figure 4), which together enable the standard to become the thread that weaves the fabric of the data center together.
Figure 4. Example of a CXL 3.0-enabled fabric data center architecture (Source: CXL Consortium)
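To make pooling concrete, here is a conceptual sketch of a fabric-attached memory device whose capacity is partitioned among hosts and returned to the pool when released. All names, sizes, and methods are hypothetical; this models the behavior, not a real fabric manager API.

```python
# Conceptual sketch of CXL-style memory pooling; illustrative only.
class PooledMemoryDevice:
    """A fabric-attached memory device partitioned among multiple hosts."""

    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations: dict[str, int] = {}  # host -> GB granted

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host: str, size_gb: int) -> None:
        if size_gb > self.free_gb():
            raise MemoryError(f"only {self.free_gb()} GB free on this device")
        self.allocations[host] = self.allocations.get(host, 0) + size_gb

    def release(self, host: str) -> None:
        # Returned capacity can be re-granted to another host instead of
        # sitting stranded inside a single server.
        self.allocations.pop(host, None)

pool = PooledMemoryDevice(capacity_gb=512)
pool.allocate("host-A", 128)   # e.g., an AI training job needs extra capacity
pool.allocate("host-B", 256)
pool.release("host-A")         # capacity flows back to the pool
pool.allocate("host-C", 256)   # reuses what host-A returned
print(pool.allocations)        # {'host-B': 256, 'host-C': 256}
```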
What are the potential challenges of designing or validating CXL products?
Modularity only works when every device complies with interoperability requirements, so validation and compliance testing are essential to ensure that each vendor's product plays nicely with every other device. Compliance testing brings its own challenges: although CXL builds upon PCIe interconnects and electrical building blocks, even seasoned PCIe developers need to take care when designing and validating their CXL devices.
One such challenge is maintaining coherency among disparate caches, which incurs overhead from snoop operations and data copying. The CXL specification recommends a bias-based coherency model to alleviate the need for excessive snoop operations. However, a system can mask improper biasing behavior: memory accesses still succeed and coherency is maintained, but accesses that do not follow the biasing rules incur unnecessary overhead. Analyzing and detecting improper biasing behavior can therefore yield important insights for improving system performance and reducing latency. Because of these and other potential issues with CXL devices, specialized test software can help developers debug and validate CXL device performance.
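To illustrate why biasing matters, the toy model below tracks host round-trips for device accesses to device-attached memory under the two bias states. The counters and transitions are illustrative; the real bias mechanism operates per page in hardware and software, not as method calls.

```python
# Toy model of bias-based coherency for device-attached memory.
from enum import Enum

class Bias(Enum):
    HOST = "host bias"      # the host coherently owns the page
    DEVICE = "device bias"  # the device accesses locally, no host snoop

class DevicePage:
    def __init__(self):
        self.bias = Bias.HOST
        self.host_roundtrips = 0  # proxy for snoop/overhead traffic

    def device_access(self):
        if self.bias is Bias.HOST:
            # The device must resolve coherence through the host first:
            # coherency holds, but every access pays extra latency.
            self.host_roundtrips += 1
        # In device bias, the access proceeds at full local bandwidth.

    def flip_to_device_bias(self):
        self.host_roundtrips += 1  # one-time cost to flush host caches
        self.bias = Bias.DEVICE

improper = DevicePage()            # hot loop left in host bias
for _ in range(1000):
    improper.device_access()
print("improper biasing:", improper.host_roundtrips, "round-trips")

proper = DevicePage()
proper.flip_to_device_bias()       # pay once, then run locally
for _ in range(1000):
    proper.device_access()
print("proper biasing:  ", proper.host_roundtrips, "round-trip")
```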
Conclusion
CXL is a significant step toward disaggregation and modular design. It enables multiple devices to work together on complex computations, freely sharing resources to tackle petabytes of data generated by AI and other data-intensive industries. It may take a few more years and more generations of the CXL standard to see its full effect on the data center industry, but it is safe to say CXL will play a significant role in enabling artificial intelligence and machine learning applications.
Contact a Keysight expert about CXL solutions
PCI-SIG® and PCIe® are US registered trademarks and/or service marks of PCI-SIG.