Keysight AI Data Center Builder


Accelerate design and deployment of AI network infrastructure


Highlights

Keysight AI (KAI) Data Center Builder

  • Emulate high-scale AI workloads with measurable fidelity. Gain deep insights into collective communication performance.
  • Simplify the benchmarking process. Validate AI network fabric with pre-packaged benchmark applications, built through partnerships with key AI operators and AI infrastructure vendors.
  • Execute defined AI / ML behavioral models. Share them between users and customers to help reproduce experiments.
  • Choose your test engine. Run AI workload emulation on Keysight hardware load appliances and software endpoints, or use real AI accelerators to compare benchmarking results.
  • Apply automated test methodologies to qualify AI network fabric efficiency in terms of job completion time, performance isolation, load balancing, and congestion control.

Driving the Future of AI Networking: How Keysight AI Data Center Builder Empowers Juniper

  • KAI Data Center Builder helps Juniper validate its next-generation network fabric by emulating the collective communications workload of a large number of AI accelerators.
  • It provides comprehensive test scenarios to demonstrate the efficiency and performance of a lossless network fabric in load balancing and congestion mitigation.

Solving for AI Networking Challenges

Key industry trends and challenges in the AI / ML industry include:

  • AI clusters are expected to exceed 100,000 nodes by 2026.
  • AI accelerators can sit idle up to 50% of the time waiting for data exchange.
  • Innovation in AI networking requires new measurement and benchmarking tools.
  • The KAI Data Center Builder is an industry-leading 800 / 400GE test solution with a track record of lossless fabric validation. It is faster to deploy and delivers deeper insights than benchmarking with GPU-based systems, with provable fidelity of AI traffic emulation.

Accelerate AI Network Design

Define the future of AI / ML infrastructure. Unlock possibilities and shape tomorrow’s landscape.


Benchmark job completion time of AI collective communications

Navigate the complexities of AI workloads.

​Achieve precision in network performance measurements​

Make design decisions based on deeper AI communications insights.​

Flexible what-if scenarios

Optimize AI collective performance by experimenting with AI traffic patterns to fine-tune fabric configuration.

​Cost-effective high-density AI network testbeds​

Scale experiments with AresONE-M 800GE and AresONE-S 400GE AI traffic emulation.​​

Transform AI Infrastructure Benchmarking

The KAI Data Center Builder helps transform AI infrastructure benchmarking with precision and speed by:
  • Optimizing AI / ML system design with realistic emulation of high-scale AI workloads.
  • Delivering insights into collective communications performance.
  • Simplifying benchmarking and validation with pre-packaged methodologies delivered as applications.
  • Emulating Remote Direct Memory Access (RDMA) over Converged Ethernet v2 (RoCEv2) endpoints using high-density AresONE traffic load appliances with hundreds of 400GE or 800GE ports.

Keysight AI Collective Benchmarks

Pre-packaged methodology co-developed with key AI operators

The KAI Collective Benchmarks application runs micro-benchmarks of typical AI collective communication algorithms on the user-provided AI network fabric.
  • Evaluate AI network fabric performance for common types of collective communications.
  • Measure performance metrics, including job completion time, algorithm bandwidth, and bus bandwidth; calculate the ideal percentage to quantify deviation from theoretical maximum performance (see the sketch after this list).
  • Use AresONE hardware to measure and analyze Queue Pair (AI data flow) performance, summarizing results as percentiles with drill-down capabilities for further analysis.
  • Assess RoCEv2 emulation fidelity by comparing AresONE hardware results with metrics collected on actual AI systems.
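
For intuition on how these metrics relate, the following is a minimal sketch assuming the NCCL-style conventions commonly used for collective benchmarks; it is illustrative only and not Keysight code.

```python
# Minimal sketch (assumed NCCL-style conventions, not Keysight code):
# algorithm bandwidth, bus bandwidth, and ideal % for an all-reduce collective.

def allreduce_metrics(data_size_bytes: float, jct_seconds: float,
                      num_ranks: int, link_rate_gbps: float) -> dict:
    """Derive bandwidth metrics from a measured job completion time (JCT)."""
    # Algorithm bandwidth: payload size divided by job completion time.
    algbw = data_size_bytes / jct_seconds            # bytes/s
    # Bus bandwidth: algbw scaled by the collective's traffic factor;
    # for ring all-reduce each rank moves 2*(n-1)/n of the data.
    busbw = algbw * 2 * (num_ranks - 1) / num_ranks  # bytes/s
    # Ideal %: achieved bus bandwidth relative to the port's line rate.
    line_rate_bytes_per_s = link_rate_gbps * 1e9 / 8
    return {
        "algbw_GBps": algbw / 1e9,
        "busbw_GBps": busbw / 1e9,
        "ideal_pct": 100 * busbw / line_rate_bytes_per_s,
    }

# Example: a 1 GiB all-reduce across 32 ranks completing in 60 ms on 400GE ports.
print(allreduce_metrics(2**30, 0.060, 32, 400.0))
```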

RoCEv2 Endpoints Emulation and Stateful Validation

Beyond emulation, pioneering precision in RoCEv2 validation

RoCEv2 Support in IxNetwork / AresONE-S

IxNetwork / AresONE-S supports the RoCEv2 transport protocol with Data Center Quantized Congestion Notification (DCQCN) congestion control and Priority Flow Control (PFC). It provides a scalable and cost-effective way to validate the effectiveness of data plane traffic management in AI clusters and to optimize network fabric performance.

Speed and Scale

AresONE-S offers up to 16 x 400GE ports per appliance and can be combined into a multi-appliance configuration with 256+ ports in a single collective. Each port emulates a RoCEv2 endpoint and supports thousands of Queue Pairs at line-rate traffic. This scale is crucial for reproducing the network topologies of real AI clusters.

Traffic Flexibility

To match the realism of AI workload patterns and reproduce issues on smaller setups, the first release of AresONE RoCEv2 capabilities covers a range of traffic patterns, from incast to partial mesh to full all-to-all collectives. At the transport level, it supports sequences of RDMA verbs with configurable data sizes, burst rates, and intervals, all combined with DCQCN and PFC rate-control mechanisms.
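
To make the distinction between these patterns concrete, here is a small sketch, using hypothetical helpers rather than any Keysight API, that builds source-to-destination flow lists for incast and all-to-all traffic among a set of emulated endpoints.

```python
# Hypothetical illustration (not a Keysight API): flow lists for two of the
# traffic patterns named above, among N emulated RoCEv2 endpoints.

def incast(num_endpoints: int, sink: int = 0) -> list[tuple[int, int]]:
    """Many-to-one: every endpoint sends to a single sink, stressing one egress."""
    return [(src, sink) for src in range(num_endpoints) if src != sink]

def all_to_all(num_endpoints: int) -> list[tuple[int, int]]:
    """Full mesh: every endpoint sends to every other endpoint."""
    return [(src, dst) for src in range(num_endpoints)
            for dst in range(num_endpoints) if src != dst]

# Example with 4 emulated endpoints.
print(incast(4))      # [(1, 0), (2, 0), (3, 0)]
print(all_to_all(4))  # 12 directed flows
```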

Per Queue Pair DCQCN Flow Control

DCQCN per queue pair enables precise network congestion control with features like Explicit Congestion Notification (ECN) and rate control, optimizing data flow and network fabric responsiveness.
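
As a rough mental model of this behavior, the sketch below reduces the published DCQCN algorithm to its core per-queue-pair reaction; it omits the alpha update, byte counters, and hyper-increase stages and is not the appliance's implementation.

```python
# Simplified per-Queue-Pair DCQCN-style rate control (illustrative only; the
# real algorithm also updates alpha and uses byte-counter / timer stages).

class QueuePairRate:
    def __init__(self, line_rate_gbps: float, alpha: float = 1.0):
        self.target = line_rate_gbps   # target rate (RT)
        self.current = line_rate_gbps  # current sending rate (RC)
        self.alpha = alpha             # congestion estimate

    def on_cnp(self) -> None:
        """Congestion Notification Packet (triggered by ECN marks): cut the rate."""
        self.target = self.current
        self.current *= 1 - self.alpha / 2

    def on_recovery(self) -> None:
        """Fast recovery when no congestion is seen: move halfway back to target."""
        self.current = (self.current + self.target) / 2

qp = QueuePairRate(line_rate_gbps=400.0)
qp.on_cnp()                  # ECN-marked congestion -> rate drops to 200 Gb/s
qp.on_recovery()             # recovery step -> rate climbs back to 300 Gb/s
print(qp.target, qp.current)
```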

Visit the GitHub repository for AI / ML testing methodologies.

How to Test AI Data Center Networks

Efficient network design is crucial for faster data movement and reduced latency. The KAI Data Center Fabric Test Methodology aims to provide a consistent testing process with measurable metrics to optimize data center infrastructure for AI workloads. Follow this test methodology to benchmark job completion times, performance isolation, load balancing, and congestion control.

Benchmarking AI / ML clusters with realistic workloads requires costly investments in computing systems with GPUs and RDMA network interface controllers (NICs). Proper benchmarking involves configuring parameters such as cluster setup, congestion control, workload algorithms, job data size, traffic profile, and NIC performance.
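
As a rough illustration of how those parameters might be captured for a repeatable run, here is a hypothetical configuration structure; the field names are examples and are not taken from the KAI Data Center Fabric Test Methodology.

```python
# Hypothetical benchmark parameter set (field names are illustrative only).
from dataclasses import dataclass

@dataclass
class FabricBenchmarkConfig:
    cluster_nodes: int       # emulated endpoints in the cluster setup
    congestion_control: str  # e.g. "DCQCN" or "PFC-only"
    collective: str          # workload algorithm, e.g. "all-reduce"
    job_data_size_mib: int   # data exchanged per collective iteration
    traffic_profile: str     # e.g. "incast", "partial-mesh", "all-to-all"
    nic_rate_gbps: int       # per-endpoint NIC / port speed

config = FabricBenchmarkConfig(
    cluster_nodes=256,
    congestion_control="DCQCN",
    collective="all-reduce",
    job_data_size_mib=1024,
    traffic_profile="all-to-all",
    nic_rate_gbps=400,
)
print(config)
```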

AI Test Hardware

Keysight's data center load modules deliver high-density, high-performance Ethernet IP test solutions, with industry-first 1G, 10G, 25G, 40G, 50G, 100G, 400G, and 800G speeds.

