Background
1. "Why should we upgrade to 25G Ethernet?"
In recent years, many Internet data centres have upgraded their server access from 10G Ethernet to 25G Ethernet. Why make this upgrade?
The following points provide a concise answer:
● Supporting high-performance businesses involves collaborating with rapidly expanding businesses to enhance application system performance. For instance, Internet applications based on AI and big data have driven a significant increase in business traffic.
● Supporting business emergencies is crucial, especially when sudden business crises require full infrastructure support from the business side.
● Matching server performance upgrades: enhancing server CPU and storage I/O performance boosts the network throughput of each server, and 10G networking has proven insufficient to meet the resulting bandwidth requirements.
● Reducing the cost per bit is important for public cloud services. The adoption of 25G Ethernet has lowered the network cost per bit, consequently reducing operating expenses.
● Realizing technical dividends: the new generation of 25G Ethernet switch chips offers a wide range of technical features, such as Telemetry and RDMA (Remote Direct Memory Access), significantly enhancing the efficiency of basic network operations and maintenance while cutting costs.
In Internet data centres, how does the networking architecture of 25G Ethernet differ from that of 10G Ethernet? Let's now explore the networking architecture of 25G.
2. What factors determine the 25G networking architecture?
When designing and implementing a 25G Data Centre network, it's important to consider two main factors that influence the choice of products and architecture solution:
1. Server scale: This refers to the expected number of servers in a single cluster.
2. Business application requirements: This includes the network convergence ratio, single/dual uplink of servers, and other requirements specific to different types of business applications.
The two most common network architecture models are the two-level network architecture and the three-level network architecture. In the following analysis, we will examine how these architectures correspond to the server scale and applicable business application requirements.
25G Network Architecture Design Solution
1. Two-level network architecture
▲Figure 1: two-level network architecture topology diagram
Based on Figure 1 above, we analyze the two-level network architecture topologies in terms of the servers' single/dual uplink mode, scale, equipment selection, and convergence ratio:
| Server scale | 1,000~2,000 | 5,000~20,000 | 5,000~20,000 |
| --- | --- | --- | --- |
| Architecture type | Two-level multi-core BOX architecture | Two-level multi-core CHASSIS architecture | Two-level multi-core CHASSIS architecture |
| Servers (single uplink) | 2,000 units | 10,000~20,000 units | 10,000~20,000 units |
| Servers (dual uplink) | 1,000 units | 5,000~10,000 units | 5,000~10,000 units |
| Equipment model | Leaf: RG-S6510-48VS8CQ (48*25G+8*100G); Spine: RG-S6520-64CQ (64*100G) | Leaf: RG-S6510-48VS8CQ (48*25G+8*100G); Spine: RG-N18000-X series (CB-card), 8/16 service slots | Leaf: RG-S6510-48VS8CQ (48*25G+8*100G); Spine: RG-N18000-X series (DB-card), 8/16 service slots |
| Convergence ratio | Spine 3:1; Leaf 1.5:1 | Spine 3:1; Leaf 1.5:1 | Spine 3:1; Leaf 1.5:1 |
▲ Table 1: Comparison of two-level network architectures
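The convergence (oversubscription) ratios in Table 1 follow directly from the port counts of the listed devices. As a rough sketch, assuming all 48x25G leaf ports face servers and all 8x100G ports face the spine:

```python
# Sketch: derive the leaf convergence ratio from the port counts in Table 1.
# Assumes every 25G port is a server-facing downlink and every 100G port an uplink.

def convergence_ratio(downlink_ports, downlink_gbps, uplink_ports, uplink_gbps):
    """Oversubscription = total downlink bandwidth / total uplink bandwidth."""
    return (downlink_ports * downlink_gbps) / (uplink_ports * uplink_gbps)

# RG-S6510-48VS8CQ leaf: 48x25G down, 8x100G up
leaf_ratio = convergence_ratio(48, 25, 8, 100)
print(f"Leaf convergence ratio: {leaf_ratio}:1")  # 1.5:1, matching Table 1
```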
When the scale of a single cluster ranges from 1,000 to 2,000 servers, a box-type (BOX) multi-core two-level architecture can fulfil the demand. This architecture uses the same series of single-chip switches throughout, so PFC (Priority-based Flow Control) + ECN (Explicit Congestion Notification) + MMU (Memory Management Unit) management is handled as one coordinated solution: the chip watermark settings are highly consistent and well-coordinated, forwarding delay is low, and throughput is high. The entire network can implement RDMA services and network visualization solutions.
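The PFC + ECN + MMU coordination described above can be pictured as two buffer watermarks on the same egress queue: ECN marking kicks in first so senders back off end-to-end, and PFC pause fires only at a higher watermark as a lossless last resort. A toy model (the threshold values are illustrative, not real chip defaults):

```python
# Toy model of MMU watermark coordination on a single egress queue.
# The ECN threshold sits below the PFC threshold so congestion is signalled
# end-to-end before hop-by-hop pause frames are needed. Values are illustrative.

ECN_MARK_KB = 200   # start ECN-marking packets above this queue depth
PFC_PAUSE_KB = 600  # send PFC pause frames upstream above this depth

def queue_action(depth_kb):
    if depth_kb >= PFC_PAUSE_KB:
        return "pfc-pause"   # lossless last resort: pause the upstream port
    if depth_kb >= ECN_MARK_KB:
        return "ecn-mark"    # mark CE so DCQCN/DCTCP-style senders slow down
    return "forward"

for depth in (50, 300, 700):
    print(depth, "->", queue_action(depth))
```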
For a scale of 5,000 to 20,000 single cluster servers, a Chassis-based multi-core two-level architecture can be employed. The Spine layer core devices of this architecture offer two types of core boards to choose from:
1. CB-type boards cater to business scenarios with frequent many-to-one traffic and effectively reduce packet loss in such scenarios through a large-cache mechanism.
2. DB-type boards suit business scenarios with high requirements for RDMA networking and network visualization. This architecture also inherits the advantages of the BOX multi-core two-level architecture.
In two-level networking, the choice of architecture depends on the scale of single-cluster servers and business needs. For routing, EBGP (External Border Gateway Protocol) can be used between Spine and Leaf. All Leaf devices are deployed with the same AS number (Autonomous System number), and the Spine layer replaces that AS number after receiving routes from the Leaf layer to work around EBGP's split-horizon (loop-prevention) behaviour. When the business requires dual uplink of servers, it is recommended to use the de-stacking solution for Leaf layer deployment. For details, see [Article 1]: How to "de-stack" data centre network architecture.
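The AS-number trick above works because of eBGP's standard loop prevention: a router discards any route whose AS_PATH already contains its own AS. Since every leaf shares one AS, the spine must rewrite that AS (as-override / replace-as style behaviour) before re-advertising, or the other leaves would reject the route. A minimal sketch, with illustrative AS numbers:

```python
# Sketch of why the spine must replace the leaf AS number.
# eBGP loop prevention: a router rejects routes whose AS_PATH contains its own AS.

LEAF_AS, SPINE_AS = 65001, 65000   # all leaves share AS 65001 (illustrative values)

def accepts(local_as, as_path):
    return local_as not in as_path

# Leaf1 advertises a prefix; at the spine the AS_PATH is [65001].
path = [LEAF_AS]

# Without replacement, the spine re-advertises [65000, 65001] to Leaf2 -> rejected.
print(accepts(LEAF_AS, [SPINE_AS] + path))       # False

# With AS replacement, the spine substitutes its own AS -> accepted by Leaf2.
replaced = [SPINE_AS if asn == LEAF_AS else asn for asn in path]
print(accepts(LEAF_AS, [SPINE_AS] + replaced))   # True
```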
2. Three-level network architecture
▲Figure 2: three-level architecture topology diagram
For ultra-large data centres with more than 20,000 servers in a single cluster, a two-level Spine-Leaf network can no longer meet demand and scales poorly. In this case, it is recommended to adopt a three-level architecture based on PODs (Point of Delivery, the smallest unit of the data centre) and horizontal expansion (scale-out).
As shown in Figure 2, each POD is a two-level Spine-Leaf network in which the number of servers and network devices is standardized and fixed. Multiple PODs are interconnected through core devices to achieve larger-scale networking and flexible expansion. Table 2 compares the options by number of PODs, server scale, device selection, and convergence ratio:
| Server scale | More than 20,000 | More than 20,000 |
| --- | --- | --- |
| Architecture type | Three-level architecture based on POD horizontal expansion | Three-level architecture based on POD horizontal expansion |
| Number of PODs | 14~56 | 14~56 |
| Servers per POD (single uplink) | 2,000 units | 2,000 units |
| Servers per POD (dual uplink) | 1,000 units | 1,000 units |
| Whole-network servers (single uplink) | 30,000~120,000 units | 30,000~120,000 units |
| Whole-network servers (dual uplink) | 15,000~60,000 units | 15,000~60,000 units |
| Equipment model | Leaf: RG-S6510-48VS8CQ (48*25G+8*100G); POD-Spine: RG-S6520-64CQ (64*100G); Core: RG-N18000-X series (CB-card), 16 service slots | Leaf: RG-S6510-48VS8CQ (48*25G+8*100G); POD-Spine: RG-S6520-64CQ (64*100G); Core: RG-N18000-X series (DB-card), 16 service slots |
| Convergence ratio | Core 3:1; POD-Spine 3:1; Leaf 1.5:1 | Core 3:1; POD-Spine 3:1; Leaf 1.5:1 |
▲Table 2: Comparison table of three-level architecture
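The whole-network figures in Table 2 are simply the per-POD capacity multiplied by the number of PODs. A quick sketch of that arithmetic (per-POD figures from Table 2; the example POD counts of 15 and 60 are chosen to match the quoted server ranges):

```python
# Whole-network server scale = servers per POD x number of PODs (per Table 2).

SERVERS_PER_POD_SINGLE = 2000  # single-uplink servers per POD
SERVERS_PER_POD_DUAL = 1000    # dual-uplink servers per POD

def whole_network_scale(pods, servers_per_pod):
    return pods * servers_per_pod

print(whole_network_scale(15, SERVERS_PER_POD_SINGLE))  # 30000 (low end)
print(whole_network_scale(60, SERVERS_PER_POD_SINGLE))  # 120000 (high end)
print(whole_network_scale(60, SERVERS_PER_POD_DUAL))    # 60000 (dual uplink, high end)
```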
In the three-level networking architecture, there are two equipment selections. Within each POD, the standard Spine-Leaf two-level architecture uses the same equipment in both cases; the two options differ only at the core layer, where you choose between Chassis devices with CB-type boards and with DB-type boards, on the same criteria as in the two-level architecture.
When deploying RDMA services for a business, it is recommended to control the deployment scope of the RDMA domain within the POD. This is because the control difficulty of PFC and ECN messages will significantly increase for larger-scale RDMA deployment, leading to a more serious impact of congestion and back pressure.
If planning a larger-scale data centre with over 100,000 servers in a single cluster, it is necessary to upgrade the Spine layer switches to BOX devices that provide 128 100G ports, doubling the server scale in each POD.
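The doubling claim above follows from port arithmetic: if the spine keeps its 3:1 convergence ratio, three quarters of its ports face the leaf layer, so going from 64 to 128 ports doubles the leaf-facing capacity and hence the servers each POD can host. A sketch of that calculation:

```python
# Port arithmetic behind "doubling the server scale in the POD" (sketch).
# At a ratio:1 convergence ratio, ratio/(ratio+1) of the spine's ports face the leaves.

def spine_downlinks(total_ports, ratio=3):
    return total_ports * ratio // (ratio + 1)

print(spine_downlinks(64))   # 48 leaf-facing ports on a 64x100G spine
print(spine_downlinks(128))  # 96 leaf-facing ports: double the POD's leaf capacity
```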
"Outlook for the Next-Generation Data Center Architecture"
According to the International Data Corporation (IDC), the data processed by data centres is projected to reach 175 ZB in 2025, five times the amount in 2018. China is expected to see the fastest growth, with data increasing from 7.6 ZB in 2018 to 48.6 ZB in 2025. With this rapid growth, the foundational network infrastructure needs corresponding improvements, including iterative upgrades of network bandwidth and a move towards an IP CLOS architecture with a 1:1 network convergence ratio. Will the next-generation networking architecture continue to use chassis devices in the IP CLOS network architecture? How will server access evolve and upgrade to meet business needs? The next article will provide detailed explanations.
Summary
The transition from 10G to 25G Ethernet in Internet Data Centers (IDCs) represents a significant advancement in network architecture, driven by the growing demands of modern applications. Upgrading to 25G Ethernet supports high-performance business needs, including improved application efficiency for AI and big data, while also accommodating sudden traffic spikes. The design of 25G networks is influenced by server scale and business application requirements, leading to two primary architectures: two-level and three-level networks. The two-level architecture is suitable for clusters of 1,000 to 20,000 servers, utilizing box-type and chassis-based designs for optimal performance. In contrast, three-level architectures are essential for ultra-large data centres exceeding 20,000 servers, enabling scalable, flexible network solutions through standardized PODs. As data demands continue to rise, advancements in 25G networking will play a crucial role in shaping the future of data centre architecture.