400G Optics & Switch Solutions: Ready for Prime Time and Ramping Fast

Data center customers are experiencing exponential demand for higher bandwidth and scale, driven by massive growth in data, AI and 5G. These customers are deploying 12.8Tbps switches in production networks, together with 400G optics. The 400G deployments deliver the following advantages:

  • Much higher bandwidth density. Instead of a 1RU, 32x100G switch, customers can use a 1RU, 32x400G switch, which provides 4x higher bandwidth per RU.
  • 400G optics modules along with 12.8Tbps switches deliver the lowest cost and power per bit – 2x the bandwidth per SerDes lane and 4x the bandwidth per fiber pair.
  • Much better TCO, with fewer switches, optics modules, optical fibers and patch panels for the same network cross-sectional bandwidth as customers move from 100G to 400G (see the sketch after this list).
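
To make the TCO point concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes 32-port switches, duplex optics (one optics module and one fiber pair per link end, as with FR4) and an illustrative 102.4Tbps cross-sectional bandwidth target; the helper and the target number are illustrative, not from the text.

```python
# Back-of-the-envelope comparison of 100G vs 400G builds for the same
# cross-sectional bandwidth. Assumes 32-port switches and duplex optics
# (one module and one fiber pair per link end); figures are illustrative.
TARGET_BW_GBPS = 102_400   # example target: 102.4 Tbps of cross-sectional bandwidth

def build(port_speed_gbps, ports_per_switch=32):
    links = TARGET_BW_GBPS // port_speed_gbps      # one module + one fiber pair per link end
    switches = -(-links // ports_per_switch)       # ceiling division
    return {"links": links, "optics_modules": links, "switches": switches}

for speed in (100, 400):
    print(f"{speed}G:", build(speed))
# 100G: 1024 links across 32 switches; 400G: 256 links across 8 switches,
# i.e. 4x fewer optics modules, fiber pairs and 1RU switches at 400G.
```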

Innovium TERALYNX 7 is the world’s highest performance 12.8Tbps programmable switch silicon in production. We are proud to have collaborated with the optics ecosystem, system partners and cloud customers in 2018 & 2019 to test, resolve issues and validate different types of 400G optics so that customers can deploy robust 400G solutions in production networks with confidence. The combined switch and optics solutions offer a significant time-to-market advantage along with superior quality and unmatched cost per bit.

Top cloud customers understand 400G optics solutions and have started to deploy them in production networks. We expect that the next tier of customers (e.g. tier-2 cloud, Telco cloud and high-end enterprises) will also start to deploy 400G optics solutions to realize the same benefits. This blog attempts to provide a tutorial as these solutions start to see wider adoption in modern data centers. We start with how data centers are architected by cloud providers and then look at how these 400G optics solutions are used.

Cloud providers usually have a world-wide infrastructure footprint consisting of multiple regions that are distributed geographically across the world. Regions are made up of a set of availability zones (AZs) that are isolated from each other from an availability (e.g. power outage) perspective. The AZs within a region are connected to each other. Each availability zone consists of a set of data centers or buildings that are connected to each other, as shown in figure 1.

Figure 1: Typical Cloud Infrastructure

As shown in figure 2, each data center or building has tens of thousands of servers, connected to each other using a high-performance, high-bandwidth Ethernet network. These networks are built using multiple network tiers, which typically include:

  • ToRs (Top-of-rack switches) for connecting servers, storage and other compute elements within a rack.
  • Leaf and Spine switches for connecting the various ToRs in server racks. Each of these tiers has many switches. There can even be multiple tiers of leaf & spine switches when the number of server racks is extremely large and very high cross-sectional bandwidth is needed.
  • DCI (data center interconnect) switches for interconnecting various data centers or buildings within a metro or region, and even across regions. Customers sometimes use tiers of these switches to create a metro or regional network.
  • Internet-facing routers that connect to other ISPs to reach the internet. Some customers place internet-facing routers in the regional network.

Figure 2: Typical Data Center Architecture

There are thousands of switches inside each data center or building. 400G optics modules (or transceivers) are now being used to interconnect these switches within the data center and across data centers. These modules come in two major form-factors and connect to each other using optical fibers. They carry data traffic using different types of optical modulation, and different types of 400G optics modules are used for different use cases in the data center. We cover each of these topics succinctly in the following sections.

Form factors
There are two major form-factors for 400G connectivity. They are:

  • QSFP-DD (Quad Small Form-factor Pluggable Double-density): this is used by most customers today. Switches with QSFP-DD cages also accept 100G QSFP28 modules, so customers can start with 100G and migrate to 400G when they need to.
  • OSFP (Octal Small Form-factor Pluggable): this is mainly being used by Google. It doesn’t allow customers to plug in 100G QSFP28 modules.

Here is a quick recap of form-factors used at lower speeds: For 200G connectivity, QSFP56 is available – however, it is not widely used. For 100G connectivity, QSFP28 is used. QSFP+ is used for 40G, SFP28 is used for 25G, SFP+ is used for 10G and SFP is used for 1G connectivity.
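
As a quick reference, the recap above can be captured in a small lookup; the helper below is purely illustrative.

```python
# Pluggable form factor by port speed, per the recap above.
FORM_FACTOR_BY_SPEED = {
    1: "SFP",
    10: "SFP+",
    25: "SFP28",
    40: "QSFP+",
    100: "QSFP28",
    200: "QSFP56",
    400: "QSFP-DD or OSFP",
}

def form_factor(speed_gbps):
    return FORM_FACTOR_BY_SPEED.get(speed_gbps, "unknown speed")

print(form_factor(400))   # -> QSFP-DD or OSFP
```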

Optical fiber and fiber Connectors

Optical fiber is used to connect the optics modules and is deployed mainly in two flavors – multi-mode (MM) and single-mode (SM). Multi-mode fiber supports connectivity for shorter distances and is often found in enterprises. It is also used by large China cloud providers. Single-mode fiber supports connectivity for longer distances and is often used by the large US cloud providers.

Different types of fiber connectors are used to plug into the 400G modules. MPO (Multi-fiber Push-on/Pull-off) connectors allow multiple fibers to plug into a module and are available in different flavors (e.g. MPO-12 for DR4 and MPO-16 for SR8). The duplex LC connector is used for FR4/LR4/LR8, where a single pair of optical fibers is connected to a module. The CS connector is used for 400G-2FR4 modules.

Electrical and Optical Modulation
Electrical: A 12.8Tbps switch silicon has SerDes that support 10/25G Non-Return to Zero (NRZ) and 50G 4-level Pulse Amplitude Modulation (PAM4) electrical modulation. 10/25G NRZ describes an electrical channel where data is transmitted using two amplitude levels (0 or 1). 50G PAM4 describes an electrical channel where data is transmitted using four amplitude levels (00, 01, 10 & 11). This is shown in figure 3 with the corresponding eye diagrams. For 400G optics modules, the switch silicon connects to the module using 8 x 50G PAM4 electrical SerDes lanes (i.e. 8 SerDes lanes, each using 50G PAM4). For 100G QSFP28 optics modules, the switch silicon connects to the module using 4 x 25G NRZ electrical SerDes lanes (i.e. 4 SerDes lanes, each using 25G NRZ).
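
The lane arithmetic above can be checked directly: a PAM4 symbol carries two bits versus one for NRZ, so a 50G PAM4 lane runs at roughly the same symbol rate as a 25G NRZ lane. A minimal sketch using nominal rates (ignoring FEC and encoding overhead):

```python
# Nominal lane math for the electrical interfaces described above
# (ignoring FEC and encoding overhead).
def lane_rate_gbps(symbol_rate_gbd, bits_per_symbol):
    return symbol_rate_gbd * bits_per_symbol

nrz_25g  = lane_rate_gbps(25, 1)   # NRZ: 1 bit per symbol   -> 25 Gb/s per lane
pam4_50g = lane_rate_gbps(25, 2)   # PAM4: 2 bits per symbol -> 50 Gb/s per lane

print(4 * nrz_25g)    # 4 x 25G NRZ  lanes -> 100G QSFP28 module
print(8 * pam4_50g)   # 8 x 50G PAM4 lanes -> 400G QSFP-DD/OSFP module
```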

Figure 3: 25G NRZ and 50G PAM4 signals with eye diagrams

Optical: The optical modulation used by a 400G optics module is either 8 x 50G PAM4 or 4 x 100G PAM4, depending on the type of optics module. For modules with 4 x 100G PAM4 optical lanes, a gearbox chip in the module converts the 50G PAM4 electrical signals to 100G PAM4 optical signals.
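
Logically, the gearbox pairs up the eight 50G PAM4 electrical lanes into four 100G PAM4 optical lanes. The sketch below models only that lane bookkeeping, not the actual DSP inside a module:

```python
# Lane bookkeeping for a 400G module with a 2:1 gearbox: eight 50G PAM4
# electrical lanes in, four 100G PAM4 optical lanes out (bookkeeping only).
electrical_lanes_gbps = [50] * 8

def gearbox_2to1(lanes):
    # Pair adjacent lanes and sum their rates.
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]

optical_lanes_gbps = gearbox_2to1(electrical_lanes_gbps)   # [100, 100, 100, 100]
assert sum(optical_lanes_gbps) == sum(electrical_lanes_gbps) == 400
```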

Inside a Module
Figure 4 shows the key building blocks inside a 400G optics module.

Figure 4: Inside a 400G optics module

As shown in figure 4, the 400G optics module connects to the switch silicon using eight SerDes lanes that carry 8 x 50G PAM4 electrical signals. These lanes connect to a CDR (Clock and Data Recovery) chip, which recovers the clock for synchronized optical transmission. The CDR chip may also include a gearbox (to produce 100G PAM4 optical lanes, as described above).

The optical components inside the module include the TOSA & ROSA (Transmitter & Receiver Optical Sub-Assemblies). The TOSA converts electrical signals to optical signals for transmission over optical fiber, using lasers. The ROSA converts optical signals back into electrical signals using photodiodes. There are different types of optical technology used in these modules, summarized below and in the sketch that follows the list:

  • VCSELs (Vertical Cavity Surface Emitting Lasers) are used in optics modules with reaches of up to 100m; the technology is extremely cost-effective. They are used with multi-mode optical fiber.
  • DML (Directly Modulated Lasers) and EML (Electro-absorption Modulated Lasers) are used in modules with reaches of up to 10km. EML has less wavelength dispersion and a more stable wavelength at higher speeds, so many 400G modules use EML today. These lasers are used with single-mode optical fiber.
  • Silicon Photonics is smaller in size and highly power-efficient, and it is just becoming available inside 400G modules.
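
The reach and fiber trade-offs above can be condensed into a rough, hypothetical helper; real module designs also weigh cost, power and availability.

```python
# Hypothetical mapping of reach and fiber type to laser technology,
# following the bullets above (not an exhaustive design rule).
def laser_technology(reach_m, fiber):
    if fiber == "multi-mode" and reach_m <= 100:
        return "VCSEL"
    if fiber == "single-mode" and reach_m <= 10_000:
        return "DML or EML (EML favored at higher speeds)"
    return "coherent or other technology (outside the bullets above)"

print(laser_technology(100, "multi-mode"))     # VCSEL
print(laser_technology(2_000, "single-mode"))  # DML or EML ...
```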

Different types of optics modules used for intra-DC connectivity

Here are the most common 400G optics modules used inside the data center. They are also referred to as client optics or grey optics. A rough selection sketch follows this list.

Optics Type: 400G-SR8
Description: Up to 100m over 8 MM fiber pairs, using 50G PAM4 optical signals. Modules use cost-effective VCSEL technology.
Fiber Type & fiber connector: Multi-mode fiber, MPO-16 connector
Use Case: Cost-optimized 400G solution for up to 100m. It can also fan out (break out) to 50/100/200G links. Breakout to 2x200G-SR4 is deployed by one US cloud provider.

Optics Type: 400G-SR4.2 (400G-BiDi)
Description: Up to 100m over 4 MM fiber pairs, where each fiber pair carries 100G-SR1.2 (two 50G PAM4 optical signals per fiber pair, aka 100G-BiDi). These modules use cost-effective VCSEL technology.
Fiber Type & fiber connector: Multi-mode fiber, MPO-12 connector
Use Case: Customers reuse MM fiber already present in enterprises/data centers to carry higher bandwidth. It supports both 400G-to-400G connectivity as well as breakout of 400G to 4x100G connectivity.

Optics Type: 400G-AOC
Description: Up to 30m over multiple MM fiber pairs. AOCs provide a complete solution that includes two optics modules connected using multiple fiber pairs. They come in different fiber lengths to cater to different customer requirements.
Fiber Type & fiber connector: Usually MM fiber, NA (not needed)
Use Case: It is a cost-effective optical connectivity option for short reach (up to 30m).

Optics Type: 400G-DR4
Description: Up to 500m over 4 SM fiber pairs, using 100G PAM4 optical signals on each fiber pair. 400G-DR4+ modules support extended reach of up to 2km.
Fiber Type & fiber connector: Single-mode fiber, MPO-12 connector
Use Case: A cost-optimized solution for up to 500m. Customers can use it for 400G-to-400G connectivity as well as breakout of 400G to 4x100G connectivity. The breakout option is often used for Leaf to ToR, in conjunction with 100G-DR1 (a new QSFP28 optics module used to plug into legacy 100G switches with NRZ electrical signals).

Optics Type: 400G-FR4
Description: Up to 2km over 1 SM fiber pair, using four 100G PAM4 optical signals muxed on the fiber pair. LC connector is used.
Fiber Type & fiber connector: Single-mode fiber, Duplex LC connector
Use Case: Usually used between leaf, spine & DCI within data centers, or across data centers for up to 2km use.

Optics Type: 400G-2FR4
Description: Up to 2km over 2 SM fiber pairs, each fiber pair carrying 200G using four 50G PAM4 optical signals muxed on each fiber pair. CS connector is used.
Fiber Type & fiber connector: Single-mode fiber, CS connector
Use Case: It is used by one US cloud provider to double the bandwidth carried on each fiber pair, moving from 100G CWDM4 to 200G FR4.
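
As referenced above, here is a rough, hypothetical selection sketch that encodes only the reach and fiber-type limits listed in the entries above; it ignores cost, installed fiber and breakout considerations.

```python
# Intra-DC 400G module candidates by fiber type and reach limit (metres),
# taken from the entries above; helper and ordering are illustrative.
INTRA_DC_MODULES = [
    ("400G-AOC",   "multi-mode",     30),
    ("400G-SR8",   "multi-mode",    100),
    ("400G-SR4.2", "multi-mode",    100),
    ("400G-DR4",   "single-mode",   500),
    ("400G-FR4",   "single-mode", 2_000),
    ("400G-2FR4",  "single-mode", 2_000),
]

def candidates(reach_m, fiber):
    return [name for name, f, limit in INTRA_DC_MODULES
            if f == fiber and reach_m <= limit]

print(candidates(80, "multi-mode"))     # ['400G-SR8', '400G-SR4.2']
print(candidates(400, "single-mode"))   # ['400G-DR4', '400G-FR4', '400G-2FR4']
```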

Different types of optics modules used for inter-DC connectivity
Here are the most common 400G optics modules used for inter-DC connectivity.

Optics Type: 400G-FR4
Description: Up to 2km over 1 SM fiber pair, using four 100G PAM4 optical signals muxed on the fiber pair. LC connector is used.
Fiber Type & fiber connector: Single-mode fiber, Duplex LC connector
Use Case: For DCI (data center interconnect) applications for up to 2km.

Optics Type: 400G-LR8
Description: Up to 10km over 1 SM fiber pair, using eight 50G PAM4 optical signals muxed on the fiber pair.
Fiber Type & fiber connector: Single-mode fiber, Duplex LC connector
Use Case: For DCI applications for up to 10km.

Optics Type: 400G-LR4
Description: Up to 10km over 1 SM fiber pair, using four 100G PAM4 optical signals muxed on the fiber pair.
Fiber Type & fiber connector: Single-mode fiber, Duplex LC connector
Use Case: For DCI applications for up to 10km.

Optics Type: 400G-ZR
Description: Coherent optics modules for up to 120km over 1 SM fiber pair, that can be muxed using a simple external MUX if required. This module is expected soon. 400G-ZR+ is expected to go beyond 120km reach.
Fiber Type & fiber connector: Single-mode fiber, Duplex LC connector
Use Case: For DCI applications for up to 120km use. Two 400G-ZR modules can be connected back to back or be connected using an external DWDM MUX to carry multiple 400G data streams over a single dark fiber pair.

Customer transition from 100G to 400G optics

As customers deploy 400G connectivity, they are transitioning from 25/100G connectivity to 100/400G connectivity across the different tiers of the network. Let us look at the transition at each tier, as shown in figure 5.

ToR to Server/storage
Most customers use DAC (Direct Attach Copper) cables to connect a ToR to servers/storage as they are very cost-effective. We expect that to continue. Some customers may augment DAC with ACC (Active Copper Cables) to extend beyond the ~3m reach of 400G DAC cables.

Between ToR & Leaf
For 100G, most customers use AOC, SR4, BiDi or PSM4. We expect 100G-AOC customers to use 400G-AOC for up to 30m connectivity. We expect 100G-SR4 & 100G-BiDi customers to use 400G-SR8 and 400G-BiDi for up to 100m connectivity. We expect 100G-PSM4 customers to deploy 400G-DR4 to 400G-DR4 or 400G-DR4 to 4x100G-DR1 connectivity.

Between Leaf & Spine and Spine & DCI
For 100G, most customers use PSM4 or CWDM4. We expect them to move to 400G-DR4 for up to 500m and to 400G-FR4 for up to 2km connectivity. We expect 400G-DR4 volumes to be high, as it is being used in multiple tiers of a data center, where the number of switches is high.

Inter-DC Connectivity
For 100G, most customers use CWDM4, LR4 or Coherent optics depending on distance requirements. They will likely move to 400G-FR4, 400G-LR8/LR4 or 400G-ZR Coherent optics, depending on reach requirements. Customers often use MACsec on these links for privacy and security reasons.
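
The expected transitions described in this section can be summarized in a simple mapping; it merely restates the expectations above rather than prescribing a design.

```python
# Expected 100G -> 400G optics transitions by network tier, per this section.
EXPECTED_TRANSITIONS = {
    "ToR-Leaf": {
        "100G-AOC":  "400G-AOC (up to ~30m)",
        "100G-SR4":  "400G-SR8 (up to 100m)",
        "100G-BiDi": "400G-SR4.2 / 400G-BiDi (up to 100m)",
        "100G-PSM4": "400G-DR4, or 400G-DR4 to 4x100G-DR1 breakout",
    },
    "Leaf-Spine / Spine-DCI": {
        "100G-PSM4":  "400G-DR4 (up to 500m)",
        "100G-CWDM4": "400G-FR4 (up to 2km)",
    },
    "Inter-DC": {
        "100G-CWDM4":    "400G-FR4",
        "100G-LR4":      "400G-LR8 or 400G-LR4",
        "100G coherent": "400G-ZR / 400G-ZR+ coherent",
    },
}

for tier, moves in EXPECTED_TRANSITIONS.items():
    for old, new in moves.items():
        print(f"{tier}: {old} -> {new}")
```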

Figure 5: Most popular connectivity options

Summary

Innovium 12.8Tbps TERALYNX switch silicon has been designed into production switches by the world’s leading OEMs and ODMs. These switches are now being deployed in production networks by the world’s leading cloud providers with 400G optics modules to scale their networks while reducing cost, complexity, and power. We expect the next tier of customers to also adopt 400G in the future and benefit from its advantages.

We want to thank the Innovium team, along with optics & systems partners and cloud customers, who have helped with extensive testing and validation to make these 400G optics solutions highly robust.