PCI Express fabric: Rethinking data center architectures - Q&A with David Raun, Senior Executive VP and General Manager, PLX Technology
With its 8 GT/s per-lane signaling rate and potential for reducing connection costs, PCI Express Gen3 is gaining acceptance in a variety of embedded applications, including cloud-driven data centers. David explains how PCI Express is playing an increasing role in system fabrics and introduces a next-generation system model presented at the recent PCI-SIG Developers Conference that uses PCI Express-centric hardware and software as the backbone of a data center rack.
ECD: As the number of cloud-connected embedded devices skyrockets, what new demands are being placed on data center technology?
RAUN: The expansion of high-bandwidth and media-focused devices is putting greater demands on data center throughput. This means that the pipes need to get faster, and the basic building blocks need to handle more information. Simply adding more lower-bandwidth pipes such as 1 GbE does not work because physical deployment and cabling can be expensive and unreliable. The building blocks need to be faster – connected through 10 GbE and PCI Express (PCIe) Gen3 – and the systems need to use these building blocks more efficiently. Using high-performance converged technologies as the building blocks for data centers offers both power and cost advantages, along with improved actual throughput capabilities.
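The "adding more lower-bandwidth pipes does not work" point can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the Gen3 line rate (8 GT/s per lane) and its 128b/130b encoding are from the PCIe specification, but the arithmetic ignores packet and protocol overhead, so real-world throughput is somewhat lower.

```python
# Back-of-the-envelope comparison: how many 1 GbE "pipes" it takes to
# match the faster building blocks mentioned above. Raw line rates only;
# protocol/packet overhead is ignored, so figures are illustrative.
import math

GBE_1 = 1e9    # 1 GbE raw line rate, bits/s
GBE_10 = 10e9  # 10 GbE raw line rate, bits/s

# PCIe Gen3: 8 GT/s per lane with 128b/130b line encoding
PCIE_GEN3_LANE = 8e9 * 128 / 130  # usable bits/s per lane, per direction

def links_needed(target_bps: float, link_bps: float = GBE_1) -> int:
    """Number of slower links required to equal one faster link."""
    return math.ceil(target_bps / link_bps)

print(links_needed(GBE_10))              # 10 x 1 GbE to match one 10 GbE port
print(links_needed(4 * PCIE_GEN3_LANE))  # 32 x 1 GbE to match a Gen3 x4 link
```

Thirty-plus cables per slot is exactly the kind of deployment and cabling burden the answer above describes.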
In addition to overall system speed, these building blocks need features and capabilities to address special data center needs. One such need is an increase in storage. High-quality audio and video take up a lot of space, and cloud computing is rapidly evolving to make data available wherever users are, which means that content needs to be stored in the cloud.
As a practical matter, this is accomplished through a combination of high-speed Solid-State Drive (SSD) storage and less expensive but slower hard disk drives, and the internal components need to enable such systems. This includes providing fan-out for large arrays of enterprise storage, which are now connected through PCIe, and also offering capabilities to offload the CPU, move data between the storage systems more efficiently (not needing to keep going to the CPU), and offering caching approaches close to the source of data. These are all dependent upon the switching devices at the root of the box.
ECD: There have been other attempts in the past to use PCIe as a fabric, so what are the differences in what PLX is doing?
RAUN: Previous attempts have required special hardware and/or software at the application level to make the system work. PLX is developing a solution called ExpressFabric (see Figure 1) where applications can work without any changes and existing hardware is leveraged. Driver-level changes are either minor or unnecessary.
Another major difference between this solution and what was done in the past is that PLX is leveraging its mainstream switch business to provide this additional capability. ExpressFabric thus can be offered at an incremental cost over a traditional, high-volume switch and sold to designers who already use our products in typical fan-out applications. Previous attempts to offer this capability came from vendors for whom it was their main or only revenue opportunity. With ExpressFabric, a system designer can use the level of capabilities needed for a specific opportunity and pay the incremental price for the additional features.
ECD: What are the major obstacles to adoption of this technology, and how is PLX addressing them?
RAUN: One major obstacle is resistance to using something different. PLX will provide the silicon, production-capable reference designs, and software to demonstrate that the system does what it’s intended to do. Another obstacle is the large development effort required to make the system work. PLX silicon, software, and reference designs provide the building blocks, allowing the user to focus on adding value. Because our system makes the application and maintenance software look like it does today, a system can be created and deployed quickly and easily.
It is important to note here that the needs of new data centers cannot be addressed using existing technology. So the decision faced by data center managers and the vendors who supply them does not include the option to keep what they have versus moving to something different. Their current approach is not adequate, and all of the alternatives open to them are new and different and have some risk associated with them. There is no clear way that the two major existing technologies – Ethernet and InfiniBand – can offer enough scalability and performance at a reasonable cost point. Their approach is to just go faster with the next more expensive generation of products. But they do not address the issues of convergence, power, or in the case of Ethernet, latency.
ECD: What are the specific advantages of ExpressFabric over Ethernet or InfiniBand? Why would somebody change interconnects?
RAUN: A major advantage ExpressFabric has over both technologies and others within the rack is that the subsystems already have PCIe coming out of them. With Ethernet or InfiniBand, it is therefore necessary to add hardware – increasing power, cost, and latency – to bridge back and forth between PCIe and those technologies across the rack backplane:
- In the case of Ethernet, when including the extra components and the ability of PCIe to scale to higher bandwidths easily just by adding lanes, ExpressFabric can offer higher throughput, lower power, and lower cost. For InfiniBand, it can offer significantly lower cost because of fewer components and the typical high cost of InfiniBand-based solutions, as well as lower power at similar performance levels. InfiniBand is a higher-performance interconnect than Ethernet, but at much higher cost.
- In addition to providing these benefits, PCIe is well equipped to combine multiple protocols within the rack. Both Ethernet and InfiniBand are required on a backplane if there is traffic that depends on both of them, but the actual data almost always comes out natively as PCIe. So it is beneficial to leave the data in PCIe form in the rack and bridge to somewhere else at the top of the rack, where it needs to interface to other technologies. This provides the same overall benefits mentioned earlier, such as power, cost, and latency, but for converged platforms the benefits scale even more, as two sets of bridges are being eliminated on each blade rather than just one.
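The scaling claim in the first bullet – that PCIe reaches higher bandwidths "just by adding lanes" – can be sketched numerically. The parameters below (8 GT/s per lane, 128b/130b encoding) are the standard Gen3 figures; the calculation ignores packet-level overhead and is meant only to show how link width multiplies throughput.

```python
# Illustrative sketch: PCIe Gen3 per-direction bandwidth as a function
# of lane count, using the standard Gen3 parameters. Packet-level
# overhead is ignored, so these are upper-bound figures.

GEN3_RATE_GTPS = 8.0       # gigatransfers per second, per lane
GEN3_ENCODING = 128 / 130  # 128b/130b line-encoding efficiency

def gen3_gbytes_per_s(lanes: int) -> float:
    """Effective per-direction bandwidth in GB/s for a Gen3 xN link."""
    bits_per_s = lanes * GEN3_RATE_GTPS * 1e9 * GEN3_ENCODING
    return bits_per_s / 8 / 1e9

for lanes in (1, 4, 8, 16):
    print(f"Gen3 x{lanes:<2}: {gen3_gbytes_per_s(lanes):5.2f} GB/s per direction")
# A 10 GbE port, by comparison, tops out at 1.25 GB/s raw.
```

Widening a link from x4 to x8 doubles bandwidth without changing the protocol or adding bridge silicon, which is the cost and power argument made above.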
ECD: Do you plan to release ExpressFabric as an open standard to the industry? Do you have cosponsors?
RAUN: Yes, we plan to drive this as an open standard. It is our belief that the best way to get ExpressFabric adopted is to maintain control of it initially so that it is not burdened with the complexity and feature creep that eventually led to limited deployment of both Advanced Switching Interconnect (ASI) and Multi-Root I/O Virtualization (MR-IOV). We are taking the approach of doing a few key items well, rolling out products that satisfy the broad market, then moving to standardize it once this has been accomplished. We have customers and partners who are working closely with us to ensure that systems can be built in a timely manner, but we are choosing to do this with a limited number of engagements. The companies involved wish to keep their plans confidential until they are closer to launch, and it is expected that this approach will be deployed in a standardized manner once the initial products are in place.
The Open Compute Project aims to provide open platforms for a variety of applications, including data center racks. Open Compute has chosen PCIe as the protocol for I/O sharing, and we are working with that organization to deploy our solutions in this space.