New trends in heterogeneous multicore SoCs

Data plane processing units allow designs to be more flexible and optimized than what designers could achieve in the past.

The multicore/multiprocessor design world has seen a great deal of activity in the past two years, and the growth is accelerating. Announcements about new multicore and multiprocessor platforms continue to flood news feeds, and the multicore programming model space is quickly consolidating, with Intel buying several software tools companies such as Cilk Arts and RapidMind.

How can designers understand this space and track a path through it to optimize the products they are working on? It makes sense to start with some simple classifications for multiprocessor designs and a taxonomy that can help designers figure out the right strategies for using multiprocessors.

Two planes of interest

In this approach, two types of processing are used in a modern System-on-Chip (SoC) or system: control plane processing and data plane processing. Figure 1 illustrates this simple concept.

Figure 1: In a typical SoC, control plane processing manages the user interface, system synchronization, and other functions while data plane processing manages real-time and data-intensive tasks.

Control plane processing handles the user interface, higher levels of protocol stacks, system synchronization, applications (for example, contact lists, notes creation, and calendar functions on a mobile device), and many other tasks that are neither real-time nor data-intensive.

The data plane is where demanding real-time and data-intensive applications are carried out. These include streaming tasks such as audio and video processing, baseband and lower levels of protocol processing for wired and wireless communications, encryption and decryption tasks, and many others.

Control plane processing is often mapped to standard processors and controllers with a fixed instruction set architecture (ISA). These general-purpose processors (GPPs) offer generic computing at a reasonable cost. When the applications vary widely and are unknown at design time – for example, downloadable apps on smartphones – a GPP might be the best solution. To handle an increasing number of these applications on future embedded devices, turning to a homogeneous multicore architecture for the control plane is a reasonable way to add capacity that can be powered down when not needed.
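The idea of adding capacity that can be powered down can be sketched in a few lines. This is only an illustrative model, not a real power-management policy: the function name, the MIPS-style units, and the choice to always keep one core awake are assumptions made for the example.

```python
import math


def cores_to_power(demand_mips, core_mips, total_cores):
    """Return how many identical cores must stay powered to meet demand.

    Assumptions (hypothetical, for illustration only): every core delivers
    the same throughput, work divides evenly across cores, and one core
    always stays awake to run the control software itself.
    """
    needed = math.ceil(demand_mips / core_mips)
    # Never drop below one core, never exceed what the chip has.
    return max(1, min(total_cores, needed))


if __name__ == "__main__":
    for demand in (0, 250, 900):
        active = cores_to_power(demand, core_mips=100, total_cores=4)
        print(f"demand={demand} -> {active} of 4 cores powered")
```

Because the cores are homogeneous, any of them can take any task, which is what makes this kind of simple scaling rule workable in the control plane.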

In the data plane, architects once had two choices: dedicated hardware (custom logic or synthesized RTL) or fixed ISA DSPs. But new options allow designs to be more flexible and optimized than what designers could achieve in the past. These new options lead to heterogeneous multiprocessor architectures for SoCs and embedded products, not homogeneous multicore solutions.

Moving to ASIPs

In the past decade, configurable and extensible processor technology has undergone considerable development. This technology is based on three key characteristics:

  1. A general RISC CPU and its instruction set, along with a wide variety of configurable external interfaces and other CPU features for interrupt handling and context switching.
  2. Data-intensive computational elements similar to those found in DSPs, such as multipliers and MAC units, zero-overhead loops, and Harvard architectures, along with multiple configurable load/store units.
  3. Of greatest importance, a highly automated tool flow based on processor generation tools that allow users to select from hundreds of structural configuration options, including efficient direct FIFO interfaces. The tool flow also allows design teams to customize the instruction set by adding specialized instructions to accelerate processing in specific application domains. The number of new instructions ranges from a few to many hundreds, depending on the complexity of the targeted algorithms. The instructions can be multicycle, using new resources such as additional register files and single-instruction, multiple-data processing units.
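To make the effect of specialized instructions concrete, the following sketch compares a dot product on a base instruction set with the same computation using a single-instruction, multiple-data multiply-accumulate. It is a rough software model under stated assumptions: the 4-lane SIMD MAC instruction is hypothetical, the counts cover only arithmetic instructions (loads, stores, and loop overhead are ignored), and nothing here reflects measurements on real hardware.

```python
def dot_scalar(a, b):
    """Base ISA model: one multiply and one add issued per element."""
    acc, issued = 0, 0
    for x, y in zip(a, b):
        acc += x * y
        issued += 2  # separate multiply and add instructions
    return acc, issued


def dot_simd_mac(a, b, lanes=4):
    """Extended ISA model: one hypothetical SIMD MAC per `lanes` elements."""
    acc, issued = 0, 0
    for i in range(0, len(a), lanes):
        # One added instruction multiplies `lanes` pairs and accumulates them.
        acc += sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1
    return acc, issued


samples = list(range(1, 9))   # eight input samples
coeffs = [2] * 8              # eight filter coefficients

scalar_result, scalar_issued = dot_scalar(samples, coeffs)
simd_result, simd_issued = dot_simd_mac(samples, coeffs)

if __name__ == "__main__":
    print("scalar:", scalar_result, "in", scalar_issued, "arithmetic instructions")
    print("simd  :", simd_result, "in", simd_issued, "arithmetic instructions")
```

Both versions compute the same result; only the number of issued arithmetic instructions differs. Fewer issued instructions is what allows a lower clock rate for the same workload.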

Combining these technologies in new architectural choices provides a class of application-specific instruction-set processors (ASIPs) called Data Plane Processing Units (DPUs).

As illustrated in Figure 2, computation and communications in the data plane are no longer the sole province of dedicated hardware or fixed ISA DSPs. DPUs are viable options for some or all of the processing in almost every data-intensive application task in the data plane.

Figure 2: DPUs combine advances in configurable and extensible processor technology to handle computation and communications in the data plane.

A sound example

Audio encoding and decoding is a good example of DPU use. Many years ago, the natural audio codec implementation in embedded appliances was dedicated hardware. Today, designers can use a dedicated audio DPU or add specialized instructions to a processor intended for multiple data plane functions. These DPUs can offer both energy efficiency and in-field programmability.

Many system architects and design teams might not realize that a processor can decode MP3 with excellent audio quality while running at just a few MHz in a 65 nm low-power process. Because such a device is based on a DPU, it can execute new audio codecs as they become available for download; an audio DPU chosen for a product today can therefore last through several generations of new codecs, which seem to appear almost monthly.

Architects and design teams can use DPUs in many ways:

  • Teams with in-depth knowledge of an application domain can utilize the processor generation technology to customize their DPUs and embed their proprietary Intellectual Property (IP) into both hardware instructions and domain-specific software. The DPU helps protect IP because new instructions are given only to the design team and overall function is delivered in precompiled software.
  • Suppliers can create DPUs for highly specialized domains such as audio, or provide packages of instruction definitions and codecs that can be added to a customer-specified DPU. For a highly data-intensive domain such as baseband processing, suppliers can develop extensively customized DPUs that can accommodate a variety of protocols and standards.
  • Design teams can take a DPU definition from a supplier and, by adding some of their own customizations and instruction definitions, “make it theirs.” They can add proprietary knowledge and further instructions to a more generic starting point. When optimized for the DPU, the resulting software is target-specific and can operate with greater energy efficiency than when running on a generic CPU.

Many data plane applications involve media-intensive processing work on data streams, including audio, video, and other data that might be encrypted, decrypted, preprocessed, postprocessed, encoded, and decoded. Just as important as the computational aspects of these tasks is the ability to handle intensive and possibly bursty data streams using configured interfaces. Supporting these communication needs requires configurable technology so that design teams can choose the optimal mix of standard bus-based, shared memory, and direct FIFO interfaces for the application tasks. At the same time, this configurability opens up new possibilities with heterogeneous ASIP subsystems for multimedia.

More options arise

This technology is not limited to the data plane. If control plane processing can benefit from instruction customization or specialized high-performance interfaces such as direct FIFO channels, then the same configuration and extension technology can be applied to design a new control plane processor or multiprocessor. This can include applications that are optimized using local instruction and data/scratchpad memories or applications that need to pass messages from one processor to another in a more efficient manner than shared memory, such as using direct hardware FIFOs via queue interfaces.
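Message passing through a direct hardware FIFO can be modeled in software as a bounded queue between a producer and a consumer. The sketch below is a simplified behavioral model, not a description of any vendor's queue interface: the class name, the FIFO depth, and the rule that the consumer reads only on odd cycles are all assumptions chosen to make back-pressure visible.

```python
from collections import deque


class HardwareFifo:
    """Bounded queue standing in for a direct processor-to-processor FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self._slots = deque()

    def push(self, word):
        """Try to write one word; False means the producer must stall."""
        if len(self._slots) >= self.depth:
            return False
        self._slots.append(word)
        return True

    def pop(self):
        """Read one word, or None if the FIFO is empty this cycle."""
        return self._slots.popleft() if self._slots else None


def run(words, depth):
    """Simulate a producer writing every cycle and a slower consumer."""
    fifo = HardwareFifo(depth)
    pending = deque(words)
    received, stalls, cycle = [], 0, 0
    while len(received) < len(words):
        if pending:
            if fifo.push(pending[0]):
                pending.popleft()
            else:
                stalls += 1  # FIFO full: producer waits, no data is lost
        if cycle % 2 == 1:  # assumed: consumer reads on odd cycles only
            word = fifo.pop()
            if word is not None:
                received.append(word)
        cycle += 1
    return received, stalls


received, stalls = run([10, 20, 30, 40], depth=2)

if __name__ == "__main__":
    print("received:", received, "producer stalls:", stalls)
```

In this model the two processors never touch a shared buffer or a lock. Ordering and flow control come from the queue itself, which is the property that makes direct FIFOs attractive compared with passing messages through shared memory.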

Given this information, the right architecture for embedded systems includes a combination of general-purpose and customized processors in the control plane and a set of heterogeneous DPUs placed in different parts of the data plane. As the need to design embedded products with heterogeneous multiprocessing architectures has arisen, technology has advanced to offer design teams some interesting new options for handling the rich set of functions that modern convergence embedded appliances are designed for.

Grant Martin is a Chief Scientist at Tensilica, Inc., based in Santa Clara, California. Prior to joining Tensilica, Grant worked for Burroughs in Scotland, Nortel/BNR in Canada, and Cadence Design Systems, where he eventually became a Cadence Fellow in their Labs. Grant is a coauthor and coeditor of nine books dealing with SoC design, SystemC, UML, modeling, EDA for integrated circuits, and system-level design. He received his Bachelor’s and Master’s degrees in Mathematics (Combinatorics and Optimisation) from the University of Waterloo, Canada.

Tensilica 408-986-8000 gmartin@tensilica.com www.tensilica.com

Silicon, software, and strategies for embedded devices
©MMX Embedded Computing Design.
An OpenSystems Media publication.