Embedded Computing Design

Low power: The key issue for system integration in mobile devices

Pete Hardee, Cadence Design Systems

The hardware and software worlds are colliding, and integration is providing a spark that will require system developers to keep a sharp eye on overall power demands. Innovative techniques can enable developers to validate whether software is correctly controlling the power-saving capabilities in the hardware platform and verify that the device can meet the power requirements in real system conditions.

As the lines blur between hardware and software development and integration, engineers tend to overlook the importance of developing these systems with power in mind. Even if a hardware design is optimized, the software delivered within these systems must correctly and efficiently use the power-saving capabilities built into the hardware.

Teams developing these latest and greatest electronic systems need techniques focused on relieving the pressure of hardware/software integration. Techniques such as Power Shut-Off (PSO, also known as power gating), Multi-Supply Voltages (MSV), and Dynamic Voltage and Frequency Scaling (DVFS) can leverage platforms and advanced system-level verification capabilities such as emulation. Engineers require new ways to measure and dynamically analyze power requirements when integrating embedded software with hardware, which must be tested in real-world system modes by leveraging platforms.

Yes, software burns power

Designers tend to think of power as a hardware issue, and indeed, it is the electronic hardware in an embedded mobile device that dissipates the power. Yet many recent instances highlighted throughout the blogosphere demonstrate how the latest release of a mobile device Operating System (OS) resulted in consumers complaining that their battery life was suddenly drastically worse. How so? While the power is dissipated in hardware, what the hardware is doing at any given moment depends on the user activity and system modes of operation, all under software control. For all the power-saving techniques that can be added to today’s devices, the software needs to use them correctly to get the desired result.

The OS updates that have caused the greatest problems invariably changed the behavior of the system to leave one or many real-world interfaces in the “on” state, either by default or for longer periods of time compared to the previous release. In order of importance, here are the areas where the greatest amount of power is consumed in a typical mobile multimedia device:

  1. Peripherals and modems
  2. Memory architecture
  3. Digital subsystem and processor cores

Peripherals and modems

Pick up your favorite mobile device and press any key. The display lights up to full brilliance, and the keyboard probably lights up as well. How long before the display dims? When does the keyboard cease to be illuminated? We spend a lot more time interfacing with our mobile devices – browsing the Web, watching movies, texting, e-mailing – than we used to.

The LCD display driver in particular has become a big power sink, especially with the increased demand for bigger, brighter, and higher-definition screens. Furthermore, a modern global cellular device can have up to four modems. Voice, SMS, e-mail, and Web browsing all use a cellular modem. Alternatively, data may use an 802.11 modem. Additionally, the device may use a Bluetooth modem or a satellite modem in a GPS subsystem. What is the device doing, not only while you are using it, but when you are not? Even then, the modems may be on, polling and synchronizing with various cellular, local, or personal networks.

The power amplifiers in these modems are particularly power hungry, but usually necessarily so, and the suppliers of these components make them as efficient as possible. Depending on the device usage profile, the modems may use more energy over time than the peripherals. However, software defaults such as always connecting the device to a Wi-Fi network if one is available or leaving the LCD display powered may have a detrimental effect on battery life, sometimes unnecessarily.

Memory architecture

After the interfaces to the outside world, the movement of data within the device is responsible for the next most significant power usage. The memory architecture is structured in layers, usually referred to as L1, L2, and L3. L1 is cache memory, and usually caches exist for both processor instructions and data. L2 is typically on-chip memory, and L3 is off-chip, usually in the form of DRAM and flash.

When software executes, memory transactions are the inevitable consequence. Transactions in L1 are quickest and cost the least in terms of power, then L2, then L3. There may be an order of magnitude difference in the power consumed by transactions successfully serviced in L1 versus L2 and L2 versus L3. Different cache policies and sizes are selected. Cache algorithms such as Last-In, First-Out (LIFO), First-In, First-Out (FIFO), or random can be more or less efficient depending on the regularity of the application being run. Once the policy is set, cache misses are subject to great attention, as the system will need to go to L2 or even L3 memory to retrieve the missing data. DDR or flash controllers manage access to L3. For memory transactions being serviced, most of the power is dissipated in the memory itself, rather than the controller or PHY interface. However, in most systems, especially ones processing video or graphical data, the memory controller is inevitably the bottleneck.
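As a rough illustration of why cache misses get so much attention, the sketch below estimates the average energy per memory access for a three-level hierarchy. The function name and the 1/10/100 energy ratios are assumptions chosen only to mirror the order-of-magnitude rule of thumb above, and the model charges each access only for the level that finally services it.

```python
def average_access_energy(l1_hit_rate, l2_hit_rate,
                          e_l1=1.0, e_l2=10.0, e_l3=100.0):
    """Expected energy per access, in arbitrary units.

    Assumed, illustrative costs: each level is ten times more expensive
    than the one above it. l2_hit_rate is the hit rate among accesses
    that already missed in L1. Lookup costs of the levels that missed
    are ignored for simplicity.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return (l1_hit_rate * e_l1
            + l1_miss * l2_hit_rate * e_l2
            + l1_miss * l2_miss * e_l3)


if __name__ == "__main__":
    # Compare a cache-friendly workload with a cache-hostile one.
    print(average_access_energy(0.95, 0.80))
    print(average_access_energy(0.70, 0.50))
```

Under these assumed numbers, a modest drop in hit rate multiplies the average energy per access several times over, which is why the regularity of the application matters so much.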

More power can be consumed in the system by memory transactions that are not granted access to memory than by those that are processed; depending on another set of priorities and policies, these waiting transactions must either be stored in a FIFO until they can be processed or cancelled and resent at a later time. Depending on the current memory usage, memory pages can be switched off or put in standby mode when not being accessed. The efficiency of the policies and algorithms for this memory gating is another area that is highly dependent on software application and usage.

Digital subsystem

The rest of the digital platform, including processor cores, hardware accelerators, random logic, and distributed registers, accounts for the remaining power dissipation. It may be counterintuitive to some that the processor core itself would account for less of the power dissipated when software runs on a hardware platform, compared to memory or the peripherals the processor subsystem controls, but in most mobile devices with typical usage profiles it is true. Another rule of thumb is that typically in a digital subsystem, the clock network may account for 40 to 50 percent of the dynamic power.

Techniques to minimize power

Today’s mobile devices are multifunction communication, entertainment, and productivity systems that pack an incredible array of functionality and performance. These devices are powered by a battery the same size as that of the one-trick-pony voice-only cell phone of a few years ago – a battery that upsets us if it doesn’t last at least a day without needing to recharge.

So what techniques were introduced in recent years to control the rate of energy used? At first, low-power design usually meant no more than clock gating – controlling dynamic power by disabling the clock to parts of the circuitry that could be idle and choosing a frequency of operation no faster than what was needed to meet performance. Then came processes and library cells with Multiple threshold Voltages (MVt). A low-threshold voltage switches quickly but uses more power, while a higher-threshold voltage uses less power at the expense of performance. These methods helped with both dynamic and leakage power.

As technology progressed down successive process nodes, frequency increased (good for performance but not for power) while supply voltage fell. Because energy is proportional to the square of voltage, the lower voltage made up for the increase in frequency. Engineers could implement processes and libraries with MSV, using different voltage levels to balance performance needs against power constraints.
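The trade-off between voltage and frequency can be made concrete with the standard first-order dynamic power relation, P = a * C * Vdd^2 * f. The sketch below is illustrative only: the function name and the two operating points are assumptions, not figures from any particular process node.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order dynamic power estimate: P = a * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical older node: 1.2 V at 400 MHz.
    old = dynamic_power(0.2, 1e-9, 1.2, 400e6)
    # Hypothetical newer node: 0.9 V at 600 MHz, same switched capacitance.
    new = dynamic_power(0.2, 1e-9, 0.9, 600e6)
    # Frequency rose by 50 percent, yet the squared voltage term
    # more than compensates in this assumed case.
    print(new / old)
```

With these assumed values the newer operating point runs 50 percent faster while dissipating less dynamic power, which is the effect the voltage-squared term produces.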

However, the reduction in supply voltage has its limits. Reducing Vdd too close to Vt leads to greatly increased leakage. Despite the availability of other techniques such as body biasing to control leakage, technology reached the point somewhere around the 45 nm node where leakage became a bigger issue than dynamic power, and the only sure way to deal with it was to turn the circuit off when not in use. Hence, one advanced low-power design technique, PSO with or without state retention, has become common in today’s designs. This technique is also known as power gating, or State Retention Power Gating (SRPG).

Meanwhile, developers continued to control the trade-off of dynamic power versus performance needs with techniques like DVFS. This technique works well on self-contained blocks whose performance supply and demand can be somewhat easily measured. DVFS has been successfully implemented on processor cores, for example.
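A minimal sketch of the decision at the heart of DVFS follows: pick the slowest operating point that still meets the measured performance demand. The table of frequency and voltage pairs and the function name are hypothetical; a real governor would also account for transition latency and hysteresis.

```python
# Hypothetical operating points: (frequency in MHz, supply voltage in V).
OPERATING_POINTS = [(200, 0.9), (400, 1.0), (800, 1.2)]


def select_operating_point(required_mhz, points=OPERATING_POINTS):
    """Return the lowest-frequency point that satisfies the demand.

    If the demand exceeds every available point, fall back to the
    fastest one.
    """
    for freq_mhz, vdd_v in sorted(points):
        if freq_mhz >= required_mhz:
            return freq_mhz, vdd_v
    return max(points)


if __name__ == "__main__":
    for demand in (150, 300, 1000):
        print(demand, select_operating_point(demand))
```

Because each lower-frequency point also allows a lower voltage, choosing the slowest adequate point reduces dynamic power more than frequency scaling alone would.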

These techniques have significantly increased the complexity of both design and verification. There are many different power states, all under software control. Certain protocols must be followed to successfully switch various parts of the design between power states, as well as switch them off and bring them back up when required.
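To give a feel for the kind of protocol software has to follow, here is a simplified model of a power domain with state retention. The class, its step names, and the exact ordering are assumptions for illustration; real designs define their own sequences and timing.

```python
from enum import Enum


class DomainState(Enum):
    ON = "on"
    OFF = "off"


class PowerDomain:
    """Toy model of a switchable power domain (hypothetical sequence)."""

    def __init__(self, name):
        self.name = name
        self.state = DomainState.ON
        self.isolated = False
        self.retained = False
        self.log = []  # ordered record of the steps taken

    def power_down(self):
        if self.state is not DomainState.ON:
            raise RuntimeError(f"{self.name} is not on")
        self.log.append("isolate outputs")
        self.isolated = True
        self.log.append("save state")
        self.retained = True
        self.log.append("switch off supply")
        self.state = DomainState.OFF

    def power_up(self):
        if self.state is not DomainState.OFF:
            raise RuntimeError(f"{self.name} is not off")
        self.log.append("switch on supply")
        self.state = DomainState.ON
        self.log.append("restore state")
        self.retained = False
        self.log.append("release isolation")
        self.isolated = False


if __name__ == "__main__":
    domain = PowerDomain("video")
    domain.power_down()
    domain.power_up()
    print(domain.log)
```

Software that skips a step or requests a transition from the wrong state is exactly the kind of integration bug that needs to be caught before real silicon is available.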

Besides the functional design and verification needs all of that implies, there are many other structural needs. Each group of circuitry whose supply voltage may be separately switched, known as a power domain, must be fully separated from other power domains using appropriate isolators or level shifters on every signal that crosses the domains. The hardware team can exhaustively test and ensure that these techniques are implemented correctly, and all the necessary methods and solutions are available to do that. The purpose here is twofold: First, to validate during system integration that the software can correctly and efficiently control the power-saving capabilities provided in the hardware platform; and second, to analyze the resultant power savings and verify that the device can meet the overall power specification in real system conditions.

Validating power management under software control

System bring-up is usually executed on prototypes, often a board-based representation of the hardware implemented in FPGAs rather than the final System-on-Chip (SoC). However, it is very rare for power-management capabilities such as those described in the preceding section to be successfully represented in a hardware prototype. Hence, the power-management aspects of system integration are all too often tested piecemeal, separately on the hardware side and the software side, and the two typically don’t get integrated until the real hardware is available.

Can simulation help? Yes, but execution times are often prohibitively long to thoroughly check out power management with software. The problem is that once power domains are introduced to a design, many different power modes need very complex (and much longer) vectors to place the chip into that power mode and provide traffic representative of the system mode or combination of modes to which each power mode corresponds. Some companies painstakingly work out the data bandwidths for various parts of a chip to come up with vectors for maybe 30 different modes for power analysis. Even this might be missing a lot.

Bear in mind that these power-saving techniques do not come free. There is overhead associated with switching power domains off and bringing them back online. Unless a power mode endures for a certain time, turning idle circuitry off may waste power, not save it. Computation can be sped up so the power domain can be turned off for a longer amount of time. Hence, all the combinations of moving between those 30 modes must be considered to obtain an accurate picture. This becomes challenging for any simulation-based technique.
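The break-even point described above can be estimated with simple arithmetic: shutting a domain off only pays when the energy saved while it is off exceeds the energy spent switching it off and on again. The function names and figures below are assumptions for illustration.

```python
def break_even_time_s(overhead_energy_j, idle_power_w, off_power_w=0.0):
    """Minimum off time for power gating to save energy.

    overhead_energy_j: assumed total energy cost of one off/on cycle.
    idle_power_w: power burned if the domain is left on but idle.
    off_power_w: residual power while gated (for example retention).
    """
    saved_power_w = idle_power_w - off_power_w
    if saved_power_w <= 0:
        raise ValueError("gating saves no power in this configuration")
    return overhead_energy_j / saved_power_w


def worth_gating(idle_duration_s, overhead_energy_j,
                 idle_power_w, off_power_w=0.0):
    """True when the expected idle period exceeds the break-even time."""
    return idle_duration_s > break_even_time_s(
        overhead_energy_j, idle_power_w, off_power_w)


if __name__ == "__main__":
    # Hypothetical domain: 1 uJ switching overhead, 1 mW idle leakage.
    print(break_even_time_s(1e-6, 1e-3))
    print(worth_gating(10e-3, 1e-6, 1e-3))
    print(worth_gating(100e-6, 1e-6, 1e-3))
```

Since idle durations depend on what the software is doing in each system mode, this check can only be evaluated realistically with representative software running.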

One answer is to use an emulation tool such as Cadence’s Palladium Verification Computing Platform. Palladium was recently extended for power-aware execution, so vectors can be run thousands of times faster than logic simulation. This allows significant software to be executed, enabling engineers to check that power domains can be powered down and restored under software control. Virtual platforms – high-level simulation of the hardware platform executing fast enough to allow significant software testing – also show promise as a solution in this area. However, more work must be done to successfully represent the necessary power states in these high-level models. Some of the advantages of emulation versus simulation can be seen in Figure 1.

Figure 1: Validating power management with emulation provides many advantages over simulation.

System-level power analysis

Fundamentally, dynamic power boils down to a function of two things: characterization and switching activity. Characterization means accurately measuring and modeling what happens when a transistor switches – a function of Vdd², R, C, and increasingly, L. Switching activity means the frequency and duty cycle at which the switching happens for each of the transistors in the circuit of interest. That activity can be reduced to some extent by using the lowest clock frequency that gets the job done and turning the clock off when not needed.
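The split between characterization and activity can be sketched as follows: characterization supplies a capacitance per net and the supply voltage, while activity supplies a toggle count per net over a measurement window. The function name and data layout are assumptions, and the model covers only the capacitive switching term, using the standard 0.5 * C * Vdd^2 energy per transition.

```python
def dynamic_power_from_activity(nets, vdd_v, window_s):
    """Average switching power over a window, in watts.

    nets: iterable of (capacitance in farads, toggle count) pairs.
    Each toggle is charged 0.5 * C * Vdd^2 of energy.
    """
    energy_j = sum(0.5 * cap_f * vdd_v ** 2 * toggles
                   for cap_f, toggles in nets)
    return energy_j / window_s


if __name__ == "__main__":
    # Hypothetical nets observed for 1 microsecond at Vdd = 1.0 V.
    nets = [(1e-15, 1000), (2e-15, 500)]
    print(dynamic_power_from_activity(nets, 1.0, 1e-6))
```

The capacitances can be known very precisely, but the result is only as realistic as the toggle counts fed into it.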

To date, characterization has dominated thinking in power analysis. This has led to the idea that acceptable accuracy can be ensured only when there is a placed and routed netlist, all the transistors and wires are known, and all the RC values are extracted. However, characterization no longer seems to be the problem designers are struggling with; it’s the activity. What vectors can run on the transistor netlist? What are all the system modes to generate realistic activity in today’s multifunction devices? Are the vectors replicating those modes, or just running test patterns, or using statistical methods, which bear scant relation to real-life device operation? Given the complexity of power modes, and hence the quantity of activity vectors, this approach cannot cope with the activity, and the so-called accurate power analysis tools operating at gate level or physical netlist level may not be representative of the circuit used in the system. At the other end of the design spectrum, it may be easier to analyze real activity driven by executing system software, but the accurate characterization is missing. These issues are illustrated in Figure 2.

Figure 2: The complexity of power modes makes system-level power analysis more difficult to accomplish accurately.

Palladium has a capability called Dynamic Performance Analysis (DPA) that allows designers to run as much real system-level activity as necessary, running real system modes under software control. The resultant activity is measured on all relevant points in the design mapped onto the emulator. This is very efficient, as this is exactly what an emulator is designed to do.

Besides the hardware design being mapped to the emulator box, the design is characterized using Cadence logic synthesis technology under the hood to map the design to the real cell library. This renders full representative system activity and characterization from the actual silicon process. The implementation is not exactly the same as the real chip would be, so it’s still an estimate, but it may be the closest designers will get until the actual silicon running the actual application software is available. Which of course is probably too late.

ECD in 2D: See a smartphone simulation running on the Cadence Virtual System Platform demonstrated at this year’s . Use your smartphone, scan this code, watch a video: http://opsy.st/kyt6fA.

Pete Hardee is a director of solutions marketing at Cadence Design Systems. He is a 16-year veteran of the and silicon IP industries. His experience previous to Cadence includes positions at , CoWare, and Silistix. While at CoWare, he was a founder and first cochair of the Open Initiative. He has a BSc in Electrical Engineering from Imperial College, London, and an MBA from Warwick Business School.

Cadence Design Systems phardee@cadence.com www.cadence.com
