ARM's big.LITTLE architecture aims to satisfy the hunger for power - Q&A with John Goodacre, Director, Technology and Systems, ARM Processor Division
As smartphone and tablet users continue to demand much higher performance to keep pace with an ever-more-connected lifestyle, the demand for extended battery life naturally follows. One way ARM is meeting the challenge is with its big.LITTLE multicore-based architecture, which aims to boost power savings by as much as 95 percent to help satisfy today’s power-hungry computing paradigm. Additionally, John discusses how preserving the safety-critical design features and predictability of embedded market technologies while amping up performance using multicore processors can present a daunting challenge.
Anyone who hasn’t heard of ARM has clearly been living under a rock, at least in regard to smartphone offerings. But describe (briefly) your multicore offerings for automotive infotainment, smart meters, and embedded computing.
GOODACRE: ARM characterizes the market into three primary segments: those that require the support of rich operating systems, those that require true real-time and predictable execution, and those that will be embedded within microcontrollers. These correspond to the Application, Real-time, and Microcontroller profiles, respectively.
The same kind of technology that goes into smartphones is also applied to automotive infotainment. We provide Cortex-R and Cortex-M series processors for embedded computing markets and smart metering applications that are low power and can operate for extremely long periods of time on a single battery. For application and embedded computing markets alike, our partners utilize the Cortex-A9 with ARM graphics- and video-processing engines to deliver products like the Ford Sync and many of Samsung and LG’s Smart TVs.
What have been the top technical challenges your customers have faced in those markets? How can those challenges be solved?
GOODACRE: As a whole, the processing needs for different markets can be clustered around common sets of requirements. The Application profile processors have been driven by the peak performance available to a system running on a platform operating system. This profile is typified by mobile, infotainment, and other consumer devices such as the Smart TV. This has necessitated multicore support for an SMP-based operating system where the tasks of the system can be automatically shared among multiple processors. Sharing tasks across the processors, especially in a big.LITTLE arrangement, spreads the required performance across the cores and so delivers that performance within a lower power envelope. In the highlighted markets, smartphones and auto infotainment are addressed by this Cortex profile.
The Real-time profile processors have also been driven by peak available performance; however, this segment utilizes an RTOS, and as such defines specifically where and when specific tasks will run. These systems must also maintain the predictability and safety characteristics required by many markets. In the highlighted markets, most embedded computing is addressed by this Cortex profile. For example, within a hard disk, the head must be positioned exactly at the right place, at the right time, so as to read the data as it spins past under the head. In automotive, when the driver presses the brake, that level of braking must be applied to the physical brakes.
Finally, the Microcontroller profile is similar to the Real-time profile in its use of an RTOS; however, it is driven more by the needs of embedded flash than by the scalability of real-time performance. As such, these microcontroller parts, assuming they use more than one processor, will deploy them as independent systems within the multicore microcontroller. The smart meter is addressed by this profile of Cortex processor because of its extremely small size, and hence low cost, while also consuming little power, allowing battery life measured in years or operation on energy scavenged from the environment around the sensor.
What is the biggest challenge right now for ARM in engineering multicore processors?
GOODACRE: One of today’s most significant challenges is how to create an SoC that meets the conflicting consumer demand for devices with both higher performance and extended battery life. Mobile usage has changed significantly and today’s consumers are increasingly using their smartphone for the majority of their connected lives. Because of that, the performance demanded of current smartphones and tablets is increasing at a much faster rate than the capacity of batteries or the power savings from semiconductor process advances. At the same time, users are demanding longer battery life within roughly the same form factor.
Each profile of processors has its own technology challenges. The Application challenges are around how to support even more processors while maintaining support for the software models used by this market. The interest from the enterprise and server markets in using ARM technology also drives challenges around how to utilize many dozens, or significantly more, ARM processors in a single system. For embedded markets, maintaining the predictability and safety-critical design features while increasing performance through multicore also presents its own challenges.
Briefly describe your big.LITTLE technology, introduced in 2011 and recently described as gaining more momentum. How does it work, technically speaking?
GOODACRE: big.LITTLE processing is no more complex for software applications than today’s SMP-capable operating systems. Under the hood, however, it is an energy-saving method in which high-performance CPUs and efficiency-tuned CPUs are connected in a cache-coherent combination so the operating system can dynamically assign application tasks to the appropriate CPU based on performance needs.
The more powerful “big” core is responsible for handling computing-intensive tasks, such as rendering a Web page, whereas the less powerful “LITTLE” core handles less demanding tasks, such as MP3 playback. Both cores implement exactly the same processor architecture (ARMv7) and are capable of executing the same instructions. The only difference lies in the way the cores handle the execution. While the “big” core is designed with performance as its primary goal, the “LITTLE” core is designed with efficiency as its principal target. Thus, an application or program that runs on one core can also run on the other without noticing any difference other than the performance and power consumption levels.
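The placement idea described above can be illustrated with a minimal sketch. It assumes a simple load threshold decides which core type a task runs on; the `Task` type, the `BIG_THRESHOLD` value, and the load figures are hypothetical and are not taken from ARM or from any real operating system scheduler.

```python
from dataclasses import dataclass

# Hypothetical threshold (not an ARM figure): a task demanding more than
# this fraction of one LITTLE core's peak performance goes to the big core.
BIG_THRESHOLD = 0.8


@dataclass
class Task:
    name: str
    load: float  # demanded performance, as a fraction of one LITTLE core's peak


def assign_core(task: Task, threshold: float = BIG_THRESHOLD) -> str:
    """Return "big" for demanding tasks and "LITTLE" for light ones."""
    return "big" if task.load > threshold else "LITTLE"


def place_tasks(tasks):
    """Map each task name to the core type chosen for it."""
    return {t.name: assign_core(t) for t in tasks}


if __name__ == "__main__":
    # Illustrative loads only: page rendering is heavy, MP3 playback is light.
    tasks = [Task("render_web_page", 1.5), Task("mp3_playback", 0.1)]
    for name, core in place_tasks(tasks).items():
        print(f"{name} -> {core}")
```

Because both core types execute the same instructions, the sketch only has to choose where a task runs; the task itself needs no changes.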
Can big.LITTLE be accomplished with single-core processors, or just multicore?
GOODACRE: big.LITTLE is fundamentally based on a multicore system in which one processor can execute the same software as a higher-peak-performance core, but more power-efficiently. The big.LITTLE architecture employs separate cores with different computing power within the same system.
This asymmetry in power efficiency between processors can be realized by running one CPU at a different voltage from another, or by implementing a CPU in a more power-efficient manner. The highest dynamic range, and hence the greatest power savings, is delivered when a CPU uses both the voltage and implementation aspects and, most importantly, is built on a fundamentally more power-efficient microarchitecture, such as that realized by the Cortex-A7 alongside the Cortex-A15.
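To see why supply voltage matters so much, the sketch below uses the standard first-order model of CMOS switching power, P = C * V^2 * f. The model is a textbook approximation rather than something stated in the interview, and the function names and voltage values are illustrative assumptions, not ARM figures.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """First-order CMOS switching power in watts: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def power_ratio(v_low: float, v_high: float) -> float:
    """Dynamic power at v_low relative to v_high, for equal capacitance and clock."""
    return (v_low / v_high) ** 2


if __name__ == "__main__":
    # Illustrative voltages only; not taken from any datasheet.
    print(f"{power_ratio(0.8, 1.0):.2f}")  # prints 0.64
```

Under this model, a 20 percent lower voltage at the same clock cuts switching power by roughly a third, before any gain from a leaner microarchitecture is counted.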
Does the big.LITTLE estimated 70 percent savings on processor energy consumption apply only to smartphones?
GOODACRE: big.LITTLE processing is currently targeted at the smartphone market. That said, the power savings are driven by the dynamic range in the required performance of an application. If the application only required high performance 5 percent of the time, then the savings would be around 95 percent. If an application only leaves the high-performance case 5 percent of the time, then the savings would be closer to 5 percent. Thankfully, the big.LITTLE system can move tasks between these states very quickly, in a matter of a few tens of nanoseconds; this means that even the most predictable high-performance systems also spend more time than expected in the lower-performance state.
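The two figures quoted above follow from a simple idealized model, sketched here under stated assumptions: the baseline is a big core drawing full power all the time, and the LITTLE core draws a fixed fraction of that power (zero in the limiting case). The function name and the `little_to_big_power` parameter are hypothetical, introduced only for illustration.

```python
def energy_savings(high_fraction: float, little_to_big_power: float = 0.0) -> float:
    """Fraction of power saved versus always running on the big core.

    high_fraction: share of time the workload needs the big core (0..1).
    little_to_big_power: LITTLE core power as a fraction of big core power
        (0..1); 0.0 is the idealized case of a negligible LITTLE core cost.
    """
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must be between 0 and 1")
    if not 0.0 <= little_to_big_power <= 1.0:
        raise ValueError("little_to_big_power must be between 0 and 1")
    # Average power relative to the baseline is
    #   high_fraction * 1 + (1 - high_fraction) * little_to_big_power,
    # so the saving is one minus that, which simplifies to the product below.
    return (1.0 - high_fraction) * (1.0 - little_to_big_power)


if __name__ == "__main__":
    for fraction in (0.05, 0.95):
        print(f"high {fraction:.0%} of the time: saves {energy_savings(fraction):.0%}")
```

With a nonzero LITTLE core cost the savings shrink accordingly; for example, `energy_savings(0.05, 0.2)` gives about 0.76 rather than 0.95.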
What are the emerging multicore trends for the aforementioned markets? How does ARM plan to keep pace with these trends?
GOODACRE: Each market has its own specific requirements when it comes to answering its multicore demands of peak performance, power efficiency, real-time nature, and embedded characteristics. ARM has structured its R&D to address these multiple markets while extending their capabilities into new markets, both with higher-peak-performance designs such as the 64-bit capable Cortex-A57 and with designs that meet the efficiency requirements of the most deeply embedded devices, such as the Cortex-M0+. ARM’s road map contains devices that support each of these markets’ trends, and will continue to do so. The trend for performance in enterprise scale-out drives demand for interconnect products such as the CCI-400 and CCN-504 to utilize the most power-efficient processor for the required single-thread performance level.
Today, again, mobile devices are at 4 processors and moving to 6 or 8 utilizing big.LITTLE. Auto infotainment will follow this through its commonality with the Cortex Application profile. The embedded markets primarily still dedicate a specific processor to a specific task, so in smart meters it’s more likely to still use a single processor, or potentially two: one for regular data capture and one for bursting the data across a network. The trend across many embedded markets is to make the intelligence of the system visible, whether a washing machine hosting an LCD or a meter emailing you the cost of your energy. This all leads to an increased use of processors, both individually and in multicore designs.