2017 embedded processor report: At the edge of Moore’s Law and IoT


With the benefits of Moore’s Law waning and the Internet of Things (IoT) targeting an untold number of lower-end devices, embedded processor vendors are now tailoring solutions to the specific needs of end customers and applications more than ever before. The result? An emphasis on power efficiency, security, , and cost.

Forces at work in the electronics industry have reshaped the embedded processor landscape in recent years, among them, the slowing of Moore’s Law and the realization that most IoT devices will emphasize price and power rather than feeds and speeds.

To be sure, embedded processors have traditionally been slow to adopt advanced process nodes, as longer lifecycles and the applications in question typically haven’t required top-of-the-line performance. However, with the cost of developing chips that push the limits of semiconductor lithography on the rise, Jag Bolaria, Principal Analyst for Embedded and Servers at The Linley Group, expects that power efficiency and consumption will continue to displace performance as the key driver of embedded and IoT solutions.

“There is enough left in Moore’s Law to take us out to 2022, but the rate of new process technology will slow down and the costs for development will increase,” Bolaria says. “Specifically, the next nodes to come out are 10 nm in 2017, 7 nm around 2019, and 5 nm around 2022. The latter, however, is likely to need new fabrication technologies such as extreme ultraviolet (EUV) lithography, and at 5 nm, vendors may look at using exotic materials.

“The IoT will focus on low cost, low power, and power efficiency,” Bolaria continues. “There is enough performance in existing products to satisfy most IoT applications. Most embedded applications do not need leading-edge performance, but they do want the best power efficiency and the lowest power consumption. Embedded products from leading vendor NXP do not offer leading performance, but are instead right-sized – in terms of absolute power, level of integration, and power efficiency – for the target application.”

Bolaria’s observations serve as a proof point for the continued success of the 8-bit microcontroller (MCU) market, where, despite years of speculation to the contrary, vendors such as Microchip have seen sustained success with their PIC and (recently acquired) AVR product lines. But rather than innovations in the CPU itself, the endurance of these 8-bit technologies can largely be attributed to intelligent hardware blocks like core-independent peripherals (CIPs) that “allow designers to implement the most time-sensitive parts of their application in fixed-function, low-power, and always-on hardware,” says Greg Robinson, Senior Director of 8-bit MCU Marketing at Microchip Technology Inc. (Figure 1).

“CIPs have the ability to communicate directly with other peripherals to create configurable hardware blocks,” says Robinson. “These ‘core-independent’ blocks consume very little power and are much smaller than the RAM and Flash needed to implement the same function within the core. They also provide a quicker and more reliable response than a software-driven routine, which provides greater performance. CIPs require little to no code, eliminating the time and cost for validation and ultimately resulting in faster time to market.

[Figure 1 | The 8-bit PIC16F18857 MCU features Core-Independent Peripherals (CIPs), up to 56 KB Flash memory, and a 10-bit analog-to-digital converter (ADC) for a variety of general-purpose and low-power applications.]

“This is a great alternative to the practice of utilizing the CPU’s sleep mode to lower power consumption when it’s not needed and then wake it up with a hardware interrupt in the event that it’s needed again, which brings in potential issues with variable startup time and interrupt latency,” Robinson adds.
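To see why wake-from-sleep designs carry a power cost, the sketch below models average supply current as a time-weighted sum over one event period. All operating-point values are hypothetical placeholders, not datasheet figures for any Microchip part, and modeling the CIP path as a small constant peripheral current on top of CPU sleep current is an assumption made for illustration. The type and function names are likewise hypothetical.

```c
/* Hypothetical operating point for a duty-cycled MCU. All values are
   illustrative assumptions, not datasheet figures for any specific part. */
typedef struct {
    double active_ua;   /* supply current while the CPU is awake, in uA */
    double sleep_ua;    /* supply current while the CPU sleeps, in uA */
    double awake_us;    /* CPU-on time per event, including wake-up, in us */
    double period_us;   /* time between events, in us */
} duty_cycle_t;

/* Time-weighted average current when the CPU wakes once per period
   to service an interrupt and then returns to sleep. */
static double avg_current_wake_ua(const duty_cycle_t *d)
{
    double asleep_us = d->period_us - d->awake_us;
    return (d->active_ua * d->awake_us + d->sleep_ua * asleep_us) / d->period_us;
}

/* Average current when a core-independent peripheral handles the event
   and the CPU never leaves sleep: sleep current plus the peripheral's
   own draw, which is assumed to be constant. */
static double avg_current_cip_ua(const duty_cycle_t *d, double peripheral_ua)
{
    return d->sleep_ua + peripheral_ua;
}
```

With placeholder values of 1,000 µA active, 1 µA asleep, and a 100 µs wake every 10 ms, the wake-based design averages about 11 µA, versus 3 µA if a peripheral drawing an assumed 2 µA handles the event while the CPU sleeps. Shortening the period makes the wake-up overhead dominate, and variable startup time shows up as uncertainty in `awake_us`, which is also the latency issue Robinson describes.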

The effects of this trend can also be seen at Intel, the home of Moore’s Law, where delivering features tailored to embedded systems is beginning to supersede Dhrystone Million Instructions Per Second (DMIPS) targets. Jonathan Luse, General Manager of IoT Planning and Product Line Management at Intel, asserts that the recently released Intel Atom Processor E3900 series, based on the 14 nm Goldmont microarchitecture, does provide a 70 percent improvement in CPU performance and a 2.9x increase in graphics capability over previous generations in the same 6-10 W power envelope. But he also acknowledges that although the semiconductor giant is still “expected to deliver an increase in CPU performance per watt generation over generation or a graphics improvement generation over generation at the same power threshold,” customers are continually “asking for more than that nowadays.”

This is evident in perhaps the most significant enhancement of the Atom Processor E3900, Intel Time Coordinated Computing Technology (Intel TCC Technology), a switched fabric with global time awareness that connects internal digital signal processors (DSPs), image signal processors (ISPs), and sensor subsystems, and that can be extended to time-sensitive peripheral devices via PCI Express (PCIe). Technologies like TCC, which are “all about deterministic control loops or systems that require real-time computing down to the microsecond level,” are an example of the chipmaker’s concerted effort to address application-specific needs in the key segments of automotive, industrial, and video (Figure 2).
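TCC itself is a hardware and firmware feature, and the sketch below does not use any Intel TCC interface. It only illustrates the kind of deterministic control loop the quote refers to, using standard POSIX calls on a Linux-like host: the loop sleeps until absolute deadlines on `CLOCK_MONOTONIC` so timing error does not accumulate from one cycle to the next, and it records the worst observed lateness, which is the jitter figure a time-coordinated platform aims to bound. The function names and the callback signature are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance a timespec by ns nanoseconds, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Signed difference a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (int64_t)(a->tv_nsec - b->tv_nsec);
}

/* Hypothetical control-loop runner: calls step(ctx) once per period_ns for
   `cycles` iterations and returns the worst wake-up lateness seen, in ns. */
static int64_t run_periodic(long period_ns, int cycles,
                            void (*step)(void *), void *ctx)
{
    struct timespec next, now;
    int64_t worst_ns = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < cycles; i++) {
        timespec_add_ns(&next, period_ns);
        /* Absolute deadline: one late iteration does not shift later ones.
           clock_nanosleep returns the error number directly. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR) {
            /* interrupted by a signal: sleep again until the same deadline */
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late_ns = timespec_diff_ns(&now, &next);
        if (late_ns > worst_ns)
            worst_ns = late_ns;
        if (step != NULL)
            step(ctx); /* e.g. read sensors, compute, write actuators */
    }
    return worst_ns;
}
```

On a general-purpose OS the worst-case lateness reported by a loop like this is usually well above a microsecond, which is the gap hardware-level time coordination is meant to close.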

[Figure 2 | Intel Time Coordinated Computing Technology (Intel TCC Technology) in the Atom Processor E3900 series allows the processor and multiple interfaced devices to function in a time-coordinated fashion with microsecond latencies.]

“With Intel’s approach from the edge to the cloud and the fog network in between, we have to think about it from a vertical solution first and work our way back to the technologies needed to solve the business problems that we’ve got,” Luse says.

Security, tools trend upwards

Like power efficiency, security is another aspect of embedded processing that is moving to the forefront of design requirements. While processor-based security has historically been seen as prohibitive due to the cost and development effort associated with implementation, “ARM’s introduction of TrustZone into Cortex-M-class processors over time is going to make the [security] hardware essentially not be a cost factor,” says Steve Hoffenberg, Director of the IoT and Embedded Technology Practice at VDC Research.

“ARM TrustZone in particular has been around a long time, but its usage is growing considerably,” says Hoffenberg. “The hardware has been there, but in the majority of devices that have used those processors, [developers] haven’t actually used the security features of the hardware.

“With TrustZone now in Cortex-M, there will be little to no incremental hardware cost,” Hoffenberg continues. “There will be more development time, which will be non-recurring engineering (NRE) on the software side to actually utilize the hardware, and there is going to be an initial hurdle for a lot of engineers who start using that hardware in their lower-end devices. But once they do, it should be relatively little, even incremental software development time, over the long run.”

Development tools have also evolved to facilitate use of the security capabilities inherent in modern processors, with current versions of the ARM Keil Microcontroller Development Kit (MDK) and Cortex Microcontroller Software Interface Standard (CMSIS) both supporting TrustZone-enabled Cortex-M devices. According to Reinhard Keil, Senior Director of MCU Tools at ARM and Founder of Keil Elektronik GmbH, as “more and more devices include pre-programmed firmware in ROM,” these tools will accelerate the development curve for “future Cortex-M23 and -M33 applications split into secure and non-secure parts.”

“MDK and CMSIS version 5 already include full support for TrustZone-enabled Cortex-M devices,” Keil says. “Several program examples and a full-featured real-time operating system (RTOS) help you get started quickly, and we will deploy more template applications that exemplify the setup of secure systems. Consistent APIs are fundamental for such software environments, and ARM enables this with CMSIS and the application binary interface (ABI) procedure call standard for compilers.”
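For readers new to the secure/non-secure split, the sketch below shows the general shape of a Secure-side entry point on an ARMv8-M device with TrustZone. It relies on the Arm C Language Extensions for the Security Extension (CMSE): the `cmse_nonsecure_entry` function attribute, available when compiling with `-mcmse`, and `cmse_check_address_range()` from `<arm_cmse.h>`. The function name, the device-ID data, and the host-build fallback that lets the file compile on a PC are hypothetical; a real project would start from its vendor’s CMSIS templates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
/* Building the Secure image with -mcmse for an ARMv8-M target. */
#include <arm_cmse.h>
#define NS_ENTRY __attribute__((cmse_nonsecure_entry))
#define IS_SECURE_BUILD 1
#else
/* Host build for illustration only: there are no security states. */
#define NS_ENTRY
#define IS_SECURE_BUILD 0
#endif

/* Hypothetical device identifier that lives only in Secure memory. */
static const uint8_t secure_device_id[8] = {
    0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF
};

/* Entry point callable from Non-secure code. Copies the ID into a caller
   buffer and returns the number of bytes written, or -1 on error. */
NS_ENTRY int secure_copy_device_id(uint8_t *ns_buf, size_t len)
{
    if (ns_buf == NULL || len < sizeof secure_device_id)
        return -1;
#if IS_SECURE_BUILD
    /* Refuse buffers that are not Non-secure, writable memory, so a
       Non-secure caller cannot make Secure code overwrite Secure data. */
    if (cmse_check_address_range(ns_buf, len,
                                 CMSE_NONSECURE | CMSE_MPU_READWRITE) == NULL)
        return -1;
#endif
    memcpy(ns_buf, secure_device_id, sizeof secure_device_id);
    return (int)sizeof secure_device_id;
}
```

The design choice worth noting is that Secure code validates every pointer it receives from the Non-secure side before touching it; checks like this are the kind of additional software work that makes up the up-front NRE Hoffenberg describes.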

Keil also notes that Fixed Virtual Platforms (FVPs) available in the MDK-Professional Edition can be used to simulate complete Cortex processor systems, including memory and peripherals in addition to the core itself. An FVP for the Cortex-M33 provides simulation speeds “similar to a Cortex-M running at 100 MHz,” Keil says, permitting engineers to “develop and validate software for next-generation MCUs today.”

Elsewhere in the tools market, MCU suppliers themselves have started to update their offerings in anticipation of application development based on ARMv8-M architectures equipped with TrustZone technology, and here the Renesas Synergy Platform is an indicator. According to Semir Haddad, Director of Marketing for MCU and MPU Product Solutions at Renesas Electronics America, Synergy will support “the Cortex-M23 and Cortex-M33 in a future generation with integrated software and tools that abstract many low-level details from the developer for ease of use and fast time to market.”

“One of the focus areas for our next-generation Synergy Platform will be to further reduce complexity for developers creating secure connected devices with advanced security technologies,” says Haddad. “The use of ARM TrustZone for Cortex-M as a hardware capability will be integrated with the other elements of the Synergy Platform, including the security scheme, offering developers improved ease of use and leading to a higher adoption rate of TrustZone features in ARMv8-M-based systems.

“Because of the rate of innovation coming with the IoT, development tools need to have the capacity to evolve quickly,” he adds.

Programmability and achieving price synergies in processor development

As always, development cost remains a consistent theme for consumers of embedded processing technology, but given the impending limitations of Moore’s Law, it has recently become more of an issue for suppliers as well. As processing solutions based on advanced geometries take longer to design and the photolithographic masks used to produce such wafers approach $5 million each at the 16 nm node, programmable logic solutions have become more and more attractive to semiconductor suppliers. This is not only for the technological benefits they bring to the table, but also for the cost synergies they provide, says Dan Mandell, Senior Analyst in the IoT and Embedded Technology Practice at VDC Research.
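One rough way to see why mask costs weigh on suppliers is to amortize them over production volume. The calculation below takes the $5 million figure above as the mask cost for one design; the unit volumes are hypothetical, and design, verification, and qualification costs are ignored.

```latex
\text{mask cost per unit} = \frac{C_{\text{masks}}}{N_{\text{units}}},\qquad
\frac{\$5\,000\,000}{1\,000\,000\ \text{units}} = \$5.00,\qquad
\frac{\$5\,000\,000}{10\,000\,000\ \text{units}} = \$0.50
```

For a hypothetical low-cost part selling for around a dollar, even the larger volume leaves the mask cost at half the selling price, which is why spreading one mask set across many products is attractive.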

“FPGA system-on-chips (SoCs) from an Altera or Xilinx, like the Zynq, in combination with ever smaller, cheaper FPGAs are going to play a much bigger role in the next couple years in attaining the workload acceleration needed at the edge or device level [of the IoT],” Mandell says. “For a lot of industrial workloads, to be able to do system processing in a cost-effective manner is going to demand more optimization through soft cores. They’re used a lot for glue logic right now for just simple integration with legacy assets, but as the supporting software development resources, toolsets, and design environments become easier to use and more people become accustomed to them, there will be more opportunities for this type of technology to take root amongst original equipment manufacturers (OEMs) and integrators.”

As Mandell mentions, what semiconductor vendors are seeking to gain now and in the future from integrating programmable logic blocks into their processor designs is the ability to grow with changing interface standards and evolve over time to serve a broader range of customer requirements. However, while soft programmable logic IP does add flexibility compared with the traditional approach of completely hardwired chips, one drawback of the soft IP approach is its lack of density. In response, embedded FPGAs in the form of hard programmable logic blocks have gained traction as a way to meet these challenges, enabling chipmakers to “reconfigure register-transfer level (RTL) blocks to allow certain critical portions of the chip to keep up with changes,” says Geoff Tate, CEO of Flex Logix Technologies, Inc.

“That’s what an embedded FPGA does – execute RTL,” Tate says. “With an embedded FPGA in a chip you can reconfigure critical RTL at any time, even in the field, so if an industry standard changes, you can send out an update; if customers want their specs changed, you can address it with a change to the RTL; even in the MCU space, people can use it to customize chips, so one chip can look like a dozen different mask sets. An example [of the latter] is serial I/Os in MCUs. It’s very common that they’ve got dozens of versions of chips where the differences are just how many I2Cs and how many SPIs and how many UARTs they have. But at 40 nm, that’s $1 million a mask, so using an embedded logic block can save a lot of money and get to market more quickly.”
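Tate’s serial I/O example can be put in rough numbers. Assuming his “$1 million a mask” refers to one full mask set at 40 nm, and taking “a dozen” variants literally, a family of hardwired variants compares with a single eFPGA-equipped die as follows. The extra die area and any eFPGA licensing cost are ignored here, so this overstates the savings somewhat.

```latex
C_{\text{hardwired}} = 12 \times \$1\,\text{M} = \$12\,\text{M}
\qquad\text{vs.}\qquad
C_{\text{eFPGA}} = 1 \times \$1\,\text{M} = \$1\,\text{M}
```

Each variant beyond the first saves one mask set, before counting the per-variant design, test, and qualification effort.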

The core technology of Flex Logix is EFLX, an embeddable FPGA technology that can be configured in blocks as small as 100 look-up tables (LUTs) up to arrays of 122,000 LUTs, composed either of logic blocks alone or of a combination of logic and DSP blocks. The flexibility of this building-block approach to embedding FPGA IP enables roughly 75 different array shapes and sizes, and to date the technology has enabled on-chip functions such as bus acceleration or algorithm computation, programmable packet parsing, and configurable I/O in densities similar to those of Xilinx FPGAs. What sets the technology apart, though, is that it “can address everything from $1 MCUs to gigantic network switch chips with a digital architecture that we port to various process technologies,” Tate says (Figure 3).
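Because an embedded FPGA “executes RTL” by programming look-up tables, a toy model of a single LUT helps show what reconfiguration means. The sketch assumes a 4-input LUT purely for illustration; it does not describe the EFLX cell architecture, and all names are hypothetical. A k-input LUT is a 2^k-bit truth table, so changing the logic function means writing a different configuration word rather than cutting a new mask.

```c
#include <stdint.h>

/* Toy 4-input LUT: bit i of `config` is the output for input pattern i,
   where i = a | b<<1 | c<<2 | d<<3. */
typedef struct {
    uint16_t config;
} lut4_t;

/* Evaluate the LUT for one set of single-bit inputs. */
static unsigned lut4_eval(const lut4_t *lut, unsigned a, unsigned b,
                          unsigned c, unsigned d)
{
    unsigned index = (a & 1u) | ((b & 1u) << 1) | ((c & 1u) << 2) | ((d & 1u) << 3);
    return ((unsigned)lut->config >> index) & 1u;
}

/* "Compile" a Boolean function into a configuration word by evaluating
   it on all 16 input patterns. */
static uint16_t lut4_program(unsigned (*fn)(unsigned, unsigned, unsigned, unsigned))
{
    uint16_t config = 0;
    for (unsigned i = 0; i < 16u; i++) {
        if (fn(i & 1u, (i >> 1) & 1u, (i >> 2) & 1u, (i >> 3) & 1u))
            config |= (uint16_t)(1u << i);
    }
    return config;
}

/* Two example functions the same LUT can implement. */
static unsigned and4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a & b & c & d;
}

static unsigned parity4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a ^ b ^ c ^ d;
}
```

Reprogramming the same LUT from `and4` (configuration word `0x8000`) to `parity4` (`0x6996`) changes only 16 bits of configuration. Real eFPGA arrays also configure the routing between thousands of such cells, but the principle of updating function without new silicon is the same.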

[Figure 3 | Flex Logix EFLX embedded FPGA technology was ported to Taiwan Semiconductor Manufacturing Company’s (TSMC’s) 16 nm FinFET Plus and FinFET Compact (FFC) processes in late 2016 for networking and applications; supports 28 nm and 40 nm nodes for , solid-state drive (SSD), and MCU designs; and supports processes as far back as 180 nm. The EFLX compiler software tool integrates with Synopsys synthesis tools to allow developers to move from Verilog or VHDL to the timing generation and bit files for programming the EFLX arrays in their chip designs.]

“We have a tremendous value proposition on the advanced nodes, and it’s very logical that embedded FPGA makes sense there,” Tate explains. “But on the other hand, it’s easier to get people over the hump to try something new on the older process nodes because they have more designs, they’re less expensive, and they can afford to take risks more easily.

“As it turns out, originally we thought that there was no market for this technology at older process nodes, but customers are saying that even when mask costs are very low, there are tremendously significant costs in large companies for the whole design and qualification of new chips,” he continues. “So the mask cost is just the tip of the iceberg. All of the other work they have to go through – testing and test programs and quality and validation and stocking and inventory control – all of these things are significant costs, and the bigger companies tend to have more of them because they have more checks and balances and systems in place. We have customers asking us to go to 65 nm, 90 nm, and 180 nm, so there’s a big market in the advanced nodes, but there’s a big market as well in the 40 nm to 180 nm nodes.”

M&A: Causes and consequences

The most undeniable trend in the processor industry over the last 18 months has been the appetite for acquisition, which Bolaria attributes to the fact that “process technology is no longer bringing the same benefits in lower silicon and transistor costs” on the one hand, and “cheap money” on the other.

“If a company can get loans at 3 percent and they buy a company that has 9 percent margins, the acquiring company is making 6 percent for doing nothing,” explains Bolaria. “Then they can further increase the margin by cutting costs and thus make more money. Chip vendors need to make enough margin on the current generation of products to fund the next generation, and with increasing costs, they need to figure out how to increase the return.”
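Bolaria’s arithmetic can be made concrete with a hypothetical $1 billion acquisition financed entirely with debt, treating the target’s 9 percent margin as an annual return on the purchase price (the same simplification his example makes). The deal size is an assumption for illustration only.

```latex
\underbrace{0.09 \times \$1\,\text{B}}_{\text{acquired earnings}}
- \underbrace{0.03 \times \$1\,\text{B}}_{\text{interest}}
= \$90\,\text{M} - \$30\,\text{M} = \$60\,\text{M per year}
```

That 6 percent spread accrues before any of the cost cutting Bolaria mentions, which is what makes further margin gains after the deal attractive.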

In addition to the factors mentioned, consolidation “in and of itself is an indication that the margins and revenue opportunity in a lot of traditional embedded processor markets that these suppliers have been chasing are starting to wane with more competition, which is stemming from across the world,” says Mandell. “The aggressive investing of the Chinese Government in trying to acquire as much semiconductor manufacturing capacity as they could over the past couple of years is just one con. The other is that it’s going to take a couple of years for the integration of these mergers to really materialize in terms of the management, the personnel, the products, and the market strategies.”

The consequences of this rash of mergers are already being felt in the competitive landscapes for MCUs and SoCs, says Mandell, who cites Intel’s adoption of ARM-based FPGA SoCs and the possible accelerated sunsetting of NXP’s (now Qualcomm’s) Power Architecture portfolio as fallout of the activity. Nevertheless, he expects the pace to persist as emerging applications continue to unfold.

“There are a lot of synergies to be had as far as the different mergers taking place between the different semiconductor suppliers,” Mandell says. “The semis will maintain their merger and acquisition appetite through the next year and continue to invest heavily in both hardware and software solutions so that they can best position themselves from a competitive standpoint for some of these burgeoning opportunities in advanced driver assistance systems (ADAS) or 5G – areas where the revenues aren’t currently [there] but that demand a lot of up-front R&D and product development and coordination among various partners in order to be able to reap the rewards in five years’ time.”