Advances in ARM: What it could mean to the future of computing

Sridhar G Kumar
Published in The Startup · Jun 11, 2019 · 12 min read

In recent times, with powerful computing devices now fitting in the palm of our hand, one term we hear a lot is ARM. The very brains of these computational powerhouses are based on it, and before we can truly discuss the benefits it brings to future computing devices, let us look at what exactly it is and how it differs from the other kinds of processors in use today.

ARM Processor

ARM, previously an acronym for Advanced RISC Machine, is a family of RISC (reduced instruction set computing) architectures for computer processors, configured for various environments. Arm Holdings develops the architecture and licenses it to other companies such as Apple and Qualcomm, who design their own products that implement one of those architectures, including systems-on-chip (SoCs) and systems-on-module (SoMs) that incorporate memory, interfaces, radios, and so on. Arm also designs cores that implement this instruction set and licenses those designs to a number of companies that incorporate them into their own products. These products are then combined with other components to form the end-user devices that we purchase as consumers.

ARM processors, with their RISC architecture, typically require fewer transistors than processors with a complex instruction set computing (CISC) architecture (such as the x86 processors from Intel, AMD, and others found in most personal computers), which improves cost, power consumption, and heat dissipation. These characteristics are desirable for light, portable, battery-powered devices, including smartphones, tablets, and other embedded systems. Even for supercomputers, which consume large amounts of electricity, ARM can be a viable power-efficient solution.

RISC vs. CISC

RISC and CISC are both widely used in computing devices in today's world. A more in-depth look at each of them is required in order to truly understand which is more suitable to our computational needs. Generally speaking, RISC is perceived by many as an improvement over CISC. This is because CISC was the original style of ISA (instruction set architecture), whereas RISC was a redesigned ISA that emerged in the early 1980s.

There is no single best architecture, since each can be better in some scenarios and less ideal in others. RISC-based machines execute one instruction per clock cycle. CISC machines can have special instructions as well as instructions that take more than one cycle to execute. This means that a single instruction on a CISC architecture might take several instructions to accomplish on a RISC machine. The RISC architecture will also need more working memory (RAM) than CISC to hold values as it loads each instruction, acts upon it, and then loads the next one.

The CISC architecture can execute one, albeit more complex, instruction that does the same operations all at once, directly on memory. Thus, the RISC architecture requires more RAM but always executes one instruction per clock cycle, giving predictable timing that is good for pipelining. One of the major differences between RISC and CISC is that RISC emphasises efficiency in cycles per instruction, while CISC emphasises efficiency in instructions per program. How fast a processor is depends on how long each clock cycle takes, how many cycles each instruction needs, and how many instructions there are in the program. RISC accepts larger program code sizes (because its smaller instruction set means several simple steps may be needed where CISC uses one). This can be better visualized with the aid of the following performance equation, which is commonly used for expressing a computer's performance:

time per program = (instructions per program) × (cycles per instruction) × (time per cycle)

The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.
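
To make this trade-off concrete, here is a minimal Python sketch of the performance equation applied to the same hypothetical program compiled for a CISC-style machine and a RISC-style machine. Every number in it (instruction counts, cycles per instruction, clock speed) is invented purely for illustration and does not describe any real processor:

```python
# Illustrative only: instruction counts, CPI, and clock speed below are
# made-up numbers, not measurements of any real CISC or RISC chip.

def execution_time(instructions, cycles_per_instruction, seconds_per_cycle):
    """Time per program = instructions/program x cycles/instruction x time/cycle."""
    return instructions * cycles_per_instruction * seconds_per_cycle

CLOCK_PERIOD = 1 / 2e9  # assume both machines are clocked at 2 GHz

# Hypothetical CISC-style compile: fewer, more complex instructions,
# but each one takes several cycles on average.
cisc_time = execution_time(1_000_000, 4, CLOCK_PERIOD)

# Hypothetical RISC-style compile: ~3x the instructions, one cycle each.
risc_time = execution_time(3_000_000, 1, CLOCK_PERIOD)

print(f"CISC-style: {cisc_time * 1e3:.2f} ms")  # 2.00 ms
print(f"RISC-style: {risc_time * 1e3:.2f} ms")  # 1.50 ms
```

Which side wins depends entirely on the numbers plugged in, which is exactly the point: the equation only shows where each camp tries to claw back time.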

The RISC ISA emphasises software over hardware: because the instruction set itself is small and simple, the software (chiefly the compiler) has to do more of the work of producing efficient code. CISC ISAs instead spend more transistors in hardware to implement a larger number of instructions, and more complex instructions as well.
RISC needs more RAM, whereas CISC emphasises smaller code size and uses less RAM overall than RISC. Many microprocessors today combine RISC- and CISC-like attributes, however, such as a CISC-like ISA whose instructions are internally treated as a string of RISC-type instructions.

Advantages of ARM and its implementation

In a nutshell, the ARM architecture, being based on RISC, doesn't need to carry a lot of the baggage that CISC processors include in order to perform their complex instructions. Companies like Intel have invested heavily in the design of their processors, which today include advanced superscalar instruction pipelines, but all that logic means more transistors on the chip, and more transistors mean more energy usage. The performance of a high-end Intel chip is excellent; however, such a processor can have a TDP (thermal design power) of up to 130 watts. The highest-performance ARM-based mobile chips consume less than four watts, and often much less.

This low power consumption is why ARM is so special: it doesn't try to create 130 W processors, nor even 60 W or 20 W ones. The company is only interested in designing low-power processors. Over the years, ARM has increased the performance of its processors by improving the micro-architecture, but the target power budget has remained basically the same. In very general terms, you can break down the TDP of an ARM SoC (system on a chip, which includes the CPU, the GPU, the MMU, and so on) as follows: a maximum budget of about two watts for the multi-core CPU cluster, two watts for the GPU, and maybe 0.5 watts for the MMU and the rest of the SoC. If the CPU is a multi-core design, then each core will likely use between 600 and 750 milliwatts.
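
As a rough sanity check, the short Python sketch below simply adds up the ballpark figures from the paragraph above. The values are the same illustrative ones quoted there, not measurements of any particular SoC:

```python
# Back-of-the-envelope check of the SoC power budget described above.
# All figures are the rough, illustrative numbers from the text, not measurements.

cpu_cluster_w = 2.0   # budget for the whole multi-core CPU cluster
gpu_w = 2.0           # budget for the GPU
rest_of_soc_w = 0.5   # MMU and the rest of the SoC

print(f"Total SoC budget: ~{cpu_cluster_w + gpu_w + rest_of_soc_w:.1f} W")  # ~4.5 W

# Four cores at the quoted 600-750 mW each would exceed the ~2 W cluster
# budget, which is one reason all cores rarely run flat out at the same time.
for per_core_mw in (600, 750):
    print(f"4 cores x {per_core_mw} mW = {4 * per_core_mw / 1000:.1f} W")
```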

These are all very generalized numbers because each design that ARM has produced has different characteristics. ARM’s first Cortex-A processor was the Cortex-A8. It only worked in single-core configurations, but it is still a popular design and can be found in a few devices. Next came the Cortex-A9 processor, which brought speed improvements and the ability for dual-core and quad-core configurations. Then came the Cortex-A5 core, which was actually slower (per core) than the Cortex-A8 and A9 but used less power and was cheaper to make. It was specifically designed for low-end multi-core applications like entry-level smartphones.

At the other end of the performance scale came the Cortex-A15 processor, ARM's fastest 32-bit design. It was almost twice as fast as the Cortex-A9, but all that extra performance also meant it used a bit more power. In the race to achieve clock rates of 2 GHz and beyond, many of ARM's partners pushed the Cortex-A15 core design to its limits. As a result, the Cortex-A15 has a bit of a reputation as a battery killer, though this is probably a little unfair. To compensate for the Cortex-A15's higher power budget, ARM released the Cortex-A7 core and the big.LITTLE architecture.

The Cortex-A7 processor is slower than the Cortex-A9 but faster than the Cortex-A5; however, it has a power budget akin to its low-end siblings. When combined with the Cortex-A15 in a big.LITTLE configuration, the Cortex-A7 allows an SoC to use the low-power cores while performing simple tasks and switch to the Cortex-A15 cores when some heavy lifting is needed. The result is a design that conserves battery yet still offers peak performance when required. A simple illustration of this configuration can be seen in the image below.

big.LITTLE Architecture
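
The toy Python sketch below captures the same idea as that illustration: light work stays on the low-power Cortex-A7 cluster and heavy bursts move to the Cortex-A15 cluster. The load threshold and example tasks are invented for illustration; in a real device this decision is made inside the operating system's scheduler, not in application code:

```python
# Toy model of big.LITTLE cluster selection: light work stays on the
# low-power Cortex-A7 cores, heavy bursts move to the fast Cortex-A15 cores.
# The threshold and task loads are invented; real systems decide this
# inside the OS kernel's scheduler.

LITTLE_CLUSTER = "Cortex-A7"
BIG_CLUSTER = "Cortex-A15"
HEAVY_LOAD_THRESHOLD = 0.6  # fraction of a LITTLE core the task would saturate

def pick_cluster(task, estimated_load):
    cluster = BIG_CLUSTER if estimated_load >= HEAVY_LOAD_THRESHOLD else LITTLE_CLUSTER
    print(f"{task:<16} load={estimated_load:.2f} -> {cluster}")

pick_cluster("email sync", 0.10)       # background work stays on the A7s
pick_cluster("music playback", 0.25)
pick_cluster("web page render", 0.70)  # demanding bursts migrate to the A15s
pick_cluster("3D game", 0.95)
```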

ARM also has 64-bit processor designs. The Cortex-A53 is ARM's power-saving 64-bit design. It doesn't have record-breaking performance, but it is ARM's most efficient application processor ever, and it is also the world's smallest 64-bit processor. Its bigger brother, the Cortex-A57, is a different beast: it is ARM's most advanced design and has the highest single-thread performance of all of ARM's Cortex processors. ARM's partners will likely release chips based on just the A53, just the A57, and on the two combined in a big.LITTLE configuration.

One way ARM has managed this migration from 32-bit to 64-bit is by giving the processor two modes, a 32-bit mode and a 64-bit mode. The processor can switch between these two modes on the fly, running 32-bit code when necessary and 64-bit code when necessary. The silicon that decodes and starts to execute 64-bit code is separate (although there is some reuse to save area) from the 32-bit silicon, so the 64-bit logic is isolated, clean, and relatively simple. The 64-bit logic doesn't need to try to understand 32-bit code and work out the best thing to do in each situation; that would require a more complex instruction decoder, and greater complexity in these areas generally means more energy is needed.

A very important aspect of ARM's 64-bit processors is that they don't use more power than their 32-bit counterparts. ARM has managed to go from 32-bit to 64-bit and yet stay within its self-imposed energy budget. In some scenarios, the new range of 64-bit processors will actually be more energy efficient than previous-generation 32-bit ARM processors. This is mainly due to the increase in the internal data width (from 32 to 64 bits) and the addition of extra internal registers in the ARMv8 architecture. The fact that a 64-bit core can perform certain tasks more quickly means it can power down sooner and hence save battery life.
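
A rough way to see why "finish faster, power down sooner" saves energy is the small Python calculation below. The power figures and runtimes are invented for illustration only, not measurements of any ARM core:

```python
# Why finishing sooner saves battery: energy = power x time, and a core that
# completes its work earlier spends more of the time window in a low-power
# idle state. All numbers here are invented for illustration.

ACTIVE_POWER_W = 0.75  # assumed draw while the core is busy
IDLE_POWER_W = 0.05    # assumed draw once the core has powered down

def energy_joules(busy_seconds, window_seconds):
    idle_seconds = window_seconds - busy_seconds
    return busy_seconds * ACTIVE_POWER_W + idle_seconds * IDLE_POWER_W

WINDOW = 10.0  # a fixed 10-second window in which the task must complete

# Hypothetical: the faster core finishes the same task in 6 s instead of 8 s.
print(f"Slower core: {energy_joules(8.0, WINDOW):.2f} J")  # 6.10 J
print(f"Faster core: {energy_joules(6.0, WINDOW):.2f} J")  # 4.70 J
```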

The most powerful use model of big.LITTLE architecture is Heterogeneous Multi-Processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the “big” cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the “LITTLE” cores. This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series and Apple mobile application processors starting with the Apple A11.

This is where the software also plays a part. big.LITTLE processing relies on the operating system understanding that it is running on a heterogeneous processor, meaning that some cores are slower than others. This generally hasn't been the case with processor designs until now: if the OS wanted a task performed, it would just farm it out to any core, since they all had the same level of performance. That isn't so with big.LITTLE, which uses a kernel scheduler that understands the heterogeneous nature of big.LITTLE configurations and decides where each process or thread is executed. In the future, this scheduler could be further optimized to take into account things like the current running temperature of a core or its operating voltage.
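
To make the scheduler's job more tangible, here is a highly simplified Python sketch of HMP-style thread placement: every big and LITTLE core is available at once, and the most demanding threads are routed to the big cores. The core names, threshold, and thread loads are assumptions for illustration and bear no relation to the actual Linux or Android scheduler code:

```python
# Simplified HMP (Heterogeneous Multi-Processing) placement: all big and
# LITTLE cores are usable at the same time, and threads are placed according
# to how demanding they are. Purely illustrative, not real scheduler logic.

BIG_CORES = ["big0", "big1"]                      # e.g. Cortex-A57-class cores
LITTLE_CORES = ["little0", "little1", "little2", "little3"]

def place_threads(threads):
    """threads: list of (name, intensity) pairs, intensity in [0, 1]."""
    placement = {}
    big_free, little_free = list(BIG_CORES), list(LITTLE_CORES)
    # Handle the most demanding threads first so they get the big cores.
    for name, intensity in sorted(threads, key=lambda t: t[1], reverse=True):
        if intensity >= 0.5 and big_free:
            placement[name] = big_free.pop(0)
        elif little_free:
            placement[name] = little_free.pop(0)
        elif big_free:
            placement[name] = big_free.pop(0)
        else:
            placement[name] = "runnable, waiting for a core"
        # A real scheduler would also migrate threads as their load changes.
    return placement

demo = [("UI render", 0.9), ("game physics", 0.8), ("mail sync", 0.1),
        ("music decode", 0.2), ("file indexing", 0.15)]
print(place_threads(demo))
```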

ARM in Traditional Computing

In spite of ARM's overwhelming advantage in mobile devices, most laptops and desktop computers, the devices at the heart of our workflows, still use CISC-based processors. Recently, however, we have been seeing a change in this trend and a more warmly received influx of ARM-based processors for PCs. At the end of 2017, Qualcomm and Microsoft announced the first Windows 10 devices with ARM-based processors. HP, Asus, and Lenovo all launched laptops and convertibles with Qualcomm's Snapdragon 835 processor in them. Windows 10 on ARM was a reboot of Microsoft's earlier attempts to marry mobile processors with full laptop experiences. It promises better power efficiency, reliable performance, and always-on connectivity compared to the x86 Intel-based computers that have existed so far. For these ARM devices, Qualcomm boasts battery life of up to 25 hours, plus instant power-on and performance that is on par with an Intel computer. It also says that the built-in LTE connectivity will provide significantly faster speeds than other available LTE-equipped Windows 10 computers. Furthermore, Microsoft has made continuous improvements to Windows support for ARM chips over the past few years, and its upcoming architecture-agnostic Lite OS is more proof that the company is serious about having more ARM-powered laptops on the market.

Recently, Intel officials and developers have reportedly told Axios that Apple is preparing to launch ARM-based MacBooks next year. The report follows a story from Bloomberg saying that Apple plans to unify iOS and macOS applications by 2021. Rumours have circulated for a few years that Apple would switch its MacBook laptops to its own ARM processors; previously, however, ARM chips did not have quite the performance needed to run full-fledged desktop applications. The Bloomberg report reiterated that Macs running on ARM may arrive in 2020, and Axios' report seemingly confirmed the claim, citing "developers and Intel officials". Beyond these reports, with the release of the iPad Pro in 2015 Apple showed that its ARM chips could handle "PC-class" applications. Since 2015, Apple's chips have become ever more powerful, increasing their performance in much larger steps with each generation than Intel's CPU generations. Apple has traditionally preferred having more control over the core components of its devices when it could afford it, so it makes sense that Apple would eventually want MacBooks to be powered by the same (or upgraded) chips powering iOS devices.

The final piece of the puzzle would be transitioning x86 macOS programs to the ARM instruction set architecture. Since last year we have been hearing that Apple is working on a project called "Marzipan" that would allow developers to write their app once and have it work on both iOS devices and macOS computers. Apple announced the first version of the necessary software kit a few days ago at its annual developer conference. At first, Apple will allow developers to port only iPad apps to Macs, because iPad apps are closer to macOS apps in both functionality and user experience. Initially, developers still have to submit two different versions of their apps, with the user interface optimized for each platform, but the underlying code remains the same.

In 2020, Apple's Marzipan software kit is also expected to allow developers to port their iPhone apps to Macs. Apple engineers have found it challenging to bring applications designed for a small screen to the desktop, which is why this part of the transition will take longer. By 2021, third-party developers should be able to create a "single binary" that works across iOS devices and macOS computers. Presumably, apps would still present different user interfaces on each form factor, but they will either adapt more fluidly to the screen size or developers will have to bundle the different user interfaces within each binary.

ARM and its partners have also made big announcements for the server market, which they intend to target with the significantly more powerful Neoverse N1 and other variations of that chip. Amazon, the largest public cloud services provider, has even started to design its own ARM CPU, which will likely be upgraded to an N1-based design soon as well. Even Google, which has historically lacked ARM support in Chrome OS despite the OS being architecture-agnostic since day one, seems to be working on bringing the Snapdragon platform to some Chromebooks, which should in turn enable better functionality and usability of native Android apps on Chromebooks. However, only the Snapdragon 845 will be supported initially, as the company wants to bring cheaper Chromebooks to market. Another issue seems to be that Qualcomm would rather put the Snapdragon 8cx in Chromebooks that cost $500 or more, likely because those OEMs would be able to afford the higher price of the 8cx. This could also lead to the availability of high-end Chromebooks on the market.

In conclusion, with the advances in the ARM architecture, the improved 7 nm lithography processes in the semiconductor industry, and leading manufacturers investing in the development of ARM-based devices, we can expect a new and exciting range of products to hit the market. With this expected direction of development, we as consumers can only hope that our computing devices of the future will provide excellent performance with high efficiency and allow us to invest our time in what we are best at: creativity and innovation.
