Apple A12 Bionic processor running ML algorithms 9 times faster — a myth or reality?
To implement the Zest the Reality Media project, our team needed to investigate how well mobile systems handle neural network processing. Knowing a modern smartphone's potential is vital for the project, so we conducted extensive research on the issue.
In today's mobile processor market, adding artificial intelligence (AI) engines to processor cores is an increasingly popular trend. Huawei was first, introducing a dedicated AI unit in its flagship Kirin processor line. Soon after, the market responded: Qualcomm with Snapdragon and Samsung with its Exynos processors. And, for the second year in a row, Apple has been shipping a coprocessor for artificial intelligence workloads: the Neural Engine.
Since Apple claims that the newly released A12 processor runs ML algorithms up to 9 times faster, I wanted to see whether that is truly the case, and whether this module answers a real need or is just another advertising move. I set out to investigate.
Right off the bat, only two kinds of apps are explicitly set up to work with artificial intelligence:
- The first is the camera app, with the ability to post-process images "exceeding your expectations" and to classify them by category for later search.
- The other is a voice recognition engine such as Siri or Bixby. These apps only recently gained the ability to operate without an internet connection.
The above list probably exhausts the functional needs of a smartphone; the rest is left to third-party developers to find applications for. However, both voice assistants and image processing handle single processing requests on the native CPU quite decently. There is no need for real-time processing speed in modern-day smartphone use cases. We put slight pressure on the system when searching for faces to focus the camera in real time while taking pictures, but even that pressure is relatively minor. So processor manufacturers tend to exaggerate the value of machine learning in smartphones, and at the moment the neural network engine is more of a marketing feature than a real need.
However, a few new projects that use artificial neural networks (ANNs) on mobile phones are appearing on the market: offline text recognition (OCR) systems, image processing systems (such as Prisma), and completely new applications, such as the Zest the Reality Media application for object recognition in social networks.
The Zest the Reality Media team is developing a completely new augmented reality social networking application in which objects are recognized by a neural network. At the same time, the application is not focused solely on object recognition: it also performs business logic, data aggregation, camera processing, search, and work with various AR elements.
All this makes it clear that modern applications will not focus exclusively on artificial intelligence but will integrate AI into their business logic. Therefore, one needs to evaluate Neural Engine speed in two ways: as a standalone engine, and when embedded in real business logic.
In 2017, Apple presented its first A11 Bionic processor with a built-in Neural Engine coprocessor.
Apple officially stated that the A11 can perform 600 billion operations per second. How can one tell whether that is a lot? Cupertino's claim is that the A12 performs 5 trillion operations per second, up to 9 times faster than the A11. But faster for what purpose exactly?
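A back-of-the-envelope check of the marketing figures, assuming only the headline numbers above:

```python
# Apple's headline figures: ~600 billion ops/s for the A11 Neural Engine,
# ~5 trillion ops/s for the A12 Neural Engine.
a11_ops_per_second = 600e9
a12_ops_per_second = 5e12

speedup = a12_ops_per_second / a11_ops_per_second
print(f"Raw ratio: {speedup:.2f}x")  # prints "Raw ratio: 8.33x"
```

So the raw ratio of the advertised numbers is about 8.3x, which Apple rounds up to "9 times faster". Whether real applications see anything close to that is exactly what the tests below try to find out.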
The emergence of the new neural coprocessor raises many questions that I wanted answered. I set out to investigate whether the new artificial intelligence coprocessor gives a real advantage, checking CPU load in a real production application rather than in synthetic tests. This allowed me to understand the real performance gain. Here is what I've come up with so far.
To begin, let's define the hardware: I used four different device generations for testing.
First of all, I used the iPhone 6s as the oldest smartphone that supports CoreML. The other devices were the iPhone 7 Plus, iPhone X, and iPhone Xs. The last two models are the most interesting because their processors, the A11 and A12, contain a special coprocessor for computing artificial neural networks.
List of devices:
- iPhone 6s (2015): A9 (2 cores), 2 GB RAM, 16 nm
- iPhone 7 Plus (2016): A10 Fusion (2+2 cores), 3 GB RAM, 16 nm
- iPhone X (2017): A11 Bionic (2+4 cores + Neural Engine), 3 GB RAM, 10 nm
- iPhone XS (2018): A12 Bionic (2+4 cores + Neural Engine), 4 GB RAM, 7 nm
For the first comparison, I evaluated them on the parameter "processor speed increase". The values below are taken from official Apple sources (keynote presentations).
For the second comparison, I chose the algorithmic part. Since the application I measure works with image processing, I used publicly available neural networks that work with images; specifically, mobilenet_v2_1.0_160.
For the third comparison, I chose the software part: the framework used for the measurements. In addition to CoreML, developers actively use TensorFlow, so I explored both frameworks, which let me compare how effectively each uses the Neural Engine coprocessor.
How I measured the results.
The Xcode development tools let you view CPU and GPU usage, but real performance can be measured only by real tests. Net performance should be distinguished from applied performance. Applied performance means the artificial neural network works in a real environment alongside the usual payload, where work with the network is not continuous.
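A minimal sketch of how applied, per-frame performance can be timed. The helper and the workload below are illustrative stand-ins, not the app's actual code:

```python
import time

def measure_ms(work):
    """Time one unit of applied work (e.g. a single model inference) in ms."""
    start = time.perf_counter()
    work()
    return (time.perf_counter() - start) * 1000.0

# Stand-in workload; in the real test this would be one inference call.
elapsed = measure_ms(lambda: sum(range(100_000)))
print(f"Frame processed in {elapsed:.3f} ms")
```

Averaging many such per-frame timings, rather than reading a profiler gauge once, is what distinguishes a real test from a synthetic one.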
The Zest the Reality Media application is quite indicative here: in addition to working with the artificial neural network, it processes frames from the camera, talks to the server-side API, runs computer vision and internal business tasks, obtains location from the GPS sensor, uses the gyroscope, and so on. In this case, the neural network cannot claim 100% of the CPU resources, and usually it does not even need to.
One performance metric is the CPU resources an artificial neural network requires. However, not every task requires the network to run in real time; in some cases it is possible to sacrifice AI speed for the sake of the device's battery life.
Taking all of the above into account, our team built a test inside the Zest the Reality Media application that runs the artificial neural network on different frameworks. In addition, it can run in one of two modes: full load in real time, or a power-saving mode in which use of the neural network is minimal.
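The power-saving mode can be sketched as a simple frame throttle: a camera frame is handed to the network only if enough time has passed since the last inference. The class and the 200 ms interval below are illustrative, not the app's actual code:

```python
class InferenceThrottle:
    """Pass a frame to the network only if enough time has elapsed.
    A 200 ms minimum interval caps inference at ~5 fps."""

    def __init__(self, min_interval_ms):
        self.min_interval_ms = min_interval_ms
        self.last_run_ms = float("-inf")

    def should_process(self, now_ms):
        if now_ms - self.last_run_ms < self.min_interval_ms:
            return False
        self.last_run_ms = now_ms
        return True

throttle = InferenceThrottle(min_interval_ms=200)  # ~5 fps cap
# Simulate a 60 fps camera feed for one second (timestamps in ms):
processed = sum(throttle.should_process(i * 1000 // 60) for i in range(60))
print(f"Frames processed in 1 s: {processed}")  # 5 of 60 frames
```

Dropped frames simply reuse the previous recognition result, which is usually invisible to the user but drastically reduces the load on the CPU, GPU, or Neural Engine.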
Measurements can be made using two standard tools from Apple.
1. The built-in Xcode Debug Navigator. Here are some screenshots with my findings.
2. Additional tools:
   - Instruments, Energy Log
   - Core Animation
   - CPU Activity Log
For a more accurate measurement of energy utilization, we used wireless device profiling (as recommended by Apple). This eliminates the confounding variable of the phone being connected to the computer and charging.
I tested the Zest the Reality Media app in four modes:
- Mode 1: TensorFlow framework, with power-saving optimization.
- Mode 2: TensorFlow framework, at full load (no resource saving).
- Mode 3: CoreML framework with GPU processing, with power-saving optimization.
- Mode 4: CoreML framework with GPU processing, at full load (no resource saving).
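The four configurations above can be expressed as plain data, which keeps the test harness free of scattered flags. The names below are illustrative, not the app's actual API:

```python
from dataclasses import dataclass

@dataclass
class TestMode:
    number: int
    framework: str      # "TensorFlow" or "CoreML"
    power_saving: bool  # True = optimized (~5 fps cap), False = full load

MODES = [
    TestMode(1, "TensorFlow", power_saving=True),
    TestMode(2, "TensorFlow", power_saving=False),
    TestMode(3, "CoreML", power_saving=True),
    TestMode(4, "CoreML", power_saving=False),
]
```

Iterating over such a list guarantees every framework is measured under both load profiles on every device.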
The questions I wanted answered:
- How justified is using the Neural Engine instead of the CPU?
- How much faster do programs with artificial neural network computation run on the A12 Bionic, and how does its performance compare with the A11 Bionic and the A10 Fusion?
- Does the native CoreML framework from Cupertino give a performance boost compared with TensorFlow?
- How does the artificial neural network behave at full load versus the optimized load (~5 fps)?
Next, I analyzed the measurement results. The processing time of a single frame in the Zest the Reality Media application is shown in the chart below.
From the above, I arrived at some interesting conclusions:
- CoreML is optimized to use the GPU and the Neural Engine, and is faster than TensorFlow running on the plain CPU. On processors without the Neural Engine, the difference reaches 2.5–3 times.
- The Neural Engine in the A11 processor yielded virtually no increase in speed; according to my findings, CoreML still tries to use the GPU on the A11.
- On the A12 processor, however, the claimed increase was confirmed: it is indeed about 9 times faster (the measured figure is 8.5537, which within experimental error confirms that the A12 works 9 times faster than the A11).
- In the power consumption test, the GPU reading shows up as 0.
To conclude, the A12 Neural Engine gives a real advantage in data processing for ANNs.
- Measure the energy impact of an iOS device — Apple
- TensorFlow MobileNet model on GitHub
- Machine Learning in iOS: Azure Custom Vision and CoreML (Part 2) — Khoa Pham
- Integrating TensorFlow Model in an iOS App — Mohammad Azam
- Apple’s Core ML 2 vs. Google’s ML Kit: What’s the difference? — Kyle Wiggers
- Machine Learning on Mobile: Core ML and TensorflowLite Frameworks Comparison — Mateusz Opala
- Zest the Reality Media application for ANN testing — inventor Pavel Dyakov
If you liked this article, please click the 👏 button (once, twice or more).
Share to help others find it!