Apple Rosetta 2's Limitations: Illegal Hardware Instruction

If you are expecting Rosetta 2 to be a magic bullet for all, you may be disappointed.

Raymond Lo, PhD
Mac O’Clock
5 min read · Dec 24, 2020

Once you dig deeper into it, you will very likely find that not everything can be translated. Here is a snapshot of the official documentation on the Apple Developer website.

A screenshot of the official documentation about Rosetta 2 (https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment)

To see what is supported, I ran this command in Terminal under Rosetta 2.

raymondlo@Raymonds-MacBook-Pro ~ % sysctl -a | grep machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTSE64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 AES SEGLIM64

As you can see, the AVX extension is missing, and possibly others as well. But the real question is: does it really matter?
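Reading that long list by eye is error-prone, so here is a minimal sketch of doing the same check from C. It assumes the feature list is readable through `sysctlbyname("machdep.cpu.features", ...)`, which is what the command above queries; the function names `has_feature` and `cpu_reports_feature` are my own, and the token `AVX1.0` used in the note below is how Intel Macs usually label AVX, so treat that spelling as an assumption to verify on your machine.

```c
#include <stddef.h>
#include <string.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Return 1 if `name` appears as a whole space-separated token in `features`.
 * Whole-token matching matters: "SSE" must not match inside "SSE4.2". */
int has_feature(const char *features, const char *name) {
    size_t n = strlen(name);
    if (n == 0) return 0;
    const char *p = features;
    while (*p) {
        while (*p == ' ') p++;            /* skip separators */
        const char *start = p;
        while (*p && *p != ' ') p++;      /* advance to end of token */
        if ((size_t)(p - start) == n && strncmp(start, name, n) == 0)
            return 1;
    }
    return 0;
}

/* Return 1 if the CPU reports the feature, 0 if it does not,
 * and -1 if the feature list cannot be read (for example on a
 * non-Apple system, or where this sysctl key does not exist). */
int cpu_reports_feature(const char *name) {
#if defined(__APPLE__)
    char buf[2048];
    size_t len = sizeof(buf) - 1;
    if (sysctlbyname("machdep.cpu.features", buf, &len, NULL, 0) != 0)
        return -1;
    buf[len] = '\0';
    return has_feature(buf, name);
#else
    (void)name;
    return -1;
#endif
}
```

Calling `cpu_reports_feature("SSE4.2")` under Rosetta 2 should agree with the list above, while `cpu_reports_feature("AVX1.0")` would be expected to return 0 there.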

Illegal Hardware Instruction + Cross-Linking Bugs?

If you are a developer who has ever optimized an application with AVX or AVX2, here is the bad news. As highlighted above, Rosetta 2 cannot handle the AVX extension. The code still compiles successfully without any warnings, because the compiler only targets the x86-64 instruction set and knows nothing about the translator that will later run the binary. Yes, there is absolutely no warning at all until the program hits an AVX instruction and is terminated. A hard crash.

To demonstrate this, I pulled a sample AVX program from this tutorial.

#include <immintrin.h>
#include <stdio.h>

int main() {
    /* Initialize the two argument vectors */
    __m256 evens = _mm256_set_ps(2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0);
    __m256 odds = _mm256_set_ps(1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0);

    /* Compute the difference between the two vectors */
    __m256 result = _mm256_sub_ps(evens, odds);

    /* Display the elements of the result vector */
    float* f = (float*)&result;
    printf("%f %f %f %f %f %f %f %f\n",
           f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7]);
    return 0;
}

Then compile it in a Terminal running under Rosetta 2.

gcc -Wall -mavx -o hello_avx hello_avx.c

And it works: it compiles. However, when you try to execute the program, you are welcomed by the famous ‘illegal hardware instruction’ error and the application is terminated. This is the immediate pain point for all x86-64 developers: you are not warned at build time, but hit by it at runtime. From a user-experience perspective, this is bad, really bad.

raymondlo@Raymonds-MacBook-Pro avx % ./hello_avx
zsh: illegal hardware instruction ./hello_avx
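One common way to avoid this kind of crash is to check for AVX at runtime and fall back to plain code when it is missing. Below is a minimal sketch of that idea, not something taken from the tutorial. It assumes a GCC or Clang toolchain that provides `__builtin_cpu_supports` and the `target("avx")` function attribute, and it assumes the CPU feature query reports AVX as absent under Rosetta 2, in line with the sysctl output earlier. The function names `sub8`, `sub8_avx`, and `sub8_scalar` are hypothetical.

```c
#if defined(__x86_64__)
#include <immintrin.h>

/* AVX path: AVX code generation is enabled for this one function only,
 * so the rest of the program stays free of AVX instructions. */
__attribute__((target("avx")))
static void sub8_avx(const float *a, const float *b, float *out) {
    __m256 va = _mm256_loadu_ps(a);   /* unaligned loads of 8 floats */
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_sub_ps(va, vb));
}
#endif

/* Portable fallback: the same subtraction, one element at a time. */
static void sub8_scalar(const float *a, const float *b, float *out) {
    for (int i = 0; i < 8; i++)
        out[i] = a[i] - b[i];
}

/* Dispatch: use the AVX version only when the CPU says it supports AVX. */
void sub8(const float *a, const float *b, float *out) {
#if defined(__x86_64__)
    if (__builtin_cpu_supports("avx")) {
        sub8_avx(a, b, out);
        return;
    }
#endif
    sub8_scalar(a, b, out);
}
```

Note that this version is built without `-mavx` on the command line. Passing `-mavx` globally lets the compiler emit AVX instructions anywhere in the program, which would defeat the runtime check.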

The implications for all x86-64 developers…

This may actually be a huge issue for many ‘optimized’ applications that use vector instructions or dynamically linked libraries. Since the Apple M1 was released, reports of this error have started popping up on TensorFlow forums…

So far, I have not been able to figure out the bugs behind some of these reports. I believe many more applications are affected by this same issue and simply have not been discovered yet.

Furthermore, this translation limitation also affects OpenVINO, as it is cross-linked with TensorFlow.

/usr/local/opt/python@3.7/bin/python3.7 -- /usr/local/deployment_tools/model_optimizer/mo.py --framework=caffe --data_type=FP16 --output_dir=/Users/raymondlo/openvino_models/ir/public/squeezenet1.1/FP16 --model_name=squeezenet1.1 '--input_shape=[1,3,227,227]' --input=data '--mean_values=data[104.0,117.0,123.0]' --output=prob --input_model=/Users/raymondlo/openvino_models/models/public/squeezenet1.1/squeezenet1.1.caffemodel --input_proto=/Users/raymondlo/openvino_models/models/public/squeezenet1.1/squeezenet1.1.prototxt
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /Users/raymondlo/openvino_models/models/public/squeezenet1.1/squeezenet1.1.caffemodel
- Path for generated IR: /Users/raymondlo/openvino_models/ir/public/squeezenet1.1/FP16
- IR output name: squeezenet1.1
- Log level: ERROR
- Batch: Not specified, inherited from the model
- Input layers: data
- Output layers: prob
- Input shapes: [1,3,227,227]
- Mean values: data[104.0,117.0,123.0]
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: None
- Reverse input channels: False
Caffe specific parameters:
- Path to Python Caffe* parser generated from caffe.proto: /usr/local/deployment_tools/model_optimizer/mo/front/caffe/proto
- Enable resnet optimization: True
- Path to the Input prototxt: /Users/raymondlo/openvino_models/models/public/squeezenet1.1/squeezenet1.1.prototxt
- Path to CustomLayersMapping.xml: Default
- Path to a mean file: Not specified
- Offsets for a mean file: Not specified
Model Optimizer version: custom_master_29f1c38ba0ae51897f47946d79d6bd6be7a494f0
zsh: illegal hardware instruction /usr/local/opt/python@3.7/bin/python3.7 -- --framework=caffe --data_type=FP1

My workaround for now is to run this part of the tool on my x86-64 machines and figure out over time whether I can use a VM or something similar to work around it. Or maybe I will need to compile everything again natively for arm64. Oh man, that’s a lot of work!

However, if you are not a developer, are you going to be affected by this? In short, yes: applications translated by Rosetta 2 can also lose access to hardware-accelerated components such as the hardware video encoder. You cannot simply replace hardware with a software translator.

Hardware Accelerated Video Encoding with VideoToolbox!

One practical example that may affect some users is hardware acceleration in HandBrake or FFmpeg under Rosetta 2. For example, HandBrake provides an option called Apple VideoToolbox, which enables hardware encoding. It indeed provides a significant speedup over the normal CPU method on my older-generation MacBook Pro.

With the normal x264 option selected, I get ~25 fps on my early 2015 15-inch MacBook Pro.
With VideoToolbox hardware acceleration enabled, I get a speedup of roughly 4x or more when transcoding the same video.

On the Apple M1, the VideoToolbox option is no longer available under Rosetta 2.

That’s one thing worth keeping in mind.

Update: The Universal Binary of HandBrake got the VideoToolbox option back on Apple M1, and it seems pretty fast.

My quick thought

Well! Rosetta 2 = bulletproof? Not at all. We should all get ready to do the real work of porting code. To me, Rosetta 2 serves as a very good bridge for older applications where performance is not a priority.

Mixing x86-64 code into a native arm64 environment, however, is rather problematic, as I have no visibility into which applications are going to work, or when they will stop working. It would feel like a time bomb if I were running a mission-critical application this way. Extensive end-to-end testing must be done very carefully. Again, be warned! Happy porting.
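For a little more visibility, an application can at least report at startup whether it is being translated. The Apple documentation page referenced earlier describes a `sysctl.proc_translated` key for this purpose. Here is a minimal sketch along those lines. The function names `process_is_translated` and `compiled_arch` are my own, and on non-Apple systems the sketch simply assumes the process is not translated.

```c
#include <errno.h>
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Return 1 if the process runs under Rosetta translation,
 * 0 if it runs natively, and -1 if the query fails unexpectedly. */
int process_is_translated(void) {
#if defined(__APPLE__)
    int ret = 0;
    size_t size = sizeof(ret);
    if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1) {
        /* ENOENT means the key does not exist, so no translation layer. */
        return (errno == ENOENT) ? 0 : -1;
    }
    return ret;
#else
    return 0;   /* assumption: no Rosetta outside macOS */
#endif
}

/* Return the architecture this binary was compiled for. */
const char *compiled_arch(void) {
#if defined(__aarch64__) || defined(__arm64__)
    return "arm64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "other";
#endif
}
```

Logging both values at startup, for example "x86_64, translated", makes it easier to tell from a crash report whether Rosetta 2 was involved.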

Raymond Lo, PhD
Mac O’Clock

@Intel - OpenVINO AI Software Evangelist. ex-Google, ex-Samsung, and ex-Meta (Augmented Reality) executive. Ph.D. in Computer Engineering, U of T.