The performance of Rosetta 2 and Windows on ARM translators/emulators

There has been much praise of the recently introduced Apple M1 Macs. Part of that is obviously due to the performance of the Apple M1 SoC. However, the performance of Rosetta 2 also seems to be very good since non-native, Intel-coded apps reportedly perform very well, sometimes exceeding their performance when run natively on Intel Macs.

On the other hand, Microsoft also has an emulator that allows you to run 32-bit x86 code on top of their Windows on ARM operating system. Reviewers were very disappointed by the performance of this emulator, however. Unfortunately, it is difficult to discern whether the dismal impressions were due to the emulator itself, or to the lacklustre performance of the hardware powered by SoCs that are far slower than the Apple M1.

Here, I would like to look at some benchmarks that have been posted on the web, and try to understand the performance of these two emulators/translators independent of hardware variances. I want to know how well they perform from a pure software perspective, after the differences in hardware have been factored in and controlled for. 

Boot Camp will not be coming to the Apple M1 Macs, but Apple demoed a development version of Parallels during the M1 announcement, showing that operating systems coded for the ARM architecture can run on M1 Macs through a virtual machine. This suggests that the best path to running Windows on an M1 Mac will be to run Windows on ARM in Parallels (although Microsoft currently imposes licensing restrictions on this). Since the vast majority of Windows apps (including most of Microsoft’s own apps) have not been rewritten for ARM, they would rely heavily on Microsoft’s translator/emulator. The current discussion should give us an idea of the performance that we can expect under this scenario.

Rosetta 2

  1. Geekbench shows that Rosetta 2 running emulated x86 code achieves 78-79% of the performance of native Apple Silicon code.
  2. Ars Technica ran browser benchmarks comparing Chrome emulated under Rosetta 2 with native Chrome (x86 vs native). Speedometer 2.0: 116 vs 210 (55%), JetStream 2: 93.1 vs 156.9 (59%), MotionMark: 435.7 vs 726.4 (60%).
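
The browser percentages above are simply the emulated score divided by the native score. As a minimal sketch, the arithmetic can be reproduced as follows. It uses only the scores quoted above; the function and variable names are illustrative and do not come from any benchmark tool.

```python
# Scores quoted above: (x86 Chrome under Rosetta 2, native Apple Silicon Chrome).
browser_scores = {
    "Speedometer 2.0": (116.0, 210.0),
    "JetStream 2": (93.1, 156.9),
    "MotionMark": (435.7, 726.4),
}


def emulated_fraction(emulated: float, native: float) -> float:
    """Return emulated performance as a fraction of native performance."""
    return emulated / native


rosetta_fractions = {
    name: emulated_fraction(emulated, native)
    for name, (emulated, native) in browser_scores.items()
}

for name, fraction in rosetta_fractions.items():
    print(f"{name}: {fraction:.0%} of native")
```

All three benchmarks report higher-is-better scores, which is why a plain ratio is a reasonable way to express the emulation penalty here.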

There is also a very interesting and technical discussion of what Rosetta 2 might actually be doing to achieve this performance.

Windows on ARM

  1. Geekbench 4 single-core on a Surface Pro X scores 2293 in x86 emulation mode and 3643 when run natively. In multi-core tests, the results were 7215 and 12370 respectively. Here emulation is achieving 63% and 58% of native performance.
  2. A Google Octane browser test on the HP Envy x2 was performed on the Edge browser (native ARM code) and Chrome (emulated Intel code). Chrome scored 3500 while Edge scored 10712, a ratio of about 33%. Since Chrome is typically as fast as or faster than Edge when run natively on Intel, you can estimate that emulation achieves roughly 33% of native performance or lower on Windows on ARM.
  3. A game developer wrote his own benchmark code to test the performance of the emulator. He concluded that emulated performance is 3x to 8x slower than native code.
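
The three Windows on ARM results are reported in different forms (scores in the first two, a slowdown factor in the third), so it helps to put them on one scale. The sketch below converts each to a fraction of native performance. It assumes that "Nx slower" means the emulated code takes N times as long, and it treats Edge as a stand-in for native Chrome, which is only an approximation. The names are illustrative.

```python
def fraction_from_scores(emulated: float, native: float) -> float:
    """Higher-is-better scores: emulated performance as a fraction of native."""
    return emulated / native


def fraction_from_slowdown(slowdown: float) -> float:
    """Assumes 'Nx slower' means N times the run time, i.e. 1/N of native speed."""
    return 1.0 / slowdown


windows_fractions = {
    "Geekbench 4 single-core": fraction_from_scores(2293, 3643),
    "Geekbench 4 multi-core": fraction_from_scores(7215, 12370),
    # Chrome (emulated) vs Edge (native): different browsers, so approximate.
    "Octane, Chrome vs Edge": fraction_from_scores(3500, 10712),
    "Custom benchmark, 3x slower": fraction_from_slowdown(3),
    "Custom benchmark, 8x slower": fraction_from_slowdown(8),
}

for name, fraction in windows_fractions.items():
    print(f"{name}: {fraction:.1%} of native")
```

On this scale the spread between the Geekbench results and the custom benchmark is large, which shows how strongly the measured penalty depends on the workload.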

There is some documentation on the Windows on ARM x86 emulation layer here.

QEMU

QEMU is open source software that provides a virtual machine, with or without binary code emulation. Alexander Graf tweeted about his success in porting it to the Apple M1 and running the Windows ARM64 Insider Preview on it through Hypervisor.framework. Although it is highly unlikely that this open source project will become a mainstream way of running Windows on M1 Macs, it is similar to the approach that was demoed for Parallels and that Apple executives have been mentioning. Alexander commented on the performance of x86 emulation in this setup:

“Windows 10 ARM64 can run x86 applications pretty well. It’s not as fast as Rosetta2, but close.”

Conclusion

It is very difficult to draw conclusions since we do not have a suite of direct comparisons using the same benchmarks. The ratio between native and emulated performance varies widely depending on the test, and with the current data it is impossible to give a quantitative assessment. This is unfortunate, since even low double-digit gains in performance do not come easily. In particular, we cannot evaluate Apple’s claims that Rosetta 2 translates code ahead of time to optimise it further, or how much this improves efficiency. Nonetheless, it does seem that Rosetta 2 consistently emulates x86 code with a smaller performance penalty than Windows on ARM. Although the difference may be significant, it does not look like it will generally exceed 3x.
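
To make the "less than 3x" impression concrete, here is a rough like-for-like sketch using only figures quoted in this post. The pairing is an assumption on my part: the lower Rosetta 2 Geekbench figure (78%) is set against the Geekbench 4 single-core result on the Surface Pro X, and Speedometer 2.0 under Rosetta 2 is set against the Octane result on the HP Envy x2. These are different benchmarks on different hardware, so the output is indicative at best.

```python
# Emulated performance as a fraction of native, from figures quoted in this post.
rosetta = {
    "Geekbench": 0.78,        # lower end of the 78-79% range
    "browser": 116 / 210,     # Speedometer 2.0, Chrome under Rosetta 2
}
windows_on_arm = {
    "Geekbench": 2293 / 3643,  # Geekbench 4 single-core, Surface Pro X
    "browser": 3500 / 10712,   # Octane, Chrome (emulated) vs Edge (native)
}

# How many times more of native performance Rosetta 2 retains than Windows on ARM.
advantage = {
    test: rosetta[test] / windows_on_arm[test]
    for test in rosetta
}

for test, factor in advantage.items():
    print(f"{test}: Rosetta 2 retains {factor:.2f}x as much of native performance")
```

Both factors come out well below 3. The custom benchmark reporting a 3x to 8x slowdown has no Rosetta 2 counterpart in the data above, so it is left out of this comparison.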

As announced, Windows on ARM will soon gain the capability (at last) to emulate 64-bit Intel x86 code. We might also see additional optimisations that bring performance closer to Rosetta 2. The introduction of this feature would be a good occasion for Microsoft to also relax the licensing restrictions on Windows on ARM, allowing its mainstream use on M1 Macs. Parallels does seem very positive about the situation and I think we should be too.

Update

This post has been updated and edited significantly since the original publishing date of 23-Nov, especially with regard to the tweet by Alexander Graf.
