Google Cloud Platform CPU Performance in the eyes of SPEC CPU® 2017 — Part 1

Federico Iezzi
Google Cloud - Community
8 min read · Feb 2, 2023

SPEC CPU 2017 on EVERY Google Cloud Compute Engine machine

Well, perhaps I lied a bit: this work does not cover every possible Compute Engine permutation, but rather every machine family and every CPU microarchitecture available. The aim is simple:

  1. For each machine family, figure out the strengths and weaknesses, also underlining how particularly old CPU µarchs stand the test of time. This was the idea that kickstarted the project, because little accurate data is available online comparing over a decade of CPU µarchs;
  2. Gather comparable datapoints for (most) x86 CPU µarchs released since Intel Sandy Bridge (all the way back to 2011) and since AMD Zen 2. Notably, the very first Zen 1 architecture is not available on Google Cloud, nor are any of the old Bulldozer-based Opteron solutions;
  3. Identify the overall best price/performance, as well as the best possible solution when running a highly tuned piece of software;
  4. Get a sneak peek of Arm vs. x86.

What’s SPEC CPU 2017 and why?

SPEC CPU, from the Standard Performance Evaluation Corporation, is the industry-standard solution when it comes to CPU benchmarks [1]. The 2017 version is the latest suite, developed to reproduce real user applications. SPEC CPU 2017 is specifically designed to stress the system's CPU and memory subsystem, and it provides a comparative measure of integer and floating-point performance. Other differentiators include:

  1. It allows building the various benchmarks with different compilers and flags;
  2. Most of the benchmark programs are drawn from actual end-user applications, as opposed to generic synthetic benchmarks;
  3. Any result in the official repository [2] undergoes strict validation;
  4. Given this mix of unique features, SPEC CPU is the golden industry standard for CPU performance.

The benchmarks, Rate vs. Speed

SPEC CPU 2017 comes with a package containing over 40 benchmarks, organized into four suites:

  • The SPECspeed® 2017 Integer and SPECspeed® 2017 Floating Point suites are used for comparing time (aka how long it takes to complete a task);
  • The SPECrate® 2017 Integer and SPECrate® 2017 Floating Point suites measure the throughput or work per unit of time.

The Integer suite is made up of the following benchmarks; many of them have a clearly recognizable name (like 525.x264_r):

full list of the integer tests available

And about the Floating benchmarks (empty rows mean no equivalent is available between the Rate and the Speed suites):

full list of the floating-point tests available

For the sake of this work, I’ve decided to leverage the Rate variant of the Int and FP suites and therefore exclude Speed. As mentioned above, the Speed tests run a single copy of each benchmark and have a large memory footprint (up to 16GB) [3], while the Rate ones are designed for multi-copy runs. As reported by AnandTech, the two suites are not very far apart in terms of their characterization [4] and, as an added benefit, the Rate suite is much faster [5].

The handler: PerfKit Benchmarker (PKB)

PKB is an open-source tool designed at Google to run benchmarks across Cloud providers. It has several integrations, one of which is SPEC CPU 2017:

Running it is very straightforward, and we will explore it in the next section.

Installing PKB

The complete execution of SPEC CPU 2017 can take several hours, sometimes even days, and during the entire process PKB checks that everything runs smoothly (essentially, that nothing exits with a status other than 0). We therefore need to run it in a persistent environment. For this work, I’ve chosen a GCE instance with the following characteristics:

  • GCE VM using an e2-standard-4 with 4 vCPUs and 16GB of memory;
  • 100GB PD-SSD as root disk for Ubuntu 22.04;
  • The VPC used by the PKB machine is the default one — you’re free to customize it — and it needs a public IP address, which can be ephemeral;
  • Last but certainly not least, IAM:
    - The SA used by the machine needs a wide variety of roles, such as Compute Admin, Compute Network Admin, Security Admin, Storage Object Admin, etc. The easiest way is using the default Compute Service Account blessed with the Editor role 🤣 but, as a best practice, you should customize it;
    - Finally, ensure the Cloud API access scope flag is set to Allow full access to all Cloud APIs.
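Putting the list above together, the instance can be created with a single gcloud command. Here is a minimal sketch — the instance name, project, and zone are placeholders you should adapt:

```shell
# Sketch: create the PKB controller VM described above.
# "pkb-runner", PROJECT_ID, and the zone are placeholder values.
gcloud compute instances create pkb-runner \
  --project=PROJECT_ID \
  --zone=europe-west4-a \
  --machine-type=e2-standard-4 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-ssd \
  --scopes=cloud-platform
```

The `--scopes=cloud-platform` flag is what grants the "full access to all Cloud APIs" scope mentioned above; the IAM roles still have to be granted to the service account separately.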

Once the VM is up and running, connect to it and run the following script, which will:

  • Check for privileged execution;
  • Check that the running OS is Ubuntu 22.04;
  • Perform a full system update;
  • Place the Google Cloud apt keyring;
  • Install the gcloud CLI;
  • Install screen and tmux, used later for session persistency;
  • Install the Python virtual environment tooling;
  • Clone the PKB GitHub repository;
  • Create a new Python virtual env called pkb-venv and install the PKB requirements;
  • Lastly, reboot the system to ultimately run the newer kernel.
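A minimal sketch of such a script follows; the repository and package names come from the standard Google Cloud and Ubuntu documentation, but treat this as a starting point rather than the exact script used for this work:

```shell
#!/usr/bin/env bash
# Sketch of the setup steps listed above; adapt paths and versions as needed.
set -euo pipefail

# Check privileged execution
[ "$(id -u)" -eq 0 ] || { echo "please run as root" >&2; exit 1; }

# Check the running OS is Ubuntu 22.04
grep -q 'VERSION_ID="22.04"' /etc/os-release || { echo "Ubuntu 22.04 required" >&2; exit 1; }

# Full system update
apt-get update && apt-get -y dist-upgrade

# Google Cloud apt keyring and the gcloud CLI
apt-get install -y curl gnupg
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg \
  | gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" \
  > /etc/apt/sources.list.d/google-cloud-sdk.list
apt-get update && apt-get install -y google-cloud-cli

# Session persistency and Python tooling
apt-get install -y screen tmux python3-venv git

# PKB plus its requirements inside a dedicated virtual env
git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
python3 -m venv pkb-venv
./pkb-venv/bin/pip install -r PerfKitBenchmarker/requirements.txt

# Reboot to run the newer kernel
reboot
```

Start a `tmux` (or `screen`) session before launching any long PKB run, so the benchmark survives SSH disconnections.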

PKB Patches

We could run PKB as-is, but unfortunately, we need to apply a few patches first. I’ve already reported the issues upstream, and they are being worked on; in the meantime, you have to apply them locally:

Speed up the SPEC CPU 2017 build time and, in turn, spend less money:

Support for AMD Rome, Milan, and Intel Ice Lake CPUs:

Support for Altra/T2A/ARM on RHEL9:

(Optionally) update SPEC CPU to the latest minor release before running it:

Where to get SPEC CPU 2017? → 💰

SPEC software is not freeware; you need to pay for it. Thankfully, since the release of version 1.1.9 (December 2022), the SPEC organization has reduced the entry price for academic and nonprofit usage.

For everybody else (individuals included), this is the way to follow:

After the payment, you can download it from the SPEC website and copy the ISO file to the GCE instance. SPEC publishes the official file hash [6] for verification.
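Before building anything on top of the ISO, it’s worth checking its integrity. A minimal sketch, assuming the downloaded file is named cpu2017-1.1.9.iso (adjust to your actual filename):

```shell
# Compute the local MD5 of the ISO and compare it with the value
# published on the SPEC md5sums page [6].
md5sum cpu2017-1.1.9.iso

# Alternatively, save the published line as cpu2017.md5 in the format
# "<md5-hash>  cpu2017-1.1.9.iso" and let md5sum do the comparison:
md5sum -c cpu2017.md5
```

If the check fails, re-download the ISO before copying it to the VM; a corrupted image will only surface much later, during the SPEC install phase.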

Running a sample SPEC CPU 2017 through PKB on GCP 😝

Okay, we’re ready to go:

  • You have a GCE instance correctly set up;
  • PKB is pulled and patched;
  • SPEC CPU 2017 is bought and copied locally onto your GCE VM.

We’re ready, right? Well, almost 🙃 One of the characteristics of SPEC is the ability (and requirement) to build the various benchmarks yourself (remember the earlier patch to compile in parallel and speed things up?).

Here you have two options. Either take one of the official, pre-included cfg files (available under the config folder) and customize it for your needs:

sample config files shipped out of the box

Or take mine, which went through several tuning phases and has the following customizations:

  • build_ncpus increased from 8 to 16 to speed up the build time — this is indeed a legacy tweak, superseded by the above-mentioned patch;
  • bind = 1 enables CPU pinning for the Int and FP Rate tests, ensuring less noise (as well as no CPU cache thrashing) from the Linux process scheduler (CFS);
  • define gcc_dir "/usr" — this is Linux-distro specific; in my case, running on RHEL9, that’s the correct path to use;
  • %define GCCge10 — because the base GCC available in RHEL9 is GCC 11.
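Put together, the tweaks above amount to only a handful of lines in the cfg file. An illustrative fragment (not the full config, and with comments added for clarity):

```
%define build_ncpus 16        # parallel compile jobs (legacy, see the PKB patch)
bind = 1                      # pin each benchmark copy to a CPU
%define gcc_dir "/usr"        # distro GCC location (RHEL9 in my case)
%define GCCge10               # the base GCC in RHEL9 is GCC 11
```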

Now a bit more on the flags used to compile the benchmarks:

  • Any Base test will receive the GCC flags from PKB [7], but an important adjustment is required in the config file: moving away from the default -O3 -march=native to just -O3 (PKB later gets rid of -O3 as well [8]). There are two aims: simulating less deeply optimized software and keeping the build flags consistent across machines — and -march=native is in the way of both. Furthermore, Fortran unsafe math optimizations are disabled through FOPTIMIZE = -fno-unsafe-math-optimizations (reported at [9]), which would otherwise generate a SPEC validation error;
  • Any Peak test is built with -Ofast -march=native (the default SPEC CPU behavior) to simulate highly optimized code.
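In cfg terms, the Base/Peak split described above boils down to something like the fragment below. The section labels are simplified for illustration (the real cfg files scope them per suite), and PKB still injects its own flags on top [7][8]:

```
default=base:                 # less deeply optimized, consistent flags
   OPTIMIZE  = -O3
   FOPTIMIZE = -fno-unsafe-math-optimizations

default=peak:                 # highly optimized, machine-specific code
   OPTIMIZE  = -Ofast -march=native
```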

Below is gcc-linux-x86-1T.cfg, which will be used on any x86 system for single-thread execution:

Below is gcc-linux-x86-nT.cfg, which will be used on any x86 system for multi-thread execution:

Below is gcc-linux-aarch-1T.cfg, which will be used on any AArch64 system for single-thread execution:

Below is gcc-linux-aarch-nT.cfg, which will be used on any AArch64 system for multi-thread execution:

First SPEC CPU 2017 run

At this point, don’t forget to copy the above configs (or your customized versions) into the same folder where you keep the SPEC CPU ISO file. On my system, both are under /root/PKB/spec2017.

To finally have PKB running, here is a wrapper that does it all for you:

  • Activates the Python virtual environment;
  • Defines several sensible flags (target image to use, target GCP region and zone, target GCC version, how many times to run SPEC CPU, temporary directory to save the results, etc.);
  • Defines the per-system GCC flags. In this example, for an N2 Cascade Lake: -Ofast -fomit-frame-pointer -march=x86-64-v3 -mtune=core-avx2.
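Stripped of the housekeeping, such a wrapper boils down to a single pkb.py invocation. The sketch below is an assumption on my part, not the exact wrapper used for this work — in particular, verify the speccpu flag names on your PKB checkout with ./pkb.py --helpmatch=speccpu:

```shell
# Sketch: run SPEC CPU 2017 through PKB against an N2 machine.
# Flag names and values are examples; check ./pkb.py --helpmatch=speccpu.
source ~/pkb-venv/bin/activate
cd ~/PerfKitBenchmarker

./pkb.py \
  --cloud=GCP \
  --benchmarks=speccpu2017 \
  --machine_type=n2-standard-8 \
  --zones=europe-west4-a \
  --runspec_config=gcc-linux-x86-nT.cfg \
  --spec17_gcc_flags="-Ofast -fomit-frame-pointer -march=x86-64-v3 -mtune=core-avx2" \
  --temp_dir=/root/tmp/pkb.n2-clx
```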

With the above config and this machine type, SPEC CPU will take about 26 hours to complete. Once done, under /root/tmp/pkb.n2-clx/spec_cpu2017/*T/runs/*/CPU2017*.log you will find the results, namely:

  • 2x CPU2017.001.intrate.refrate.txt.log files with the Base and Peak INT results, one for single-thread (folder 1T) and one for multi-thread (folder nT);
  • 2x CPU2017.001.fprate.refrate.txt.log files with the Base and Peak FP results, one for single-thread (folder 1T) and one for multi-thread (folder nT).
output files, one per type of test [INT|FP]-[1|n]T

Below is a sample INT result output (a snippet of the most relevant content):

Below is a sample FP result output (also a snippet):

I will share the full files, which are dense in detail, in one of the next iterations. The most valuable info you can read from them:

  • Per benchmark: the name, the number of copies (1 means single-thread), the Base Rate runtime and score, and the Peak Rate runtime and score;
  • The estimated overall Base Rate score;
  • The estimated overall Peak Rate score.
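To pull just those headline numbers out of the logs, a simple grep is enough. A sketch, assuming the result path shown earlier and the standard "Est. SPECrate2017" summary lines that SPEC prints for estimated (non-published) results:

```shell
# Extract the estimated Base/Peak scores from every SPEC result log.
grep -h "Est. SPECrate2017" /root/tmp/pkb.n2-clx/spec_cpu2017/*T/runs/*/CPU2017*.log
```

This prints one line per suite and tuning level (e.g. SPECrate2017_int_base, SPECrate2017_int_peak), which is handy when collecting results from many machine types.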

Stay tuned for the next releases where I’ll share much more, including the full results, full comparisons, and some key takeaways on the results.

References

[1] https://en.wikipedia.org/wiki/Standard_Performance_Evaluation_Corporation

[2] https://www.spec.org/cpu2017/results

[3] https://www.spec.org/cpu2017/Docs/overview.html#Q24

[4] https://www.anandtech.com/show/14605/the-and-ryzen-3700x-3900x-review-raising-the-bar/6#:~:text=Moving%20on%20to,in%20the%20future).

[5] https://www.spec.org/cpu2017/Docs/overview.html#Q11

[6] https://www.spec.org/md5sums.html

[7] https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/speccpu2017_benchmark.py#L197-L202

[8] https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/speccpu2017_benchmark.py#L63-L66

[9] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84201#c12
