Over-clocking under Linux

Artem Trunov
Jan 10, 2019


OK, technically Linux has nothing to do with overclocking: most of the parameter tuning happens on the BIOS side. However, there is a significant difference for the operator depending on the OS they use. There is a plethora of monitoring and benchmarking tools for Windows, and all the reviews on popular web sites use them to show their results. You simply don’t have this kind of choice under Linux, which makes the operator’s job somewhat more difficult, especially on the monitoring side. Here I’ll cover the tools I checked out and used.

Overclockers aim to maximize the performance of their machines *while keeping them stable*. This is achieved by raising the operating frequencies and voltages of the CPU and DRAM, assisted by improved removal of the excess heat that the CPU generates due to the increased frequencies and currents.

Removing excessive heat

A popular, or I’d even say mandatory, technique on Intel’s 6th, 7th and 8th Gen CPUs is to remove the nickel-plated copper lid and replace the low-quality thermal paste between the silicon die and the inner surface of the lid with a liquid metal paste, like Thermal Grizzly Conductonaut or Coollaboratory Liquid Pro/Ultra. This is called delidding or scalping a CPU. The operation is somewhat delicate and requires some tools, but it is certainly doable by most people who can, e.g., assemble a PC for themselves. Otherwise, someone in your city can probably do it for a fee.

Along with delidding, a water cooler with at least a 240 mm radiator is preferable to tower CPU air coolers, although the top-of-the-line coolers from leading brands can handle a heavily loaded CPU nicely as well. Unfortunately, manufacturers give no power rating for their products, only vague guidance like “high overclocking potential”. No one will tell you for sure which product will be able to remove 200 Watts worth of heat from your overclocked CPU, so you are on your own.

Finally, you want to be able to deliver stable power to your CPU, and here the motherboard comes into play, particularly the set of voltage regulators (VRs) responsible for keeping the voltages stable while supplying high current to the CPU. The motherboard vendor will tell you about, e.g., an 8-phase power design, but the devil is in the details, and you really have to look for reviews and choose a model that is capable of achieving your OC target. In any case, for stable performance these VRs, typically found around the CPU socket, need to be actively cooled as well, since otherwise too much thermally induced noise prevents correct voltage readings and leads to instability.

Benchmarking Tools

The tools I used not only apply a high load to the CPU and RAM, but also have some measurable output and, more importantly, have checks in place to ensure there are no errors in the calculations. If your system does not freeze or crash but produces errors during a test, it cannot be considered stable.

Prime95

This is probably the only benchmarking tool common to Windows and Linux users. However, the Linux version still has some limitations. It does have a command line “torture mode” switch, but there are no switches to run a particular test (e.g. to change the size of the FFT vectors). Instead, prime95 cycles through different tests, which results in an uneven load on the CPU and thus does not really apply the highest sustained load. When launching it like ./mprime -t with no additional configuration, one has to wait some time for it to start running the small FFT size tests. Another observation I made: it does not always load all cores; sometimes it starts with 5 and may eventually use only 2.

Finally, it does not output a single figure you can use as a measure of your setup’s performance. When I halt the continuous torture mode, it just says how many tests it completed in what time. It does report the number of errors and warnings though, so pay attention to that. I’d be interested to see how Linux folks use this tool for benchmarking, not just for load torture.

There are several versions worth mentioning. The current version, 29.4b, uses AVX and AVX2 instructions and can apply the maximum load to the CPU.

There is a 26.6 version that is reputed not to use AVX, and overclockers sometimes “cheat” by using this version. This is OK when you want to test your system at a high but more typical load, since the AVX/AVX2 sets are used far less in common software and games than in specialized software, like rendering or numeric calculations. I found out, however, that 26.6 for Linux does use instructions from the AVX set. This can be detected in two ways: one can use a simple utility, elfx86exts, to find which instruction sets a given ELF binary or library uses (read more on this utility below), or just observe the core frequencies drop by the AVX offset specified in the BIOS. This leaves me wondering whether the Windows version is really built without AVX instructions or translates them to something else.

Finally, there is the 25.11 version, which is truly AVX-free, but it’s rather old and doesn’t run well on a 6-core CPU: it doesn’t load all cores.

I have used prime95 occasionally, but didn’t rely on it.

LINPACK

Intel’s LINPACK benchmark imposes a lot more load on the CPU and RAM, which is noticeable from the increased temperatures and power draw. The LINPACK test solves large systems of linear equations and thus benefits from the vectorized AVX/AVX2/AVX512 instruction sets. Besides, one can choose test parameters that impose more or less CPU load and use more or less RAM. The latter is especially advantageous when one wants to test the RAM and has a greater than typical amount of it, e.g. 64GB.

The minimal configuration consists of 5 numbers, one per line. The config file also requires header lines, for whatever reason. Take an example from the benchmark distribution.

I have used the following configurations:

  1. Heavier CPU, less RAM:

1, 40000, 40000, 1, 1

This test runs for about a minute and uses 12 GB of RAM.

  2. Lighter CPU, more RAM:

1, 90000, 90000, 1, 1

This test runs for 3 minutes and uses 64GB.
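
The RAM figures quoted for the two configurations can be sanity-checked with a little arithmetic. The sketch below assumes that the second and third numbers in the config are the problem size and the leading dimension (as in the sample input file shipped with the benchmark) and that the matrix holds 8-byte double-precision values; the function name is made up for illustration.

```python
from typing import Optional


def linpack_matrix_gb(n: int, leading_dim: Optional[int] = None) -> float:
    """Approximate size of the dense LINPACK matrix in GB (10**9 bytes).

    Assumption: the matrix has n columns of leading_dim doubles, 8 bytes each.
    """
    lda = n if leading_dim is None else leading_dim
    return n * lda * 8 / 1e9


for size in (40000, 90000):
    # 40000 gives about 12.8 GB, 90000 about 64.8 GB
    print(f"N={size}: ~{linpack_matrix_gb(size):.1f} GB")
```

To target a given amount of RAM, pick N close to the square root of (bytes available / 8), leaving some room for the OS.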

I must note that the “lighter” test was still some 10-30% heavier on the CPU than prime95 (depending on which FFT size mprime uses).

stress-ng

This is a good test suite: it offers several methods for CPU testing, uses AVX and shows metrics. But it doesn’t unleash the full AVX power on a CPU. You may want to use it for testing other subsystems though, as it covers a lot.

Monitoring Tools

Under Windows they use HWInfo64 and CPU-Z. There are no such tools under Linux. Live with it.

Most people turn to the lm-sensors package (and its sensors binary, as well as some graphical frontends, like psensor) for various readings, and this does make sense for some models and some parameters. But beware that some hardware is not supported by lm-sensors, or its modules are maintained by a single enthusiast; look at it87, for example.

Also, it turned out my Asus MB only supported core temperatures via a generic coretemp module, and that’s it.

Model-specific registers (MSRs) are a less universal way of getting the readings, as they are processor-specific; that’s why we turn to Intel-only tools at this point. Note that using MSRs has some prerequisites. On Ubuntu systems:

# modprobe msr
# echo 0 > /proc/sys/kernel/nmi_watchdog

Also note that the latter may fail when simply prefixed with sudo, because the output redirection is performed by your unprivileged shell; you may need to become root first with sudo su.

To make the sysctl change permanent, add kernel.nmi_watchdog = 0 to your /etc/sysctl.conf

i7z

This is an open source utility developed by a student a while ago, but it is still pretty useful for basic overclocking needs. It reports the following instantaneous values per core: the actual frequency and BCLK multiplier, the percentage of time the core spends in the C0 (running) or C1 (halt) state as well as a couple of deeper states, core temperatures, and (drum roll!) core voltage. I have not seen another utility that could give me core voltage readings, even though it is one of the key parameters to watch.

The utility is packaged for most distros and needs to be run as superuser. It is dated and lacks explicit support for the newest generations of CPUs, but hey, it works nonetheless!

pcm.x from PCM

PCM is an Intel-specific package that gives a lot more runtime details and can dump them to a file in CSV format, which is convenient for further analysis. For an overclocker’s needs it gives the familiar frequency (although as a factor of the base frequency, i.e. 1.30 means 3.7GHz*1.30=4.8GHz) and core temperature (although as degrees left in the thermal headroom, i.e. 100 minus the actual temperature in my case), and it also gives CPU energy consumption and RAM throughput. It does not show voltages. The CPU power part is a bit controversial. How is it calculated? Can it be trusted, even as a relative number? I’d prefer to double check with an inline power meter at the 8-pin CPU power connector on the motherboard, but I lack such a tool. Instead I relied on a wall socket power meter, which gave me the total power drawn by the PC.
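
The two conversions described above (frequency factor to GHz, thermal headroom to degrees) are easy to script when post-processing pcm output. This is only a sketch: the function name is made up, and the defaults of 3.7 GHz base frequency and a TjMax of 100 are the values from my setup, not something pcm reports.

```python
def pcm_to_absolute(freq_factor, thermal_headroom_c, base_ghz=3.7, tjmax_c=100):
    """Convert pcm's relative readings into absolute frequency and temperature.

    base_ghz and tjmax_c are assumptions about the CPU; adjust them for yours.
    """
    freq_ghz = base_ghz * freq_factor       # factor of the base frequency
    temp_c = tjmax_c - thermal_headroom_c   # degrees left until TjMax
    return freq_ghz, temp_c


freq, temp = pcm_to_absolute(1.30, 35)
print(f"{freq:.2f} GHz at {temp} C")  # 4.81 GHz at 65 C
```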

pcm and i7z will happily work together at the same time (in different terminals, obviously).

turbostat

Written by an Intel employee. It can collect more information than the two utilities above and is extensible: on the command line one can request specific MSR or sysfs entries to be added. Unfortunately, it does not show vcore out of the box.

dstat

Modern all-in-one alternative to iostat, vmstat etc. Has modules for many different monitoring needs, including thermal. Install via your OS package manager.

CoreFreq

A very interesting terminal-graphics monitor (with beautiful soft colors!). Download it from the GitHub repo and build. Follow the README to load the kernel module, start the daemon and run the CLI:

$ ./corefreq-cli -h
CoreFreq. Copyright © 2015–2018 CYRIL INGENIERIE

usage: corefreq-cli [-option <arguments>]
-t Show Top (default)
-d Show Dashboard
-V Monitor Power and Voltage
-g Monitor Package
-c Monitor Counters
-i Monitor Instructions
-s Print System Information
-M Print Memory Controller
-R Print System Registers
-m Print Topology
-u Print CPUID
-k Print Kernel
-h Print out this message

Without any options it runs a terminal GUI in “top” mode. One can then use the same option letters as mode switches. Unfortunately, it won’t show me vcore values (only zeros).

Intel’s powergadget

Check it out from the Intel Developer Zone. It is no longer maintained for Linux.

OS Kernel Tools

First, see the excellent kernel.org write-up on CPU Performance Scaling.

acpi-cpufreq kernel driver

This is the driver at the forefront of OS power management, as it’s CPU-brand agnostic. Nothing to see here for Intel owners, as the intel_pstate driver takes its place.

intel_pstate kernel driver

This is the driver at the forefront of OS power management on Intel CPUs, unless it’s disabled with the kernel boot parameter intel_pstate=off

See again Kernel.org’s documentation on intel_pstate CPU Performance Scaling Driver.

It seems to me that Intel believes the CPU and BIOS are already doing a good job of limiting CPU power consumption while maximizing performance, so the CPU does not need a lot of hints from the OS.

This driver provides two policy governors, powersave and performance, where powersave is more like cpufreq’s ondemand governor. I have tried both (switching via the cpupower utility, described below) and saw no difference in power consumption or performance during the test. I made the following interesting observation: while the powersave policy is on, the CPU downclocks to 800 MHz at idle, and idle power consumption is ~40 Watts (at the wall socket, so an objective measure). When I change the policy governor to performance, the CPU starts to run at the max frequency, but the power draw stays the same! Just remember that neither the frequency nor Vcore alone determines power consumption; what matters is the actual computing load on your CPU.
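
To confirm which governor is active and what frequency each core currently reports, one can read the standard cpufreq files in sysfs. The following is a small sketch; the function name is made up, and it assumes the usual layout under /sys/devices/system/cpu with scaling_cur_freq given in kHz.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_mhz_or_None)} for each CPU found."""
    result = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu = gov_file.parent.parent.name           # e.g. "cpu0"
        freq_file = gov_file.with_name("scaling_cur_freq")
        mhz = None
        if freq_file.exists():
            mhz = int(freq_file.read_text()) / 1000  # the file holds kHz
        result[cpu] = (gov_file.read_text().strip(), mhz)
    return result
```

Calling read_governors() before and after switching the governor shows whether the change took effect on every core.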

energy_performance_available_preferences can be further turned off with kernel boot parameter thermal.off=1. Or the preference can be changed from default balance_performance to performance by writing to energy_performance_preference.

intel_powerclamp kernel driver

This is an interesting idea: inject idle states into the CPUs to let them cool down. See the kernel docs. The driver essentially has one knob: a user can control (via sysfs) what percentage of idle time to inject when the system hits full load. For example, if your application runs on 6 cores and consumes them completely, you’ll see 600% utilization in top under Linux. With 10% idle injected by the powerclamp driver, you’ll see 540% consumed by the application and 60% by the kidle threads. The authors claim that:

Test/Analysis has been made in the areas of power, performance, scalability, and user experience. In many cases, clear advantage is shown over taking the CPU offline or modulating the CPU clock.
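
The utilization split from the 6-core example is simple arithmetic; here it is as a tiny sketch (the function name is made up) for trying other core counts and injection percentages.

```python
def powerclamp_split(cores, idle_pct):
    """Split top-style utilization between the application and injected idle.

    Returns (application_percent, idle_percent), where 100 means one full core.
    """
    total = cores * 100
    idle = total * idle_pct / 100
    return total - idle, idle


print(powerclamp_split(6, 10))  # (540.0, 60.0)
```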

Unfortunately the driver does not differentiate the type of load. 100% AVX load is much more heat-producing than non-AVX load. Injecting idle times between AVX instructions makes some sense, while running at 100% non-AVX load could be perfectly fine.

Moreover, my experiments with this driver under heavy LINPACK load show that the temperature still swings over a wide range, which eventually leads to switching to lower P-states. This makes me suspect that this driver either does not inject idle states regularly enough to smooth the temperature, or the instantaneous load varies so much that it’s enough to heat the cores up between regular injections.

intel-rapl kernel driver

RAPL stands for Running Average Power Limit. This driver can effectively cap the power consumed by the CPU (or RAM) by switching to lower P-states. The following will limit the power consumption of a CPU to 130W at the expense of performance: it will lower frequencies and voltages.

# cd /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0
# echo 130000000 > constraint_0_power_limit_uw

This did work for me: the power fluctuated only a little around the set limit (under AVX load), at the expense of fluctuating frequency and voltages. However, this somehow always led to system hangs and reboots, so the overall result is negative.
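
The value written to constraint_0_power_limit_uw is in microwatts, which is easy to get wrong by a factor of a thousand. Below is a hedged sketch of the same operation in Python; the function names are made up, the path is the one from the commands above, and writing to it requires root.

```python
from pathlib import Path

# Package 0 directory, as used in the shell commands above
RAPL_PACKAGE_DIR = "/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0"


def watts_to_microwatts(watts):
    """Convert watts to the integer microwatt value the powercap files expect."""
    return int(round(watts * 1_000_000))


def set_power_limit(watts, rapl_dir=RAPL_PACKAGE_DIR):
    """Write the long-term power limit (constraint 0). Needs root privileges."""
    limit_file = Path(rapl_dir) / "constraint_0_power_limit_uw"
    limit_file.write_text(str(watts_to_microwatts(watts)))


print(watts_to_microwatts(130))  # 130000000
```

Run as root, set_power_limit(130) would be the equivalent of the echo command above.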

thermald

It is already on by default in Ubuntu 18.04. The idea is to give some control to the user space, to allow some custom cooling solutions.

The thermal daemon watches the thermal zones’ readings and engages cooling devices at the trip points defined in the thermal-conf.xml file.

While your intel_pstate driver will engage at the CPU’s temperature trip points, you can instead, or in addition, configure your own trip points and associate existing or user-defined cooling devices with them. A cooling device may be not only, e.g., a fan, but a kernel driver as well. Notably, you can configure the intel_powerclamp driver or the intel-rapl driver to be your cooling devices.

It gives the user fine control over exactly when, how and for how long to engage a cooling device. Notably, it allows using PID control to determine the target state of a cooling device.

For any platform, thermal zones and cooling devices can be found under /sys/class/thermal.
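
To see which thermal zones your platform exposes, the entries under /sys/class/thermal can be listed with a few lines of code. This is a sketch under the assumption that each thermal_zone directory has a type file and a temp file in millidegrees Celsius; the function name is made up.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return a list of (zone_name, zone_type, temperature_celsius) tuples."""
    zones = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            temp_c = int((zone / "temp").read_text()) / 1000  # millidegrees
        except (OSError, ValueError):
            continue  # skip zones that cannot be read right now
        zones.append((zone.name, zone_type, temp_c))
    return zones
```

The zone types returned by read_thermal_zones() are the names you would refer to when writing your own trip points.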

You’d rather not use it while working on your overclocking settings, since it may have some policies that prevent you from reaching max performance. Maybe it could be useful for laptops, where one can further restrict power consumption, but for any overclocking needs it can be safely disabled. I might have some ideas on using thermald, but not at this moment.

Other utilities

elfx86exts

Disassembles binaries and prints which instruction sets they use. I use it to find out whether a binary uses the AVX family of sets. Clone the Rust source from the GitHub repo and build it (some extra packages are required for the build). Use as:

$ elfx86exts/target/debug/elfx86exts --help
elfx86exts 0.1.0
Analyze a x86 binary to understand which instruction set extensions it uses.

USAGE:
elfx86exts <FILE>

FLAGS:
-h, --help Prints help information
-V, --version Prints version information

ARGS:
<FILE> The path of the file to analyze

cpupower

This utility from linux-tools-common can show and set some power management parameters at the OS level. E.g. to set the governor:

sudo cpupower frequency-set -g performance

To change the energy-performance bias (0–15):

sudo cpupower set -b 0

For the most initiated

Read values directly from MSRs. In this example, we read TjMax, the maximum allowed package temperature. This is often the reference point in temperature reporting. The command should output 100 on Intel 8th Gen. Your mileage may vary.

sudo rdmsr --decimal --bitfield 23:16 0x1A2
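
What rdmsr does here can be sketched in a few lines: read the 64-bit register through the msr driver’s device file, then cut out bits 23:16. The function names are made up and the sample value is hypothetical; read_msr needs root and the msr module loaded, so it is defined but not called.

```python
import os
import struct


def bitfield(value, high, low):
    """Extract bits high..low (inclusive) from an integer register value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def read_msr(register, cpu=0):
    """Read a 64-bit MSR via /dev/cpu/<cpu>/msr (root and 'modprobe msr' needed)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The driver returns 8 bytes at a file offset equal to the register number
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


raw = 0x640000  # hypothetical register content with 100 stored in bits 23:16
print(bitfield(raw, 23, 16))  # 100
```

On a real system, bitfield(read_msr(0x1A2), 23, 16) run as root should give the same number as the rdmsr command above.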

Use “Intel® 64 and IA-32 Architectures Software Developer’s Manual. Volume 4:Model-Specific Registers” for reference.

For example, one can clear the thermal control enable bit in IA32_MISC_ENABLE to avoid voltage/frequency drops due to a thermal event. Or, perhaps, configure such a drop using MSR_THERM2_CTL.

MSR_PERF_STATUS[47:32] can be used to read Core Voltage. P-state core voltage can be computed by MSR_PERF_STATUS[37:32] * (float) 1/(2¹³).

(There is an inconsistency in the doc: the upper bit is either 47 or 37.)
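
A quick calculation helps decide between the two readings of the field. With only six bits (37:32) the largest possible result would be 63/8192, i.e. under 0.01 V, which cannot be a core voltage, so the sketch below assumes the 16-bit field 47:32. The function name and the sample register value are made up for illustration.

```python
def core_voltage(perf_status):
    """Compute core voltage in volts from a raw MSR_PERF_STATUS value.

    Assumption: the voltage field occupies bits 47:32 and is scaled by 1/2**13.
    """
    raw_field = (perf_status >> 32) & 0xFFFF
    return raw_field / 2 ** 13


sample = 0x2800 << 32        # hypothetical value: field = 10240
print(core_voltage(sample))  # 1.25
print(63 / 2 ** 13)          # upper bound if the field were only 6 bits wide
```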

Links

lm-sensors github

PCM github

Power Management States: P-States, C-States, and Package C-States

List of Useful Power and Power Management Articles, Blogs and References

CPU frequency scaling by ArchWiki.

Thermald github

Intel Thermal Management (8th Gen platform datasheet vol1, see chapter 5)

Overview of CPU performance counters (in Ukrainian)
