Android Thermal – 3 | Mobile phone heating

Shubham Garg
Oct 18, 2023


Nowadays, continuous improvements in fast-charging technology and mobile processor performance have made phone heating worse. The heavier the CPU load, the hotter the phone gets; the higher the charging power, the hotter it gets. If we reduce the charging power when the temperature reaches a threshold, we naturally reduce the heat and therefore the phone's temperature. In this way, a dynamic equilibrium is established.
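The charging feedback loop described above can be sketched as a small controller with hysteresis. This is a minimal illustration, not how any real charger driver works: the struct, the function name, and every threshold and current value are hypothetical placeholders. Temperatures are in millidegrees Celsius, the unit Linux uses for thermal readings.

```c
#include <stdbool.h>

/* Hypothetical charge-current policy: all names and numbers are placeholders. */
struct charge_policy {
    int throttle_at_mc;   /* start reducing charge current at this temp (millidegrees C) */
    int release_at_mc;    /* restore full current only below this temp (millidegrees C) */
    int full_ma;          /* normal charge current (mA) */
    int reduced_ma;       /* reduced charge current while hot (mA) */
    bool throttled;       /* current state of the controller */
};

/* Returns the charge current to use for the latest temperature reading.
 * The gap between throttle_at_mc and release_at_mc (hysteresis) keeps the
 * limit from toggling on every small temperature wobble. */
int charge_current_ma(struct charge_policy *p, int temp_mc)
{
    if (!p->throttled && temp_mc >= p->throttle_at_mc)
        p->throttled = true;
    else if (p->throttled && temp_mc < p->release_at_mc)
        p->throttled = false;
    return p->throttled ? p->reduced_ma : p->full_ma;
}
```

The gap between the two thresholds is a design choice: without it, a temperature hovering around a single limit would switch the charge current on almost every reading.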
The Thermal module is mainly responsible for temperature control. When the temperature is low it can let the device warm up, when the temperature is high it can cool the device down, and it can even reset the system when the trigger conditions are met.

In Blog 1 we saw why we need a thermal mitigation framework.

General Android User ✅ Blog 1

Android application developer ✅ Blog 2

AOSP Platform engineer / OEM ➡️ Blog 3 (this blog)

So the agenda of this blog is to understand the thermal mitigation framework from the kernel layer and to see what changes can be made by the OEM (Original Equipment Manufacturer).

Linux follows the abstraction principle (abstract layering). The layer or module that integrates all of a subsystem's resources and information is called the core layer, device operations (connection) are handled by the device layer, and the logic of how to handle the devices and their data is managed by the driver/governor layer.
The Linux kernel thermal subsystem that Android uses is layered the same way:

Core layer 👉 thermal core
Device layer 👉 thermal zone 👉 (temp sensor, NTC)
Device layer 👉 cooling devices 👉 (CPU, GPU, RAM)
Logic layer 👉 thermal governor 👉 (step_wise, bang_bang, etc.)

Thermal Framework Overview

How the thermal framework works should be clear by now.
A thermal zone wraps a thermal sensor and reports its temperature. A thermal governor holds the underlying strategy or algorithm, such as step_wise or bang_bang, and sends the necessary cooling request to a cooling device, for example lowering the CPU frequency when the temperature is high.
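To make the layering concrete, here is a simplified user-space model in C. All struct and function names are hypothetical; they only loosely mirror the kernel's callback style (a zone's get_temp callback and a cooling device's state callbacks). The governor shown follows the bang_bang idea (turn the cooling device on above a trip point, off again below the trip minus a hysteresis), and a fake sensor and fan stand in for real drivers.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the three layers. */
struct cooling_dev;                 /* device layer: something that can cool */
struct cooling_ops {
    unsigned long (*get_max_state)(struct cooling_dev *cd);
    void (*set_cur_state)(struct cooling_dev *cd, unsigned long state);
};
struct cooling_dev {
    const char *type;               /* e.g. "fan", "cpufreq" */
    const struct cooling_ops *ops;
    unsigned long cur_state;
};

struct zone;                        /* device layer: a temperature sensor */
struct zone_ops {
    int (*get_temp)(struct zone *z, int *temp_mc);
};

struct governor {                   /* logic layer: decides the cooling state */
    const char *name;
    void (*throttle)(struct zone *z);
};

struct zone {
    const struct zone_ops *ops;
    const struct governor *gov;
    struct cooling_dev *cdev;       /* one bound cooling device, for brevity */
    int trip_mc;                    /* trip point temperature */
    int hyst_mc;                    /* hysteresis below the trip */
    int temp_mc;                    /* last reading */
};

/* Core layer: read the sensor, then delegate the decision to the governor. */
int core_update(struct zone *z)
{
    int t;
    int err = z->ops->get_temp(z, &t);
    if (err)
        return err;
    z->temp_mc = t;
    z->gov->throttle(z);
    return 0;
}

/* bang_bang-style governor: state 1 (on) at or above the trip,
 * state 0 (off) once the temperature falls below trip - hysteresis. */
static void bang_bang_throttle(struct zone *z)
{
    if (z->temp_mc >= z->trip_mc)
        z->cdev->ops->set_cur_state(z->cdev, 1);
    else if (z->temp_mc < z->trip_mc - z->hyst_mc)
        z->cdev->ops->set_cur_state(z->cdev, 0);
}
const struct governor bang_bang_gov = { "bang_bang", bang_bang_throttle };

/* Fake sensor and fan, standing in for real drivers. */
int fake_reading_mc;
static int fake_get_temp(struct zone *z, int *temp_mc)
{
    (void)z;
    *temp_mc = fake_reading_mc;
    return 0;
}
static unsigned long fan_max(struct cooling_dev *cd) { (void)cd; return 1; }
static void fan_set(struct cooling_dev *cd, unsigned long s) { cd->cur_state = s; }
const struct zone_ops fake_zone_ops = { fake_get_temp };
const struct cooling_ops fan_ops = { fan_max, fan_set };
```

The core never decides anything itself: swapping bang_bang_gov for another governor changes the policy without touching the sensor or fan code, which is the point of the layering.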

Thermal Core:

The main program of the thermal framework. It runs the driver initialization, maintains the relationships among thermal zones, governors, and cooling devices, and interacts with user space through sysfs.

Thermal Governor:

The temperature control algorithm (temperature control strategy). It solves the problem of which cooling state a cooling device should select when throttling is needed.
The Linux kernel provides several temperature control strategies:

  1. step_wise: steps the cooling device state up ⬆️ or down ⬇️ one level at a time (for a CPU, this lowers or raises its frequency)
  2. power_allocator
  3. user_space
  4. fair_share
  5. bang_bang

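You can make any of these governors the default in Android by changing the kernel's thermal configuration: drivers/thermal/Kconfig has a "Default Thermal governor" choice (CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE and similar options). If nothing else is specified, step_wise is the default. Check the thermal Kconfig file if you want to see the code.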
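Each thermal zone also exposes its governor through sysfs: /sys/class/thermal/thermal_zoneN/policy shows the active governor, and available_policies lists the governors built into the kernel. A minimal sketch of reading such a one-line attribute follows; the helper name read_sysfs_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read a one-line sysfs attribute such as
 * /sys/class/thermal/thermal_zone0/policy into buf, stripping the newline.
 * Returns 0 on success, -1 on error. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 0;
}
```

On a device you could call it with "/sys/class/thermal/thermal_zone0/policy". Writing a governor name into the same file (as root) switches that zone's governor at runtime.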

For more details, go through the step_wise code (drivers/thermal/gov_step_wise.c in recent kernels). In this diagram, only the CPU is considered as a cooling device.

Note that step_wise only tells the cooling device (the CPU in our case) to raise or lower its cooling state, and it does so one step at a time. How the CPU frequency is actually lowered or raised is decided by the respective cooling device driver, not by the step_wise governor.

Step Wise Governor for CPU Freq
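The step_wise decision for a single cooling device can be sketched as below. This is a simplified illustration, not the kernel's step_wise implementation: the function names are hypothetical, the trend is derived only from the previous reading, and trip hysteresis is ignored. A higher state means more cooling.

```c
#include <stdbool.h>

/* Simplified step_wise-style decision for one cooling device bound to one trip. */
enum trend { TREND_STABLE, TREND_RAISING, TREND_DROPPING };

static enum trend get_trend(int temp_mc, int last_temp_mc)
{
    if (temp_mc > last_temp_mc)
        return TREND_RAISING;
    if (temp_mc < last_temp_mc)
        return TREND_DROPPING;
    return TREND_STABLE;
}

/* cur: current cooling state; [lower, upper]: allowed range for this trip. */
unsigned long step_wise_target(unsigned long cur, unsigned long lower,
                               unsigned long upper, int temp_mc,
                               int last_temp_mc, int trip_mc)
{
    bool throttle = temp_mc >= trip_mc;
    enum trend trend = get_trend(temp_mc, last_temp_mc);

    if (throttle && trend == TREND_RAISING)
        return cur + 1 > upper ? upper : cur + 1;   /* one step more cooling */
    if (!throttle && trend == TREND_DROPPING && cur > lower)
        return cur - 1;                             /* one step less cooling */
    return cur;                                     /* otherwise hold */
}
```

For a CPU cooling device a higher state means a lower frequency cap, so returning cur + 1 asks the cpufreq cooling driver to drop one frequency step; how it does that remains the driver's job.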

Thermal Cooling Device:

The executor of system temperature control: the driver that implements the cooling measures (cpufreq_cooling, cpuidle_cooling, devfreq_cooling, etc.). In layman's terms, it is a cooling device, such as a fan. The cooling device performs cooling operations according to the state calculated by the governor. Generally, the higher the state, the higher the system's cooling demand. A cooling device must be bound to a trip point; when the trip point is triggered, the corresponding cooling device applies its cooling measures.
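The state-to-action mapping of a CPU cooling device can be sketched as follows. It follows the convention of the kernel's cpufreq_cooling driver, where state 0 means no throttling and each higher state caps the CPU at the next lower frequency, but the struct, the function names, and the frequency table are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a CPU frequency cooling device:
 * state 0 = highest frequency (no throttling), higher state = lower cap. */
struct cpu_cooling {
    const unsigned int *freq_khz;   /* available frequencies, descending */
    size_t nr_freqs;
    unsigned long cur_state;
    unsigned int max_freq_khz;      /* current frequency cap */
};

unsigned long cpu_cooling_max_state(const struct cpu_cooling *c)
{
    return c->nr_freqs - 1;
}

/* Called when the governor requests a new state. */
int cpu_cooling_set_state(struct cpu_cooling *c, unsigned long state)
{
    if (state > cpu_cooling_max_state(c))
        return -EINVAL;             /* reject states beyond the table */
    c->cur_state = state;
    /* A real driver would apply this cap through the cpufreq policy. */
    c->max_freq_khz = c->freq_khz[state];
    return 0;
}
```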

Thermal Zone Device:

A thermal zone device is a thermal zone node connected to a thermal sensor. Each zone appears as a thermal_zone* entry in the /sys/class/thermal/ directory and is configured and generated through the dtsi file. The thermal sensor is a temperature sensor (for example an NTC thermistor) that provides temperature readings to the thermal framework.
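User space reads a zone's temperature from /sys/class/thermal/thermal_zoneN/temp, in millidegrees Celsius (42500 means 42.5 °C); the neighbouring type file names the sensor. A minimal reader, with a hypothetical helper name:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a thermal zone temperature in millidegrees Celsius from a sysfs
 * file such as /sys/class/thermal/thermal_zone0/temp.
 * Returns 0 on success or a negative errno value on failure. */
int read_zone_temp_mc(const char *path, long *temp_mc)
{
    char buf[32];
    char *end;
    FILE *f = fopen(path, "r");
    if (!f)
        return -errno;
    if (!fgets(buf, (int)sizeof buf, f)) {
        fclose(f);
        return -EIO;
    }
    fclose(f);
    errno = 0;
    long v = strtol(buf, &end, 10);
    if (errno || end == buf)        /* overflow or no digits at all */
        return -EINVAL;
    *temp_mc = v;
    return 0;
}
```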

So this is what the overall thermal framework looks like.

To understand the code flow, let's take step_wise as the governor.
This is how the flow looks:

  1. The thermal zone gets the temperature from the thermal sensor hardware through the sensor driver's get_temp() callback; the device tree only describes the zone's configuration (sensors, trip points, cooling maps). The value is exposed at /sys/class/thermal/thermal_zone*/temp and passed to the thermal core (thermal_core.c, the main thermal core file).
  2. The thermal core checks whether a hot or critical trip point has been crossed. These cases are handled by the core itself rather than by the governor (a critical trip, for example, triggers an emergency shutdown). Otherwise the core passes the temperature on to the thermal governor.
  3. Here the thermal governor is step_wise, as selected in the thermal configuration explained earlier.
  4. In step_wise, different cooling devices can be bound to different trip points. The governor checks them, and if a trip point is triggered it computes a new cooling state, which reaches the cooling device driver through its set_cur_state() function-pointer callback, asking it to lower or raise the frequency.
  5. In our example the cooling device is the CPU, so the cpufreq_cooling driver for the SoC's CPUs is called, the intended action is taken, and the cooling device state is updated.
  6. These steps repeat every time the temperature is updated, either by periodic polling or by a sensor interrupt.
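Steps 1 to 6 can be condensed into one toy loop. Everything here is a hypothetical, heavily simplified model: one zone, one passive trip handled by a step_wise-like governor, one critical trip handled by the core, and a CPU cooling device whose driver callback just records the requested state.

```c
#include <stdbool.h>

/* Hypothetical end-to-end model of the flow in steps 1-6. */
struct cdev {
    unsigned long cur_state, max_state;
    int (*set_cur_state)(struct cdev *cd, unsigned long state);  /* driver callback (step 5) */
};

struct tz {
    int temp_mc, last_temp_mc;
    int trip_mc;            /* passive trip handled by the governor */
    int crit_mc;            /* critical trip handled by the core */
    bool shutdown;          /* set when the critical trip is crossed */
    struct cdev *cdev;
};

int cpu_set_state(struct cdev *cd, unsigned long state)
{
    cd->cur_state = state;  /* a real driver would change the CPU frequency cap here */
    return 0;
}

/* Step 4: step_wise-like decision, one state at a time. */
static void governor_throttle(struct tz *z)
{
    struct cdev *cd = z->cdev;
    bool throttle = z->temp_mc >= z->trip_mc;
    unsigned long target = cd->cur_state;

    if (throttle && z->temp_mc > z->last_temp_mc && target < cd->max_state)
        target++;
    else if (!throttle && z->temp_mc < z->last_temp_mc && target > 0)
        target--;
    if (target != cd->cur_state)
        cd->set_cur_state(cd, target);   /* function-pointer callback into the driver */
}

/* Steps 1-2: store the new reading, handle the critical trip in the core,
 * otherwise hand over to the governor. */
void tz_update(struct tz *z, int new_temp_mc)
{
    z->last_temp_mc = z->temp_mc;
    z->temp_mc = new_temp_mc;
    if (z->temp_mc >= z->crit_mc) {
        z->shutdown = true;   /* the kernel would trigger an emergency shutdown */
        return;
    }
    governor_throttle(z);
}
```

Feeding readings of 40, 46, 48, 50 and 52 °C with a 45 °C trip raises the state one step per rising reading above the trip until it saturates at max_state; a later drop below the trip releases one step, and crossing the critical trip sets the shutdown flag without consulting the governor.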

Now let's see what OEMs like OnePlus and SoC vendors like Qualcomm can customize here.
We will keep this short and technical.

Possible customizations by the SoC vendor:
Since the SoC vendor has major control over the hardware, it is better placed to make hardware customizations.

  1. Using better thermal materials for the SoC and optimizing the distance between the heat source and the heat sink.
  2. Providing support for more kinds of cooling devices. We already know the role of charging control, the CPU, and the GPU in thermal mitigation, but the RF subsystem (phone antenna/modem) can also act as a cooling device through frequency downscaling or by optimizing PA (power amplifier) usage.
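Any hardware block can become a cooling device if it implements the cooling-device operations: report its maximum state, report its current state, and accept a new state. The sketch below models a purely hypothetical modem cooling device whose states map to lower transmit-power limits; the names and percentages are placeholders, not a real modem interface.

```c
#include <errno.h>
#include <stddef.h>

/* Purely hypothetical modem cooling device: each state maps to a lower
 * transmit-power limit instead of a CPU frequency. Values are placeholders. */
static const int modem_tx_limit_pct[] = { 100, 80, 60, 40 };
#define MODEM_NR_STATES (sizeof modem_tx_limit_pct / sizeof modem_tx_limit_pct[0])

struct modem_cooling {
    unsigned long cur_state;
    int tx_limit_pct;     /* limit that would be passed to the modem firmware */
};

unsigned long modem_get_max_state(void)
{
    return MODEM_NR_STATES - 1;
}

unsigned long modem_get_cur_state(const struct modem_cooling *m)
{
    return m->cur_state;
}

int modem_set_cur_state(struct modem_cooling *m, unsigned long state)
{
    if (state > modem_get_max_state())
        return -EINVAL;
    m->cur_state = state;
    m->tx_limit_pct = modem_tx_limit_pct[state];
    return 0;
}
```

In a real kernel driver such callbacks would be collected in a struct thermal_cooling_device_ops and registered with thermal_cooling_device_register(), after which the device can be bound to trip points like a CPU or GPU cooling device.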

Possible customizations by OnePlus and other phone manufacturers:

Phone vendors can tune for specific use cases or balance performance against thermal mitigation.
For example, see this research paper: https://arxiv.org/pdf/1904.09814.pdf

The paper proposes tuning CPU scheduling so that background tasks run on lower-power cores, or in layman's terms with fewer resources, which saves CPU power and helps keep the CPU from overheating.
Remember that the step_wise governor and the CPU scheduler are different components.
The idea here is to tune scheduling for this case so that the CPU spends fewer resources on background tasks, whose impact on the user is small.
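On Android, one existing mechanism for keeping background work on the efficient cores is the cpuset cgroup: background tasks are placed in a group such as /dev/cpuset/background, whose cpus file typically lists only the low-power cores. This is not the paper's method, only a sketch of the kind of knob a vendor can tune. The helper name is hypothetical, the exact paths and core lists vary by device, and writing these files requires root or the right SELinux permissions.

```c
#include <stdio.h>
#include <sys/types.h>

/* Move a task into a cpuset group by writing its ID to the group's
 * "tasks" file, e.g. /dev/cpuset/background/tasks on many Android devices
 * (path is a device-specific assumption). Returns 0 on success, -1 on error. */
int move_task_to_cpuset(const char *tasks_path, pid_t tid)
{
    FILE *f = fopen(tasks_path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", (int)tid) > 0;
    if (fclose(f) != 0)            /* the kernel may reject the write on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

With cgroup v1 cpusets, writing to tasks moves a single thread, while writing to cgroup.procs moves a whole process.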

In this way, mobile vendors can tune their devices for better thermal performance.

For any questions or suggestions drop a comment.
