The Right Compute for the Right Workload: Intel’s Performance Vision for Client SoCs

Authors: Guy Therien Intel Fellow, Michael Chynoweth, Intel Fellow, and Rajshree A Chabukswar, Senior Principal Engineer

Intel
Intel Tech
5 min read · Aug 23, 2021


Introduction

The variability of compute requirements across applications such as gaming, video streaming, content creation, workplace productivity, and artificial intelligence (AI) has been addressed in part by system-on-chip (SoC) architectures that combine two core types: Performance-cores for single-threaded or lightly threaded scenarios, and Efficient-cores for highly parallel workloads.

However, to fully realize the potential of this architectural approach, thread placement must move beyond static rules. What’s required is low-latency, detailed telemetry that informs the operating system (OS) of system status and the runtime instruction mix, so that it can dynamically assign threads to the appropriate core(s) to optimize performance — while meeting thermal design points, operating conditions, and power settings, all without user input.

This degree of contextual awareness and adaptability requires that intelligence be built directly into the core, with nanosecond, real-time precision.

Hybrid architectures address performance

The basic operating principle behind a hybrid-core SoC architecture is that larger Performance-cores, like Golden Cove (GLC), tackle foreground tasks and applications with limited threading capability, while smaller, more energy-efficient Efficient-cores, such as Gracemont (GRT), handle highly parallel workloads and background tasks that do not require high quality of service (QoS). The degree to which hybrid SoC approaches succeed depends greatly on the OS having the right information at the right time, so that every core performs the right task for the application and the context. For example, if power consumption is not a limitation, then high-performance threads are assigned as needed to the Performance-cores.

Also, if an application’s instructions are highly parallelizable, they should be spread across the multiple Efficient-cores, all while ensuring no core is left unnecessarily in a wait state or spin loop.
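In application code, this division of labor means expressing the parallelism and letting the OS place the workers, rather than hard-pinning threads to specific cores. A minimal sketch (the workload, chunk sizes, and worker counts are illustrative, not anything prescribed by the architecture):

```python
# Sketch: express parallelism and let the OS scheduler place the workers.
# On a hybrid SoC, a scheduler with good telemetry can spread these across
# Efficient-cores; the application itself stays topology-agnostic.
# (Hypothetical example; workload and worker count are illustrative.)
import os
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(bounds):
    """CPU-bound busy work standing in for a parallelizable task."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=None):
    """Split [0, n) into per-worker chunks and sum them in parallel."""
    workers = workers or os.cpu_count()
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last chunk absorbs the remainder
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(1_000_000))  # same answer as sum(range(1_000_000))
```

Note the code never names a core: which workers land on Performance-cores versus Efficient-cores is left entirely to the OS.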

This enables new Desktop, Mobile and Ultra Mobile designs to address different performance goals, operating conditions, and scalability requirements (Figure 1).

Figure 1 Alder Lake architecture combined with dynamic scheduling based on real-time software requirements and hardware stats analysis allows designs to scale from desktop, to mobile, to ultra-mobile, while meeting performance requirements and design points.

To deliver this level of telemetry, intelligence about system status and runtime requirements must be baked into the core, with a direct, low-latency path to the OS. In this way, the workload and system operating conditions can be more accurately aligned for optimum performance.

Alder Lake

The upcoming Alder Lake SoCs combine two new core architectures, the Performance-cores and Efficient-cores, in a single SoC. In addition, we have added Intel® Thread Director, which continually monitors software in real time and gives hints to the OS scheduler, allowing it to make more intelligent, data-driven thread-scheduling decisions. Threads critical to the user experience are prioritized on the Performance-cores, delivering the best responsiveness. The Efficient-cores are used for highly parallel workloads and under power-constrained conditions, when power is needed elsewhere, such as by the graphics or other IP accelerators. This performance hybrid architecture achieves optimal performance across different software profiles, including multithreaded, limited-threading, and power-constrained workloads.

Figure 2 shows the Alder Lake key design points. The Performance-cores handle foreground tasks and apps with limited threading, while the Efficient-cores are used for applications that scale, for background tasks that do not require high QoS, and for power-constrained scenarios.

Figure 2 An example of an SoC with a hybrid architecture, showing P-cores and E-cores

Intel Thread Director

To get to the heart of what’s going on at the core level, Intel developed Thread Director, which allows the hardware to provide data-driven hints directly to the OS about the best core type on which to schedule a thread. Thread Director uses Intel’s performance monitoring unit for hardware telemetry.

With two different core types, placement of threads on the right core type is key to achieving the best user experience. Traditionally, an OS would make decisions based on limited available stats, such as which applications are in the foreground or background.

With Thread Director technology, however, Intel gives the OS an assist: information it previously lacked when making scheduling decisions. Thread Director monitors the instruction mix, core state, busy-spin detection, and other relevant microarchitectural telemetry at the same sub-nanosecond granularity at which the CPU processing pipeline operates, helping the OS make smarter choices about the right core. The result is hardware and software working together to deliver optimized performance under real-world conditions.
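The article does not describe a developer-facing API for this telemetry, but on Linux the kernel already exposes the P-core/E-core split that hybrid-aware scheduling builds on. A hedged sketch, assuming the `/sys/devices/cpu_core` and `/sys/devices/cpu_atom` PMU nodes that upstream Linux creates on hybrid Intel parts (these paths are absent on non-hybrid machines, so the code falls back gracefully):

```python
# Sketch: read the P-core/E-core split that the Linux kernel exposes on
# hybrid Intel SoCs via the cpu_core/cpu_atom PMU devices in sysfs.
# These paths exist only on hybrid parts; everywhere else we report None.
# (Hedged example; the path names come from upstream Linux hybrid-PMU
# support, not from any interface described in this article.)
from pathlib import Path

def parse_cpu_list(text):
    """Expand a kernel CPU list like '0-15,22' into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)

def hybrid_topology():
    """Return {'p_cores': [...], 'e_cores': [...]} or None if not hybrid."""
    core = Path("/sys/devices/cpu_core/cpus")
    atom = Path("/sys/devices/cpu_atom/cpus")
    if core.exists() and atom.exists():
        return {
            "p_cores": parse_cpu_list(core.read_text()),
            "e_cores": parse_cpu_list(atom.read_text()),
        }
    return None  # homogeneous (or non-Linux) system

if __name__ == "__main__":
    print(hybrid_topology() or "not a hybrid system")
```

Even with this information available, the article’s point stands: such topology queries are for diagnostics, and day-to-day placement is best left to the OS plus Thread Director.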

How can developers take advantage of Thread Director?

Of course, developers need the right tools to take advantage of these new capabilities. To that end, Intel is collaborating with OS vendors and tools/libraries developers to:

  • Refine OS scheduling with Hybrid-specific optimizations and inclusion of Intel Thread Director feedback in scheduling decisions.
  • Develop threading libraries to ensure threads are optimally scheduled on Hybrid platforms.
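Until those scheduler and library updates land, applications can still cooperate with hybrid-aware scheduling through existing OS hints. A minimal sketch using the standard POSIX niceness knob to mark low-QoS background work (an assumption on my part: niceness is one generic scheduler input, not the Thread Director interface itself):

```python
# Sketch: tag background work with lower scheduling priority so the OS,
# which on hybrid parts also consumes hardware hints, can favor
# Efficient-cores for it. os.nice() is the standard POSIX priority knob;
# it is an input to scheduling, not the Thread Director interface itself.
import multiprocessing as mp
import os

def background_task(conn):
    start = os.nice(0)    # read current niceness without changing it
    bumped = os.nice(10)  # demote this worker: low-QoS background work
    result = sum(i * i for i in range(100_000))
    conn.send((bumped - start, result))
    conn.close()

def run_in_background():
    """Run the demoted worker in its own process; return (nice delta, result)."""
    parent, child = mp.Pipe()
    proc = mp.Process(target=background_task, args=(child,))
    proc.start()
    delta, result = parent.recv()
    proc.join()
    return delta, result

if __name__ == "__main__":
    delta, result = run_in_background()
    print(f"background worker demoted by {delta}, result {result}")
```

The worker demotes only itself, in its own process, so foreground threads keep their default priority and remain candidates for the Performance-cores.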

In the meantime, Intel continues to build out its knowledge base of expected issues, tools, and techniques for Hybrid optimizations, and expects these one-time optimizations and enablement for Hybrid to carry forward to the future Intel client roadmap as well. Already, AI processing optimizations have been defined.

Notices and Disclaimers:

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

All product plans and roadmaps are subject to change without notice.

Statements in this document that refer to future plans or expectations, including with respect to future technology and products and the expected benefits and availability of such technology and products, are forward-looking statements. These statements are based on current expectations and involve many risks and uncertainties that could cause actual results to differ materially from those expressed or implied in such statements. For more information on the factors that could cause actual results to differ materially, see our most recent earnings release and SEC filings at www.intc.com.

Code names are used by Intel to identify products, technologies, or services that are in development and not publicly available. These are not “commercial” names and are not intended to function as trademarks.

© 2021 Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
