
Alder Lake introduces a new client SoC architecture that scales from powerful desktops to ultraportable laptops

Author: Marcus Yam, Technology Evangelist at Intel

Intel
Intel Tech


Personal computing is changing. We do more with our machines than ever before, whether it’s working, creating, gaming, or often a combination of those activities. We also use a diverse array of devices, from high-performance desktops to ultraportable laptops. Intel’s next generation of client processors is designed to meet the challenges of this reality. Codenamed Alder Lake, it introduces a new performance hybrid architecture and represents one of our biggest advancements ever for the x86 architecture.

To tackle increasingly complex workflows that blend foreground and background tasks, Alder Lake integrates two core types, Performance-cores and Efficient-cores, and uses our innovative Intel® Thread Director to help seamlessly shift work between them. The SoC also has more bandwidth inside and out, including support for cutting-edge connectivity, memory, and PCI Express. Intel Xe-LP integrated graphics complete the picture, with different configurations tailored for desktops and laptops.

Alder Lake is a unified architecture that scales to meet the needs of multiple form factors and power envelopes. Versatile enough to adapt to a wide range of systems and workloads, it embraces the diverse computing environment of today and lays a strong foundation for significant power-performance gains in the future.

A performance hybrid architecture primed for the future

Modern PCs process a symphony of tasks in parallel. Some are important to execute quickly, like opening a spreadsheet or running a game, while others can be handled more opportunistically, like syncing cloud storage or applying system updates. Allocating resources intelligently is crucial to achieving the right balance of performance and efficiency. That’s why Alder Lake dynamically distributes processing between cores designed for different kinds of work.

Performance-cores pursue peak speed

The Alder Lake Performance-core, or P-core, is optimized for raw speed to reduce latency and push single-threaded applications to the limit. Codenamed Golden Cove, it delivers higher IPC (instructions per clock) than the previous generation, outperforming Cypress Cove by an average of 19% at the same clock frequency.¹ It achieves this feat by being wider, deeper, and smarter than its predecessor.
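A rough way to read the 19% figure: to a first approximation, single-thread performance is IPC multiplied by clock frequency. The sketch below applies that textbook model only; it ignores memory stalls, turbo behavior, and workload mix, and the function names are illustrative rather than taken from any Intel tool.

```python
def relative_performance(ipc_ratio: float, freq_ratio: float) -> float:
    """First-order model: performance scales with IPC times clock frequency."""
    return ipc_ratio * freq_ratio


def break_even_freq_ratio(ipc_ratio: float) -> float:
    """Clock ratio at which a core with `ipc_ratio` higher IPC only matches the old core."""
    return 1.0 / ipc_ratio


# Golden Cove vs. Cypress Cove at the same clock, using the 19% average IPC uplift.
same_clock = relative_performance(1.19, 1.0)
# Under this model, the newer core could run roughly 16% slower and still break even.
match_clock = break_even_freq_ratio(1.19)
print(f"{same_clock:.2f}x at equal clocks; parity at {match_clock:.3f}x the clock")
```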

The front end is wider, with more decoders and bigger buffers designed to crunch complex code. Smarter prefetch and better branch prediction help improve the flow of operations into the out-of-order engine, which has a wider allocation stage to improve intake, plus the ability to execute some instructions early, so they don’t consume resources later in the pipeline. The engine is also deeper thanks to a bigger buffer for reordering instructions, more physical registers, and a larger scheduling window. Work from the scheduler feeds into more execution ports that lead to more integer and vector execution units than the previous generation. The vector engine now includes dedicated fast adders with lower latency than the fused multiply-add units in the previous generation.

Modern workloads and datasets demand more from every level of cache, so this P-core is designed accordingly. It has a larger L1 cache that’s optimized to get the most out of the deeper execution engine. We also keep the core well fed with a much bigger L2 cache. Client implementations of Alder Lake feature 1.25 MB of L2 cache per core, more than double what’s available in the previous generation. They also have an all-new L2 prefetch engine that observes program behavior to estimate future memory access patterns, and then prefetches down multiple paths at different depths depending on their likelihood.
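Intel does not spell out how the new L2 prefetch engine works, so the following is only a sketch of a classic per-instruction stride prefetcher, a common textbook technique, used here to illustrate the general idea of learning an access pattern and prefetching further ahead as confidence grows. The class, the confidence counter, and the depth rule are all hypothetical; this is not Golden Cove's design, and it tracks only one stream per load instruction.

```python
from dataclasses import dataclass

MAX_CONFIDENCE = 3  # saturating confidence counter, 0..3


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0


class ToyStridePrefetcher:
    """Hypothetical per-instruction stride prefetcher (not Intel's L2 prefetch engine)."""

    def __init__(self) -> None:
        self.table: dict[int, StrideEntry] = {}  # keyed by the load instruction's address

    def access(self, pc: int, addr: int) -> list[int]:
        """Record a load to `addr` issued by instruction `pc`; return addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, MAX_CONFIDENCE)
        else:
            # Pattern broke (or is new): remember the new stride and reset confidence.
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence == 0:
            return []
        # The more often the stride repeats, the further ahead we prefetch.
        depth = 2 * entry.confidence
        return [addr + entry.stride * k for k in range(1, depth + 1)]
```

Fed a load that walks an array in 64-byte steps, it issues nothing until the stride repeats, then prefetches two lines ahead, then four; a break in the pattern resets it.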

Efficient-cores scale for maximum efficiency

While the P-core is designed to deliver a substantial improvement in general-purpose computing performance, the Alder Lake Efficient-core has a different mission. Codenamed Gracemont, the E-core makes efficient use of both power and die area to enable wide dynamic range. We can fit four Alder Lake E-cores into a similar die area as a single P-core, which provides greater flexibility for laptops and ultraportables along with the ability to aggressively scale up multithreaded performance for other form factors.

Despite its efficiency-focused design, the Alder Lake E-core still packs a punch. It matches Skylake’s general-purpose integer performance per clock at a fraction of the power. For a single logical process, one E-core delivers 40% more performance than Skylake at the same power, or equivalent performance at 40% less power. Its throughput is even more impressive. Four E-cores running four threads provide 80% more performance than a Skylake setup with two cores running four threads via Hyper-Threading, and they consume less power as well. The E-cores can also match Skylake’s throughput while consuming 80% less power, which represents significant savings for mobile implementations.²
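The efficiency claims above can be restated as performance-per-watt ratios relative to Skylake. This small sketch simply divides each quoted performance ratio by the quoted power ratio; the numbers come from the paragraph above, and the four-core versus two-core/four-thread comparison is left out because its power saving is not quantified.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt, given performance and power relative to a baseline."""
    return perf_ratio / power_ratio


# (performance vs. Skylake, power vs. Skylake) for the claims that state both numbers.
claims = {
    "1 E-core, 40% more performance at the same power": (1.4, 1.0),
    "1 E-core, same performance at 40% less power": (1.0, 0.6),
    "4 E-cores vs. 2C/4T Skylake, same throughput at 80% less power": (1.0, 0.2),
}

for label, (perf, power) in claims.items():
    print(f"{label}: {perf_per_watt_ratio(perf, power):.2f}x perf/W")
```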

To squeeze the most out of both power and area, our engineers focused on elements we wanted to scale and avoided features outside of the E-core’s competency. This approach reduces the energy required to execute individual instructions. We also lowered the operating voltage across the full frequency range to further conserve power.

The front end is deep, with a large instruction cache complemented by on-demand length decoding that can store additional information derived from code that the core has never seen before. Smarter branch prediction minimizes wasted work to conserve energy, while hardware-driven load balancing adjusts long chains of sequential instructions to ensure parallelism. Executing in parallel is essential to performance, so the E-core has a wide back end that leverages a large out-of-order window to discover data parallelism and an array of 17 execution ports to deliver on it. Each quad-core cluster shares a large L2 cache available in 2MB or 4MB configurations to meet the needs of different products.

The Intel® Thread Director guides OS scheduling

Putting the right thread on the right core at the right time is key to maximizing the performance and efficiency of Alder Lake’s unique architecture. Conventional approaches rely on static rules that are easy to implement, but they can’t adapt to changing conditions, so we implemented a more dynamic system that uses real-time feedback to help the OS scheduler make better decisions about how to allocate work. This approach works with existing applications and is completely invisible to the user, ensuring a seamless experience.

Schedulers typically work with limited information, like whether an application is running in the foreground or background. Our performance monitoring unit can access much more from the hardware, including the instruction mix, the state of individual cores, and other relevant telemetry. We even know when threads are actively doing real work as opposed to spinning as they wait for the next task. The new Intel Thread Director uses this intelligence to guide the OS scheduler toward making better decisions.⁴

Visualization for illustrative purposes only.³

Hardware-guided scheduling is best explained with examples. When you launch a performance-intensive application like a game or content creation software, the associated threads immediately go to the P-cores. As background tasks like email and cloud syncing begin, they’re scheduled on the E-cores to maintain maximum performance for your primary tasks. But what happens when something more important arises, like a thread that requires AI instructions? If all the P-cores are occupied, the Thread Director provides a hint to the OS that a higher-priority thread is ready and suggests which thread to move to the E-cores to make room. We can also make these transitions when a thread on the P-cores goes into a spinning state. When that happens, the Thread Director notifies the OS to shift the thread to the E-cores, allowing a more performant one to take its place.
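To make the placement logic concrete, here is a deliberately simplified model of hint-guided scheduling. Everything in it is hypothetical: the priority classes, the rule that spinning threads are demoted first, and the class names are illustrative assumptions, not the actual Intel Thread Director interface or the Windows scheduler, which receives hardware feedback and makes its own decisions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    BACKGROUND = 0  # e.g., email or cloud sync
    FOREGROUND = 1  # e.g., a game or content-creation app
    HIGH = 2        # e.g., a thread the hardware sees issuing AI instructions


@dataclass(eq=False)  # compare threads by identity, not by field values
class Thread:
    name: str
    priority: Priority
    spinning: bool = False  # waiting in a spin loop instead of doing real work


class ToyHybridScheduler:
    """Hypothetical hint-guided placement policy; not Intel's or Microsoft's algorithm."""

    def __init__(self, p_cores: int, e_cores: int) -> None:
        self.p_capacity = p_cores
        self.e_capacity = e_cores
        self.on_p: list[Thread] = []
        self.on_e: list[Thread] = []
        self.queued: list[Thread] = []  # runnable, but every suitable core is busy

    def place(self, thread: Thread) -> str:
        """Place a newly runnable thread; return 'P', 'E', or 'queued'."""
        if thread.priority == Priority.BACKGROUND:
            return self._to_e(thread)
        if len(self.on_p) < self.p_capacity:
            self.on_p.append(thread)
            return "P"
        victim = self._suggest_victim(thread)
        if victim is None:
            # Nothing on the P-cores is a better candidate to move, so use an E-core.
            return self._to_e(thread)
        # Act on the "hint": demote the victim to an E-core and take its P-core.
        self.on_p.remove(victim)
        self._to_e(victim)
        self.on_p.append(thread)
        return "P"

    def _suggest_victim(self, incoming: Thread) -> Thread | None:
        """Spinning threads move first; otherwise the lowest-priority thread below `incoming`."""
        for t in self.on_p:
            if t.spinning:
                return t
        lower = [t for t in self.on_p if t.priority < incoming.priority]
        return min(lower, key=lambda t: t.priority) if lower else None

    def _to_e(self, thread: Thread) -> str:
        if len(self.on_e) < self.e_capacity:
            self.on_e.append(thread)
            return "E"
        self.queued.append(thread)
        return "queued"
```

For example, with both P-cores running game threads, a newly arriving HIGH-priority thread displaces one game thread to an E-core; if a P-core thread later starts spinning, the next foreground thread takes its place.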

We collaborated closely with Microsoft to optimize the Intel Thread Director. Windows 11 uses its input for more than just thread scheduling: hints from the hardware also help the OS decide which cores to park and unpark for power savings. The PowerThrottling API lets developers define quality-of-service attributes for threads, including a new EcoQoS classification that expresses a preference for power efficiency over performance. The Edge browser, along with other Windows 11 components, takes advantage of this capability to both improve energy efficiency and ensure that P-cores are available for performance-critical threads.
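For developers, opting a thread into EcoQoS goes through the Win32 SetThreadInformation call with the ThreadPowerThrottling information class and a THREAD_POWER_THROTTLING_STATE structure. The Python ctypes sketch below assumes the constant values from the Windows SDK headers (worth double-checking against your SDK version) and only calls into kernel32 when running on Windows; elsewhere it just builds the structure and returns False.

```python
import ctypes
import sys

# Constant values as defined in the Windows SDK headers (assumption: verify against your SDK).
THREAD_POWER_THROTTLING_CURRENT_VERSION = 1
THREAD_POWER_THROTTLING_EXECUTION_SPEED = 0x1
THREAD_POWER_THROTTLING = 3  # ThreadPowerThrottling member of THREAD_INFORMATION_CLASS


class THREAD_POWER_THROTTLING_STATE(ctypes.Structure):
    # Win32 ULONG is always 32 bits, so use fixed-width fields rather than c_ulong.
    _fields_ = [
        ("Version", ctypes.c_uint32),
        ("ControlMask", ctypes.c_uint32),
        ("StateMask", ctypes.c_uint32),
    ]


def make_qos_state(eco: bool) -> THREAD_POWER_THROTTLING_STATE:
    """ControlMask says we manage execution speed; StateMask turns EcoQoS on or off."""
    return THREAD_POWER_THROTTLING_STATE(
        Version=THREAD_POWER_THROTTLING_CURRENT_VERSION,
        ControlMask=THREAD_POWER_THROTTLING_EXECUTION_SPEED,
        StateMask=THREAD_POWER_THROTTLING_EXECUTION_SPEED if eco else 0,
    )


def set_current_thread_eco_qos(eco: bool = True) -> bool:
    """Request EcoQoS (or explicitly opt out) for the calling thread; False off Windows or on failure."""
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = ctypes.c_void_p
    kernel32.SetThreadInformation.argtypes = [
        ctypes.c_void_p,                                # HANDLE hThread
        ctypes.c_int,                                   # THREAD_INFORMATION_CLASS
        ctypes.POINTER(THREAD_POWER_THROTTLING_STATE),  # LPVOID ThreadInformation
        ctypes.c_uint32,                                # DWORD ThreadInformationSize
    ]
    kernel32.SetThreadInformation.restype = ctypes.c_int  # BOOL
    state = make_qos_state(eco)
    ok = kernel32.SetThreadInformation(
        kernel32.GetCurrentThread(), THREAD_POWER_THROTTLING,
        ctypes.byref(state), ctypes.sizeof(state),
    )
    return bool(ok)
```

Setting both masks requests EcoQoS; setting ControlMask alone with a zero StateMask explicitly asks for full performance instead of leaving the choice to the OS.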

Integrated Xe-LP graphics tuned for the platform

The integrated graphics for Alder Lake are based on the same Xe-LP architecture as our processor codenamed Tiger Lake. Mobile versions flex up to 96 execution units (EUs) to deliver a good gaming experience at 1080p, while desktop variants scale back to 32 EUs because gamers and content creators can easily add a powerful graphics card. Both feature an advanced media engine with 12 bits of precision throughout the pipeline, hardware acceleration for decoding AV1 content, and support for HDR10 and Dolby Vision. The accompanying display engine supports 12-bit color and as many as four 4K displays. It also has dual eDP connectors to enable more laptops with multiple displays.
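To get a feel for what four 4K displays at 12-bit color ask of the display engine, the sketch below computes the raw active-pixel data rate. It assumes 3840 x 2160 at 60 Hz with 12 bits per RGB channel (36 bits per pixel) and ignores blanking intervals, chroma subsampling, and link compression, so real link requirements will differ.

```python
def active_pixel_rate_gbps(width: int, height: int, refresh_hz: float, bits_per_pixel: int) -> float:
    """Uncompressed active-pixel data rate in gigabits per second (no blanking overhead)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


per_display = active_pixel_rate_gbps(3840, 2160, 60, 36)  # one 4K60 panel, 12-bit RGB
all_four = 4 * per_display
print(f"{per_display:.2f} Gbit/s per display, {all_four:.2f} Gbit/s for four")
```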

A cutting-edge SoC for the next generation

Alder Lake takes x86 in a bold new direction while also leading the adoption of faster industry standards. We pushed the limits across the board, including memory, PCI Express, wireless networking, and peripheral connectivity.

Although Alder Lake has large on-chip caches, it still needs fast access to main memory. The SoC supports the latest DDR5 memory and retains compatibility with DDR4 to ensure maximum flexibility. Our unique physical layer supports up to DDR5-4800 and DDR4-3200, in addition to low-power LPDDR5-5200 and LPDDR4x-4266 alternatives. To further conserve power and ensure maximum performance on demand, Alder Lake can scale memory frequency and voltage dynamically in response to bandwidth requirements and workload behavior. Enhanced overclocking support enables even higher performance with the right DIMMs.
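Peak theoretical DRAM bandwidth is the transfer rate multiplied by the width of the data bus. The sketch below assumes a 128-bit total interface for every memory type (for example, two 64-bit DDR4 channels or the equivalent DDR5 subchannels); actual widths vary by product and board, and sustained bandwidth is lower than these peaks.

```python
def peak_dram_gbps(mega_transfers_per_s: int, bus_width_bits: int = 128) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes moved per transfer."""
    return mega_transfers_per_s * 1e6 * (bus_width_bits / 8) / 1e9


for name, rate in [("DDR5-4800", 4800), ("DDR4-3200", 3200),
                   ("LPDDR5-5200", 5200), ("LPDDR4x-4266", 4266)]:
    print(f"{name}: {peak_dram_gbps(rate):.1f} GB/s")
```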

PCI Express links the processor to high-performance devices like graphics cards and SSDs. Again, we’re on the cutting edge. Alder Lake brings desktops up to 16 lanes of PCIe Gen 5 primed for the next generation of GPUs and storage devices. This x16 link boasts up to 64GB/s of bandwidth — double what’s available in the previous generation — and it’s paired with 4 lanes of PCIe Gen 4 to match the latest SSDs. Supporting multiple generations of PCI Express on the same product allows us to balance the needs of different devices with other considerations, like power.
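The 64GB/s figure follows from the per-lane signaling rates. The sketch below uses the PCIe transfer rates of 8, 16, and 32 GT/s for Gen 3, 4, and 5 and, optionally, the 128b/130b line encoding those generations use; it computes one direction of the link and ignores packet and protocol overhead.

```python
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # per-lane transfer rate by PCIe generation


def pcie_gbps(gen: int, lanes: int, apply_encoding: bool = True) -> float:
    """One-direction bandwidth in GB/s; each transfer carries one bit per lane."""
    raw = GT_PER_S[gen] * lanes / 8  # gigabits -> gigabytes
    return raw * 128 / 130 if apply_encoding else raw


print(pcie_gbps(5, 16, apply_encoding=False))  # raw Gen 5 x16
print(round(pcie_gbps(5, 16), 1))              # Gen 5 x16 after 128b/130b encoding
print(pcie_gbps(4, 16, apply_encoding=False))  # previous-generation x16 link
print(round(pcie_gbps(4, 4), 2))               # the four Gen 4 lanes for an SSD
```

The raw Gen 5 x16 rate is exactly twice the Gen 4 x16 rate, which is where "double what's available in the previous generation" comes from.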

For external gear, integrated Thunderbolt 4 pushes 40Gbps through a reversible Type-C connector that fits ultra-slim form factors. There’s enough bandwidth for docking stations with multiple 4K displays or even for external GPUs, plus 100W of power for laptops and other devices. Thunderbolt 4 also supports the latest USB4 standard, ensuring compatibility with a diverse ecosystem of peripherals.

WiFi 6E completes the connectivity with the latest wireless standard. It keeps the smarter traffic management from WiFi 6, which helps to maximize performance on busy networks, and adds a new 6GHz band with significantly more spectrum than the existing 5GHz band. That spectrum should also be less crowded than the 2.4GHz and 5GHz bands typically used by wireless networks, enabling higher real-world performance with compatible routers.⁵

Connecting building blocks with high-speed fabrics

Given the extensive bandwidth throughout, the platform needs internal fabrics that can keep up without compromising power efficiency. Alder Lake has separate compute, memory, and I/O fabrics that scale dynamically based on demand.

The compute fabric connects the cores and integrated graphics to the last-level cache, through which they can access main memory. Available bandwidth scales up to 1000GB/s, which works out to 100GB/s for each individual P-core or cluster of 4 E-cores. The fabric dynamically routes data to optimize bandwidth and latency based on the fabric load. It also switches the last-level cache between inclusive and non-inclusive policies based on utilization.
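The 1000GB/s total is consistent with the desktop die if each P-core and each 4-core E-core cluster counts as one 100GB/s connection: 8 P-cores plus two E-core clusters give 10 connections. This is a back-of-the-envelope reading of the numbers above, not Intel's description of the fabric topology, and the function name is hypothetical.

```python
import math


def compute_fabric_gbps(p_cores: int, e_cores: int, per_stop_gbps: float = 100.0) -> float:
    """Aggregate bandwidth if every P-core and every 4-core E-cluster gets `per_stop_gbps`."""
    e_clusters = math.ceil(e_cores / 4)
    return (p_cores + e_clusters) * per_stop_gbps


print(compute_fabric_gbps(8, 8))  # desktop 8+8 configuration
```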

Up to 204GB/s is available through the memory fabric, which adjusts its speed and bus width in real time to optimize for high bandwidth, low latency, or low power. The I/O fabric adds up to 64GB/s for both internal and external devices. It changes speed on the fly to match bandwidth requirements, and the transitions are so seamless that they don’t interfere with the normal operation of connected devices.

Scalable across a wide range of form factors

To match the needs of machines in all shapes and sizes, Alder Lake combines the same primary building blocks in different ways on our refined Intel 7 process. At Architecture Day 2021, we revealed three key design points that will extend from 9W SoCs designed for ultra-slim devices all the way up to 125W processors for powerful desktops.

The desktop version is designed for peak performance and expansion capacity. It flexes up to 16 cores in an 8+8 configuration with up to 30MB of last-level cache. Each P-core can execute two threads and each E-core one, enabling up to 24 parallel threads overall. The processor also offers 16 lanes of PCIe Gen 5 alongside four lanes of Gen 4 connectivity, and it’s paired with a separate PCH package that adds up to 28 more lanes split between Gen 4 and Gen 3 controllers. Faster I/O on the processor requires a new LGA1700 socket with more pins than the previous generation.

For high-performance laptops, we have a much smaller BGA Type3 package whose die still makes space for up to 14 cores split between 6 P-cores and 8 E-cores. This SoC adds dedicated image processing for cameras and extensive Thunderbolt 4 connectivity for external peripherals. It also features substantially stronger Xe graphics with 96 execution units — three times the number available in desktop versions of Alder Lake.

Ultra-mobile devices get an even smaller BGA Type4 package that measures just 28.5 x 19 x 1.1 mm. Despite its diminutive dimensions, the most compact Alder Lake die shown at Architecture Day manages to fit up to 10 cores divided between 2 P-cores and 8 E-cores. It keeps the essentials, including Thunderbolt 4 and the same Xe graphics as its larger BGA sibling.
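The core and thread counts for the three design points follow a simple rule, assuming every P-core runs two threads via Hyper-Threading and every E-core runs one, which is consistent with the desktop's 24 threads from 8 P-cores and 8 E-cores. The mobile thread counts below are derived under that assumption rather than quoted from the text.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    p_cores: int
    e_cores: int

    @property
    def cores(self) -> int:
        return self.p_cores + self.e_cores

    @property
    def threads(self) -> int:
        # Assumption: 2 threads per P-core (Hyper-Threading), 1 per E-core.
        return 2 * self.p_cores + self.e_cores


configs = [
    Config("Desktop (LGA1700)", 8, 8),
    Config("Mobile (BGA Type3)", 6, 8),
    Config("Ultra-mobile (BGA Type4)", 2, 8),
]

for c in configs:
    print(f"{c.name}: {c.cores} cores, {c.threads} threads")
```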

Charting the future of personal computing

Alder Lake enables the next generation with an innovative performance hybrid architecture that’s optimized for the realities of modern work, play, and creation. Backed by the latest I/O, built with our most advanced process technology, and scalable across a wide range of client form factors, it defines a new path forward for the next decade of personal computing.

Notices and Disclaimers:

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

For Architecture Day workloads and configurations visit www.intel.com/ArchDay21claims.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure.

1. Based on overall scores and individual subcomponent scores on: SYSmark 25, CrossMark, PCMark 10, SPEC CPU 2017, WebXPRT 3, Geekbench 5. Testing as of May 28, 2021. Intel® Core™ i9-11900K, 4x16GB 1R DDR4 UDIMM 2DPC 3200 Max Memory Frequency, Samsung 980 Pro 500GB PCIe SSD, WIN 10 20H2 19042.ent.rx64.789, High Performance Power Plan, 1920x1080 display resolution. Alder Lake Desktop S801, RVP board with 2x16GB 1R DDR5 UDIMM 1DPC, 4400 Max Memory Frequency, Samsung 980 Pro 500GB PCIe SSD, WIN 10 20H2 19042.ent.rx64.508_update.906, High Performance Power Plan, 1920x1080 display resolution.

2. Internal Estimates as of June 22, 2021 using internal architecture simulation. Workload: SPECrate2017_int_base estimates with GCC 8.1.0 -O2 binaries.

3. System: Pre-production Intel internal validation platform with Alder Lake-S running Windows 11 Enterprise Build 22000.150; Application: Intel® Thread Director demo. As of July 2021.

4. Intel® Thread Director is designed into Alder Lake processors and helps supporting operating systems to more intelligently channel workloads to the right core. No user action required. See intel.com for details.

5. For details on wireless workloads and configurations see this section of www.Intel.com/PerformanceIndex.

All product plans and roadmaps are subject to change without notice.

Altering clock frequency or voltage may void any product warranties and reduce stability, security, performance, and life of the processor and other components. Check with system and component manufacturers for details.

Statements in this document that refer to future plans or expectations, including with respect to future technology and products and the expected benefits and availability of such technology and products, are forward-looking statements. These statements are based on current expectations and involve many risks and uncertainties that could cause actual results to differ materially from those expressed or implied in such statements. For more information on the factors that could cause actual results to differ materially, see our most recent earnings release and SEC filings at www.intc.com.

Code names are used by Intel to identify products, technologies, or services that are in development and not publicly available. These are not “commercial” names and not intended to function as trademarks.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
