This document outlines changes introduced to the Intel® software for general-purpose GPU capabilities in rolling releases. As the software includes several different projects, the changes for each release are grouped by project.
Support for each rolling release continues only until the next rolling release becomes available, with no updates provided for previous rolling releases. Therefore, we recommend upgrading to the latest rolling release as soon as it becomes available. To install packages for the latest rolling release, refer to the installation guide for your distribution. For a list of packages published on repositories.intel.com/gpu for each release and operating system, see Provided Packages.
2025-07-24
The 2523.12 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.10, 9.4, and 9.6
Ubuntu 22.04 and 24.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6
In this release, the OpenCL compiler enforces stricter type conversion checks. Kernel code that implicitly converts a global pointer to an integer without an explicit cast may now fail to compile with an "incompatible pointer to integer conversion" error. Most applications are unaffected, but if you encounter this error, update your kernel code to add an explicit cast or compile with the -Wno-error=int-conversion flag.
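As a rough illustration of the two remedies mentioned in the note above, the sketch below holds a hypothetical kernel as OpenCL C source strings, once with the implicit conversion and once with an explicit cast, plus a small helper that appends the -Wno-error=int-conversion flag to a build-options string. The kernel name, its arguments, and the build_options helper are assumptions made for this example. How the options string reaches the compiler (for instance, through the options argument of clBuildProgram) depends on your application.

```python
# Hypothetical OpenCL C kernel sources, held as strings for illustration only.
# The kernel name and arguments are assumptions, not taken from the release notes.

# Implicit pointer-to-integer conversion: may now be rejected by the compiler.
KERNEL_IMPLICIT = """
__kernel void store_address(__global int *data, __global ulong *out) {
    ulong addr = data;              /* implicit conversion, no cast */
    out[0] = addr;
}
"""

# Same kernel with an explicit cast, which makes the conversion intentional.
KERNEL_EXPLICIT = """
__kernel void store_address(__global int *data, __global ulong *out) {
    ulong addr = (ulong)data;       /* explicit cast */
    out[0] = addr;
}
"""

# Flag named in the release note as the alternative to editing the kernel.
INT_CONVERSION_FLAG = "-Wno-error=int-conversion"


def build_options(existing: str = "", relax_int_conversion: bool = False) -> str:
    """Return a build-options string, appending the flag at most once."""
    options = existing.split()
    if relax_int_conversion and INT_CONVERSION_FLAG not in options:
        options.append(INT_CONVERSION_FLAG)
    return " ".join(options)
```

Adding the cast is usually the more durable choice, since the flag only relaxes the diagnostic and leaves the implicit conversion in place.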
Added support for updating specific GPU firmware in recovery mode.
Introduced security improvements.
Improved GPU error reporting by including UUID resources for better diagnostics.
Enhanced responsiveness during memory management tasks.
Refined tbb thread handling to improve scheduling efficiency, avoid redundant parking during cancellations, and ensure proper wake-up behavior.
The 2523.10 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.10, 9.4, 9.5, and 9.6
Ubuntu 22.04 and 24.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6
In this release, the OpenCL compiler enforces stricter type conversion checks. Kernel code that implicitly converts a global pointer to an integer without an explicit cast may now fail to compile with an "incompatible pointer to integer conversion" error. Most applications are unaffected, but if you encounter this error, update your kernel code to add an explicit cast or compile with the -Wno-error=int-conversion flag.
Incorporated the latest security updates to address recent vulnerabilities, enhance protection, and ensure greater system reliability.
Intel® C for Metal Compiler
Disabled global fence on platforms other than Intel Data Center GPU Max Series.
Added 8-bit floating point conversion intrinsics.
Added a helper function to retrieve the global thread ID along its dimension.
Added support for new Battlemage and Panther Lake devices.
Updated the CM specification including the LSC memory interface, cache controls, and CM macro requirements.
Added the stochastic rounding intrinsic declaration.
Included the main <cm/cm.h> header implicitly, enabling caching when compiling from the CM source.
Added intrinsics for 2D block load and store operations.
Enabled the dynamic ICS via the opt-in KLV feature.
Updated the Graphics Micro Controller (GuC) to version 70.44.1.
Extended 2M userptr support to 1G.
Enabled backport support for kernel version 6.13.
Implemented GenISA predicated load/store intrinsics with promotion pass.
Added the ActiveThreadsOnlyBarrier option for OpenCL shaders.
Improved the call site inlining heuristic.
Added a call merger pass that merges mutually exclusive function calls when they are too large to inline.
Added the __builtin_IB_disable_ieee_exception_trap and GenISA_disable_ieee_exception_trap intrinsics.
Introduced additional transpose block 2D SPIR-V APIs.
Added a flag to disable merging allocas of different types, providing better control over the merge alloca pass and disabling aggressive merging by default.
Added 32-bit ELF type support to ZeBin for x86 use cases.
Enhanced MergeAllocas performance by replacing all allocas, generating casts at the point of use, handling select instructions in liveness analysis, avoiding merging allocas across ContinuationHL calls in raytracing, and disabling allocas merging for raytracing.
Added support for recognizing OpenCL/SPIR-V built-ins represented as TargetExtTy to the ProcessFuncAttributes pass.
Enabled SIMD16 drop for Xe3 to minimize register spills.
Added the hasLscStoresWithNonDefaultL1CacheControls flag to zeinfo, enabling 3D clients to detect Load Store Cache (LSC) stores with non-default L1 cache policies for proper UAV coherency flushing.
Started using MergeAllocas for private memory merging, allowing reuse of non-overlapping private memory allocations to reduce overall memory usage.
Added support for SPIR-V MulExtended instructions to the Vector Compiler (VC).
Set the default General Register File (GRF) size to 128.
Added VISA support for HF8 conversion instruction and Panther Lake devices.
Enabled SetHasSample for the gather4* instructions.
Added support for stochastic round bf8 intrinsic in the Vector Compiler (VC).
Added Panther Lake support.
Added support for new Battlemage device IDs.
Introduced support for Floating-Point DIVide (FDIV) instructions inside IGCVectorizer.
Added support for the BUFFER_SIZE explicit argument.
Added the Level Zero API for querying kernel argument data.
Improved handling of coherent and compressible resources.
Added support for new Battlemage device IDs.
Introduced the MOCS variable for Xe2.
Enabled GO:L3 for OpenCL usages.
Integrated support for libmei version 1.6.4.
Added a C++ wrapper.
Introduced the TeeGetKind API.
Updated API version to 1.14.
Added support for offline metric calculation:
OpenOfflineMetricsDeviceFromBuffer: Opens an offline metrics device object from a buffer.
SaveMetricsDeviceToBuffer: Saves a metrics device to a buffer for future offline calculations.
CloseOfflineMetricsDevice: Closes the offline metrics device object.
Added support in Xe KMD for configurable Observation Architecture (OA) buffer size and half-full OA buffer interrupt handling.
Added new Battlemage device IDs.
Introduced the new EQUATION_ELEM_PREV_METRIC_SYMBOL equation element, where prev$$"SymbolName" allows referencing the previous metric value within the local set.
Added support for the VectorEngine metric group.
Added support for Panther Lake.
Optimized copy query by reducing the number of GPU commands.
Added support for configurable Observation Architecture (OA) buffer size in Xe KMD.
Improved performance of OA configuration updates.
Reduced build time.
Optimized debug helper.
Added support for registering a TeardownCallback to notify clients upon release of Level Zero resources.
Added support for sorting drivers based on provided devices.
Implemented basic leak checker in the validation layer.
Added zeImageViewCreateExt and zeMemFreeExt support to the leak checker.
Added API call logging to the validation layer.
Added static Level Zero loader support.
Introduced support for the 1.7 specification in the static loader.
Introduced Intel® Video Processing Library API 2.15 support, including a new property-based capability query interface, extended decoder and encoder capability reporting, and definitions for the VVC Main 10 Still Picture profile and level 6.3.
Added the explicit INSTALL_EXAMPLES build option to control installation of example source code and content.
Updated the default Ubuntu build to version 24.04.
Improved AV1 decoding performance when all decode frame surfaces are in use.
Enabled property-based capability queries.
Introduced support for Intel® Media Transcode Accelerator.
Added new strings to the vpl-inspect tool to improve output readability.
Added the -props option to the vpl-inspect tool to support querying capabilities based on properties.
Updated the default Ubuntu build to version 24.04.
Fixed a crash in GenerateBlockMemOpsPass that could occur in complex loops during analysis of memory access patterns when certain PHI node conditions were not met.
Fixed u8/u16 2D block read emulation by enforcing 4-column block width for alignment. Additionally, disabled decomposition of emulated d8/d16 transpose reads due to complexity and added emulation of d8/d16 transpose reads using native 2D block transform followed by mov instructions.
Fixed type arguments for creating the GenISA_sampleDCMlodptr call.
Removed unnecessary tracking and 32-bit truncation of bindless image offsets in kernel arguments, improving handling of bindless images and eliminating redundant instructions.
Started allocating the General Register File (GRF) number for Vector Compiler (VC) untyped load 2D intrinsics.
Fixed an issue with incorrect emission of phi values for structures.
Implemented a cycle-proof deletion strategy in the Intel® Graphics Compiler vectorizer to ensure reliable cleanup when discarding vectorizer chains.
Fixed a crash in ProgramScopeConstantAnalysis that occurred during recompilation and resolved crashes encountered during the compilation of Blender kernels using the SPV_INTEL_bindless_images extension.
Set cache control for SPIR-V 2D block prefetch when cache control decoration is missing or invalid, provided the target device supports cache control.
Fixed an iterator invalidation issue.
Fixed an issue with generating spill temporary variables for 4GRF operands.
Fixed a segfault caused by default output stream flags in Intel® Graphics Compiler with SYCL by replacing them with printf.
Ensured consistent instruction order for PrivateMem to produce identical dumps in debug and ndebug compilations.
Started skipping debug calls in complex UnrollLoop loops.
Replaced std::map with llvm::MapVector.
Ensured uniform prefetch source address is GRF-aligned, enabled implicit arguments optimizations by default, and introduced GlobalOffset across Xe1, Xe2, and Xe3 platforms for improved performance and payload size reduction.
Added a check for the TotalGRFNum flag value before trying to return the value passed from an API option.
Started emitting bitcast after selecting a value in SimplifyConstant.
Stopped removing implicit kernel arguments, as they might be used by subroutines.
Started using the correct spill size for non-send destinations by rounding up to the nearest GRF size. Additionally, added a VISA option to enable spill cleanup within specified BB ID ranges to target transformations safely.
Started skipping dbg calls for vector aliasing heuristic.
Fixed potential nullptr dereference by passing the function as a reference to avoid null checks and improve efficiency.
Started reporting a warning when non-null/acc Architected Register File (ARF) register is used on ternary instruction.
Fixed issues related to opaque pointers support in GenXPacketizer.
Fixed issues causing incorrect chunk sizes in the ConstantCoalescing pass.
Optimized Read-Modify-Write (RMW) for strided first definitions in the entry basic block.
Moved VRT General Register File (GRF) bump-up after GRA optimizations.
Changed RayQueryDynamicRayManagement flag to be off by default due to stability issues. It can be enabled through Application Intelligence Layer (AIL).
Unified RayInfo between sync and async raytracing.
Fixed the fill checked built-in and implemented MAD built-ins for large shapes in the joint matrix.
Added a reserved VISA opcode and updated Intel Graphics Assembler (IGA).
Fixed a VISA assert issue.
Fixed a boundary condition issue.
Resolved an issue in the Vector Compiler (VC) affecting wrregion operations with bf16 source data types.
Fixed the lgamma_r behavior.
Fixed an issue where the atomic branch predicate was incorrectly selected when multiple modes were enabled, and removed the PreservesCFG flag from the InsertBranchOpt pass to avoid potential crashes.
Fixed initialization of PHI instructions of the i1 type.
Fixed handling the genx volatile pointer as a function argument in the Vector Compiler (VC).
Corrected the HWTID computation to use state registers when WMTP is unsupported by the shader type.
Stopped inserting the check/release intrinsics if the shader has discards.
Fixed the address register restriction in the destination.
Added support for fcvt with bf8 and hf8 data types.
Fixed the printf issues.
Fixed incorrect alignment in MergeUniformLoad with early return.
Fixed creation of chunk loads in the ConstantCoalescing pass.
Improved size reporting in payload_arguments in zebin.
Added a null check to prevent nullptr dereference in GenerateBlockMemOpsPass.
Fixed copying uniform variables.
Fixed the local copy propagation issue for indirect VxH source.
Stopped cloning debug instructions in CodeSinking and CodeLoopSinking passes.
Added conditional warning for dumped vector size, which is printed only when the ShaderDumpEnable flag is enabled.
Added correct predicate to the mov instruction when handling split samples.
Added lifetime.start emission for classic resource loops inside nested loops.
Enabled stateful rt stack for synchronized raytracing and separated sync and async rt stacks in raytracing magic types.
Initialized structure members to prevent potential nullptr dereference.
Fixed a DebugInfo issue in LLVM to avoid out-of-order evaluation.
Simplified the call to readFirstLanes for multiple getFirstLaneIDs.
Added support for functions with no return values and no arguments in SIMDCF.
Added a pass to remove freeze instructions prior to code generation.
Fixed handling rdregion operations with widths crossing register boundaries in the Vector Compiler (VC).
Corrected the maximum sub slices value in SIP.
Updated the execution mask to 32 bit on Xe2.
Corrected the sample_d_c and sample_d_c_mlod sampler message type.
Migrated ProgramScopeConstantResolution and StatelessToStateful to fix opaque pointers issues.
Disabled building legacy SPIR-V Reader.
Updated CopyVariableRaw to use SIMD32 as the maximum Single Instruction Multiple Data (SIMD) size on Xe2.
Deprecated the initial set of generation 9 Vector Compiler (VC) LITs, replacing them with XeHPG equivalents.
Made code assumption in get_global_id optional.
Extended the application of the multiplication pattern in GetElementPtr Loop Strength Reduction (LSR) pass to improve performance.
Added options for controlling the depressurizer thresholds.
Removed overly strict restrictions from the LICM pass.
Restored the SIMD16 drop functionality on Xe2, enabling support for spilling kernels using SIMD16 on this architecture.
Updated the COMPUTE_WALKER to fix incorrect RawData array length.
Added the FillImage1dBuffer built-in kernel.
Started blocking zeContextMakeImageResident.
Started failing device initialization if kernel debugging is misconfigured, with a detailed error message printed to stderr.
Started passing the Deallocate2 callback to the Graphics Memory Manager (GMM).
Corrected the Xe sysfs paths for the Compute Command Streamer (CCS) mode setting.
Made external semaphore controller thread-safe and ensured proxy events are destroyed only when the semaphore thread controller releases resources to prevent sporadic failures.
Improved ULLS light ring handling by managing new ring buffer residency, extending mutex protection for safe stop operations, and updating USM cleaner to properly stop ULLS light during resource eviction.
Ensured Zebin is dumped during program build when unpackSingleDeviceBinary is not called, provided the debug key is enabled.
Corrected gfx_core_helper definitions for EUSS.
Corrected the logic for retrieving valid timestamp bits.
Removed overflow check in calculations for Xe2+ cores using EUSS.
Improved media engine handling.
Started returning an error via paraminfo if it is queried with a parameter count of 0, but the programmable actually has one or more parameters.
Started patching payload arguments in inline data in case of indirect kernels.
Ensured payload arguments are patched before fetching the walker command.
Added initial support for single temporary allocations list and ensured flush of split task count.
Replaced sfence with mfence on discrete devices and moved ULLS semaphore to shared memory on Xe2.
Disabled deferring Memory Object Control State (MOCS) on WSL for Lunar Lake and unified deferring MOCS to the Page Attribute Table (PAT).
Updated implementation to expose the THREAD_SCRATCH debug register only when running in the heapless mode.
Improved cache handling by invalidating texture and heap caches before reuse or image reads.
Improved compression handling by disabling it for pre-Xe2 platforms, enforcing capability table flags, and removing a redundant workaround for Alchemist GPUs.
Added input/output control helper for context destruction.
Changed the stype member type in Level Zero Core and Tools driver extensions to a uint32_t alias to prevent casting outside the ze_structure_type_t/zet_structure_type_t enum range.
Improved fence allocation and synchronization by passing product helper to isFenceAllocationRequired, using global fence helper, simplifying fence selection in ULLS, and removing global fence from CW post-synchronization on Battlemage.
Improved container management by reserving residency before addition, unifying non-append method calls, and preventing queue buffer consumption during command list execution.
Enabled staging infrastructure for 3D images.
Updated semaphoreBuffer and ringBuffer usage on integrated devices.
Improved the NonCopyableOrMovable and NonCopyable concepts.
Unified the local memory size getter for i915 and Xe.
Set the vmbind user fence in makeMemoryResident to resolve the memory reporting issue.
Corrected the allocation size in freeSVMAlloc to prevent crashes.
Added an option to enable and disable the heapless mode in the OpenCL offline compiler.
Parsed the Compute Command Streamer (CCS) mode setting for platforms other than Intel Data Center GPU Max Series.
Implemented per-element BLT copying for tiled 1D arrays and began treating tiled 1D images as 2D with a height of 1 for BLT operations.
Stopped enabling compression on xe_lpg for Linux and WSL.
Corrected blit properties for CL_MEM_OBJECT_IMAGE1D_BUFFER images.
Corrected the ZE_MEMORY_ACCESS_CAP_FLAG_CONCURRENT reporting.
Enabled the Unified Shared Memory (USM) compression on Linux.
Aligned allocation sizes of 2MB or larger in local memory to a 2MB boundary. This behavior is controlled by the is2MBLocalMemAlignmentEnabled function. Additionally, implemented a pool allocator for gpuTimestampDeviceBuffer. Allocations are shared per device and controlled via the EnableTimestampPoolAllocator debug flag.
Corrected allocation in MemObj::getMemObjectInfo and created graphicsAllocation per each rootDevice.
Corrected Level Zero versioning.
Corrected a device ID mismatch.
Started passing the ReadOnly flag only for page-misaligned input pointers.
Corrected the returned metric value counter for EU stall.
Fixed an issue causing memory allocation crash.
Fixed a shared memory failure issue.
Fixed a hang issue in the Toolbox Interface (TBX) page fault manager.
Implemented changes to prevent race conditions during resource eviction.
Configured scratch pages for the debugger.
Corrected allocation handling for the increment Command Buffer (CB) event.
Enabled support for image array types with an array size of 1 on Xe2 and later platforms.
Corrected the order of passing arguments to obtainCommandStream.
Added a check for the Shared Virtual Memory (SVM) allocated host pointer in clCreateBuffer.
Merged preliminary and non-preliminary code for the Legacy Sysman Memory module.
Removed the patchtoken fallback.
Corrected the error code for the deprecated clSetCommandQueueProperty.
Removed operation access for unsupported types.
Fixed an issue with setting up the Compute Command Streamer (CCS) mode.
Corrected literal raw strings handling in the printf formatter.
Added support for MetricCreateFromProgrammableExp2.
Corrected DSH generation and programming of inline samplers with bindless addressing in Level Zero.
Started passing the root device when creating secondary contexts to ensure proper initialization of gfxCoreHelper in Direct Rendering Manager (DRM).
Enabled 2-way coherency for misaligned user memory.
Preserved the allocation type for memory objects.
Added support for passing -device and -device_options in multiple formats in the OpenCL Offline Compiler.
Fixed the status return value in getExternalMemoryProperties when operating in the Toolbox Interface (TBX) mode.
Removed the deprecated LayoutRight Graphics Memory Manager (GMM) flag.
Introduced the ImageSurfaceState helper class and relocated global functions into the class to reduce compilation time.
Added the Sysman device directory name as a parameter to SysmanKmdInterface.
Set the external semaphore version in Level Zero.
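One of the entries in the list above notes that local-memory allocation sizes of 2MB or larger are aligned to a 2MB boundary. The arithmetic behind such rounding can be sketched as follows. This is only an illustration of the rounding rule, not the driver's code: the function name is hypothetical, and leaving smaller sizes unchanged is an assumption the release note does not spell out.

```python
TWO_MB = 2 * 1024 * 1024  # 2 MiB expressed in bytes


def align_local_mem_size(size: int, alignment_enabled: bool = True) -> int:
    """Round sizes of 2 MB or more up to the next 2 MB boundary.

    Assumption: smaller sizes, and any size when alignment is disabled,
    are returned unchanged. The real driver logic may differ.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    if not alignment_enabled or size < TWO_MB:
        return size
    # Integer ceiling division, then scale back to bytes.
    return -(-size // TWO_MB) * TWO_MB
```

A size that is already a multiple of 2MB is returned as is, while a request one byte over 2MB would occupy 4MB under this rule.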
Fixed an issue where the sched_setattr_nocheck API was not exported in kernel versions earlier than 5.14.
Switched to locked variant of wake_up_interruptible for safer thread wake-ups.
Reduced spurious wake-ups for single-task shmem/userptr jobs.
Started propagating wake-up from suspended threads to avoid delayed task execution.
Replaced function type casting with typed function stubs.
Added a reference around vm_bind to maintain the Virtual Memory Area's (VMA) validity.
Started clearing the Multi Die Fabric Interconnect (MDFI) boot time errors, as they are expected during the initialization of MDFI fabric and may be confused with runtime errors.
Started using the kobject attribute instead of the device attribute for num_cslices and ccs_mode sysfs entries on RHEL 8.X.
Started handling additional PCI AER corner cases to be able to reset devices without locking up the machine.
Reordered hardware waits and GPU reset logic during PCI faults to avoid blocking on unresponsive hardware while recovering from a hardware failure.
Fixed an issue where a mutex could be held indefinitely when attempting to remove an idle Virtual Memory Area (VMA) from the VM.
Prevented memory allocations during page faults triggered by GPU reset.
Updated GTT_MMAP_VERSION to align with corresponding changes in user space.
Allowed data to be discarded on forced unbinds, avoiding swaps to inaccessible system memory.
Set the lmem_offset to 0 after use so that the next local memory block does not carry the same offset leading to lost data during Single Root I/O Virtualization (SR-IOV) migrations.
Fixed incorrect annotations.
Fixed error unwinding in i915_virtualization_probe.
Added periodic checks for forward progress by monitoring context switches and user interrupts. If the same context remains active without interrupts since the last check, a warning is generated with no further action.
Prevented default context creation when wedged.
Cleaned up faulting initialization.
Prevented DPC NPD after initialization failure by early iaf setup and driver-device decoupling on probe failure.
Started protecting per-CPU px_cache from interrupts.
Started sending a TLB invalidation request after each Virtual Memory Area (VMA) binding for GuC use, instead of deferring until before enabling GuC, to prevent Single Root I/O Virtualization (SR-IOV) failures.
Started periodic check for mmio failures.
Started handling CT fault injection during early initialization by ensuring CT descriptor objects are not dereferenced before assignment, preventing failures on early faults.
Started checking for context creation failure during execbuf.
Added support for deferred context attachment to existing clients.
Removed the residual calls to the empty i915_oa_init_reg_state to completely excise an old use-after-free.
Skipped the HuC authentication register check as it is no longer needed.
Prevented soft lockup during defragmentation on eviction.
Prevented a potential compute hang on Alchemist GPUs.
Updated CT desc->head after consuming a receive chunk to prevent buffer overflow and slow GuC messaging.
Added device PCI IDs to GPU dumps.
Updated ce->vm on parallel child contexts.
Corrected the CSC hardware errors.
Added the eudbg event for deferred default context allocation.
Removed lockdep assertions around Global Graphics Translation Table (GGTT) updates to prevent conflicts.
Preserved Translation Lookaside Buffer (TLB) seqno when splitting clear pages into multiple smaller pages if there is an outstanding TLB invalidation for those pages.
Deferred the default context allocation until first use, reducing overhead when a device opens.
Disabled compression for the GMM_FORMAT_I420 format.
Added size validation when checking NoOptimizationPadding.
Enforced the Tile4 layout over Linear for flipchain resources.
Resolved type incompatibility issues.
Reduced hardware register polling timeout in EFI.
Refactored HECI_DEVICE_KIND handling in EFI for better maintainability.
Fixed an EFI issue by changing the propertyMap array type to CHAR8*.
Fixed EFI compilation errors with GCC.
Fixed incorrect bitfield parsing in metric equations.
Corrected scaling of std.color.node_pbe_arb metric on Lunar Lake.
Optimized memory allocation size to improve performance and reduce overhead.
Updated the OpenIoStream behavior to return CC_ERROR_NOT_SUPPORTED when processId != 0.
Removed legacy global symbols to reduce namespace clutter and improve maintainability.
Fixed Sysman-only initialization to prevent retrieval of the loader context when version compatibility is not met.
Corrected version and GUID updates for version 1.22.2.
Fixed GUID generation and updated to version 1.22.3.
Resolved an issue in zesInit to correctly initialize the requested API version.
Fixed artifact upload workflow.
Fixed the extension validation logic.
Improved initialization error checking to verify validation layer behavior.
Fixed experimental extension validation to accept unknown extensions within a valid range.
Corrected sType initialization in property query operations.
Improved teardown checks to prevent invalid context usage.
Added the missing header to ze_ddi_common.h.
Fixed enabling the Digital Display Interface (DDI) handle extensions.
Fixed the incorrect sType assignment in zello_world.
Modified context_t to always allocate dynamically and support delayed destruction.
Updated the model used in the interop example to a vehicle detection model.
Fixed the BUILD_EXAMPLES build option so it no longer depends on INSTALL_DEV to take effect.
Removed outdated Docker files provided with examples.
Resolved crash issues during AV1, AVC, and HEVC decoding related to surface creation on resolution changes.
Intel® XPU Manager and XPU System Management Interface
Improved the firmware update under the recovery mode for Intel® Data Center GPU Flex Series.
Introduced security enhancements.
The 2507.17 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.8, 8.10, 9.2, 9.4, and 9.5
Ubuntu 22.04 and 24.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6
Updated the Graphics Micro Controller (GuC) to version 70.44.1.
Resolved a hang detection issue on Intel Data Center GPU Max Series by re-enabling GPU hang checks. Hang detection now only logs a warning message without terminating the application.
Updated the Graphics Micro Controller (GuC) to version 70.44.1.
2025-03-18
The 2507.12 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.8, 8.10, 9.2, 9.4, and 9.5
Ubuntu 22.04 and 24.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6
Incorporated the latest security updates to address recent vulnerabilities, enhance protection, and ensure greater system reliability.
Intel® Graphics Driver Backports for Linux* OS (i915)
Added support for the HBM_REPLACE bit to signal High Bandwidth Memory (HBM) health status and its transition to the REPLACE state. This enhancement enables the driver to detect the bit and prevent loading when the state changes to REPLACE, while also reporting the issue and prompting HBM replacement.
Started handling page fault events in the Xe debugger.
Added support for the cl_khr_expect_assume OpenCL extension that introduces mechanisms to supply the compiler with information that can enhance the performance of certain kernels.
Implemented the Level Zero zeKernelGetBinaryExp API that allows retrieving kernel binary program data.
Added support for shared system Unified Shared Memory (USM) allocation in appendLaunchKernel.
Implemented enhancements to the Unified Shared Memory (USM) reuse mechanism, including the introduction of a USM reuse cleaner that efficiently manages system and local memory across different reuse strategies, as well as an extension of the USM reuse limit infrastructure.
Improved cache management by supporting whitelisted includes.
Added support for handling new Reliability, Availability, and Serviceability (RAS) errors in Sysman.
Implemented alignment of host Unified Shared Memory (USM) to 2MB on discrete devices when the allocated size exceeds 2MB.
Modified the pass threshold to optimize the i64 multiplication performance.
Introduced Panther Lake support.
Improved vectorizer to support vector emission of ftrunc instructions.
Enabled the IndVarSimplification pass to improve performance.
Enabled access to the Workload Management and Thread Programming (WMTP) SIP kernel for the Xe3 core and introduced a default WMTP SIP configuration per Shared Local Memory (SLM) for Xe2.
Added more aggressive late rescheduling phase to the CodeLoopSinking pass and an option to disable the maximum sinking heuristic in the presence of 2D block reads.
Improved the InlineHelper LLVM utility.
Implemented the MergeAllocas pass and enabled allocation merging prior to the split asynchronous pass.
Enabled the emission of vectorized floating-point addition (FADD) instructions, allowing the VISA emitter to process them efficiently.
Implemented nested 3D resource loop unrolling.
Upgraded specification to version 1.12.15.
Intel® Video Processing Library
Added support for Intel® VPL API 2.14, which introduces new quality and speed settings for AI-powered video frame interpolation. This update also expands algorithm and mode selection options for AI-based super resolution and adds support for High Efficiency Video Coding (HEVC) level 8.5 decoding.
Improved compatibility with Python 3.12 development environments.
Integrated screen content coding tools for AV1 into sample_encode.
Added a new GTK renderer option to sample_decode and sample_multi_transcode.
Introduced a new -fullscreen option for GTK in sample_decode and sample_multi_transcode. Users can now toggle full screen using Ctrl+f and exit with Esc.
Enhanced support for Python 3.12 development environments.
Updated the signing key for KMD prebuilds to enhance security and ensure continued reliability. The new key, valid for one year, will be used to sign all new releases. To ensure compatibility with these updates while maintaining the secure boot functionality, you need to download and install a new Distinguished Encoding Rules (DER) certificate.
Intel® Graphics Compiler
Lowered bfloat ceil and floor intrinsics.
Refactored parameters in vc-lits in lit-config for LLVM 16 to not link the initializeGenX function.
Increased the early recompilation threshold for default General Register File (GRF) to 500.
Enabled the EnableWaveShuffleIndexSinking registry key by default.
Enabled the WaveAllJointReduction pass by default.
Added an extra assertion check to the SIMDInfo offset.
Introduced page fault handling improvements.
Fixed an issue causing the CSC hardware errors.
Removed unnecessary lockdep debugging checks from Global Graphics Translation Table (GGTT) updates.
Fixed timeout issues by preserving Translation Lookaside Buffer (TLB) seqno when splitting clear pages.
Fixed issues causing compilation errors on kernel 6.6 and later.
Fixed an issue where prefetch was attempted on empty objects.
Fixed an issue where pid_task() could fail if the target process had already exited.
Implemented a workaround for Arctic Sound-M (ATS-M) and introduced support for the G8 power state to reduce idle power consumption.
Modified the logic to avoid calling pm_qos_request a second time on an existing request during breadcrumb reset.
Disabled C-states for breadcrumb interrupts to reduce Direct Memory Access (DMA) latency.
Cleaned up incomplete shmemfs
obj->base.filp
on failed swapout.
Hardcoded memory health status in sysfs
to prevent breakage.
Implemented flushing of freed objects before reporting available memory to stabilize the reported memory levels.
Modified implementation to retry eviction only when it is blocked by active or locked objects, aiming to reduce response time.
Optimized Virtual Memory Area (VMA) prefetch by short-circuiting redundant operations.
Corrected Compressed Color Surface (CCS) copies for Single Root I/O Virtualization (SR-IOV) save and restore.
Restricted shmem flags to a valid set for swapin to resolve a page fault issue.
Modified the implementation to repeat the Translation Lookaside Buffer (TLB) flush invalidation request, resolving an issue with the failing High-Performance Linpack (HPL) benchmark.
Removed early unlocked unbind from object free to avoid race conditions between lockless unbinding and eviction of non-persistent VMAs.
Introduced changes to protect i915_drm_client_fini from early shutdown.
Started supporting compilation with CONFIG_PAGE_TABLE_ISOLATION to fix a compilation issue on RHEL.
Optimized the unbind step in the GT IFR flow by skipping context runtime updates when the device is quiesced. This change reduces the execution time.
Disabled implicit callback conversion for wait events to resolve the wait operation hang issues.
Added the missing callback event cache flush to fix an issue with zeEventHostSynchronize hangs.
Fixed an issue with reporting EU counts for multi-slice platforms.
Fixed an issue where ZE_AFFINITY_MASK was not working when ZE_FLAT_DEVICE_HIERARCHY was set to COMBINED in OpenCL.
Implemented shared allocations to preserve reference timestamps and introduced a flag in the Inter-Process Communication (IPC) pool data to verify if the mapped timestamp flag is set.
Fixed an issue where event_profiling::command_start returned an incorrect result.
Set stateless addressing mode for buffers that are neither bindful nor bindless.
Started retrieving the minimal offset size for region barrier.
Fixed the scope of the result variable in initDriver, which was defined in too narrow a scope and caused the initialization result to be discarded.
Started returning rawDataSize as zero when the readIoStream call fails.
Resolved issues with parsing and setting the Level Zero debugger bitmask.
Fixed performance issues on Battlemage GPUs.
Ensured memory residency by setting the vmbind user fence when making memory resident.
Prevented crashes due to over-allocation by introducing a defer backing flag to Graphics Execution Manager (GEM) create input/output control, ensuring memory is resident before locking.
Fixed regression chart dump for General Register File (GRF) configurations with more than 128 registers.
Resolved issues in the vectorizer, improving its stability and performance.
Disabled the TrivialLocalMemoryOpsElimination pass from the pipeline.
Resized G4_Declare's row size for atomic operations to prevent out-of-bounds (OOB) issues.
Fixed issues in VISA parser for Load Store Cache (LSC) 2D block operations to allow mixed register and immediate AddrX and AddrY operands for 2D block load and store instructions.
Fixed value tracker handling of GetElementPtr (GEP) instructions with zero indices by treating them as bitcasts to prevent confusion in kernel usage.
Implemented dynamic optimization threshold adjustment for the depressurizer based on the number and size of General Register File (GRF) registers.
Fixed incorrect instruction placement in the rollback functionality of the CodeLoopSinking pass.
Stopped using the %sp and %fp predefined variables.
Implemented dedicated logic for handling discards in DynamicRayManagementPass to prevent crashes.
Fixed the direct address destination restriction on SIMD32.
Improved the alignment calculation in constant coalescing and started supporting additional load and store intrinsics in SynchronizationObjectCoalescing.
Fixed an incorrect condition check in isRegionInvariant of WIAnalysis.
Stopped removing Built-in Function (BiF) module prebuilt stamp files to avoid redundant recompilations when CMake files are updated.
Fixed an issue with constant folding prevention inside loops.
Added intrinsic cache to KernelDebugInfo and prevented indirect access to Software Scoreboard (SWSB).
Reduced the number of atomics hitting the same cache line by performing atomic predication.
Enabled the End of Thread (EOT) instruction to participate in the Software Scoreboard (SWSB) token assignment.
Fixed a channel mask issue in the src0 length for RenderTargetDataPayload, where the Alpha channel was incorrectly controlled.
Addressed and fixed potential memory leaks.
Fixed issues in the generation of pkg-config files.
Corrected code generation for libddi table queries.
Corrected validation layer's parameter checker for extensions.
Fixed the bootstrap process to support Debian distributions that do not define the ID_LIKE property.
The 2506.18 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.8, 8.10, 9.2, 9.4, and 9.5
Ubuntu 22.04 and 24.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6
Introduced support for RHEL 9.5.
Incorporated the latest security updates to address recent vulnerabilities, enhance protection, and ensure greater system reliability.
Developed the SPV_INTEL_subgroup_matrix_multiply_accumulate SPIR-V extension to enable support for DPAS operations at a lower abstraction level compared to the joint matrix.
Introduced support for the SPV_INTEL_maximum_registers SPIR-V extension that adds literal-based and ID-based execution modes for specifying the maximum number of registers for an entry point.
Added support for a 3-channel image format to SYCL bindless images to enable access to the bindless texture hardware.
Modified the vectorizer to support vector emission of fmul instructions.
Implemented the VF engine utilization API.
Added input and output control helper functions for mmap and unmap operations, acquiring and releasing the GPU range, allocating a user pointer, and synchronizing the userptr allocation.
Exposed new counter-based events and added the default mode for zexCounterBasedEventCreate2.
Introduced support for physical host memory.
Updated Level Zero metrics to align with v1.11 headers.
Started specifying the cache level when reserving a region.
Added GPU and memory power domain support for getEnergyCounter.
Added support for three channels in Level Zero.
Introduced support for zeInitDrivers that combines driver initialization and retrieval functionality. Updated the GTPIN initialization logic to execute only when pCount is greater than 0 and the driver handle is non-null. Additionally, removed the unused ze_init_flags_t flag from all driverInit functions.
Enabled counter-based allocation peer sharing to support in-order command lists in multi-GPU event scenarios.
Added support for two Xe-eudebug interfaces within a single binary. The new EuDebugInterface class encapsulates eudebug functionality, with CMake flags to control Xe-eudebug and prelim uAPI support.
Added a root device flag check for multi-device scenarios, so that APIs using root device handles can now validate this flag and handle failures gracefully.
Added a new uAPI macro in the engine module to fetch the configuration of the total ticks.
Added Platform Monitoring Technology (PMT) counter offset values for Battlemage.
Enabled IsCpuCacheable on Linux to improve performance.
Enabled the R10G10B10_XR_BIAS_A2_UNORM format for display to support 10-bit color and HDR rendering with improved visual quality.
Added the Media Video Processing (VP) performance tags that can help with optimization and debugging.
Enabled the logging of data traffic in the trace mode.
Intel® ME TEE Library
Added a 32-bit release preset in CMake.
Added getters for maxMsgLen and protocolVer.
Introduced upstream encoding support for Battlemage.
Added support for AV1 encoding with ARGB input.
Added support for half-full Observability Architecture (OA) buffer interrupt in i915.
Added CoreFrequencyMHz details and MaxCount global symbols for Xe2 and Xe3 platforms.
Added support for GpuCoreClocks symbols in read equations.
Introduced a global symbol that indicates GPU frequency override state.
Added return code handling for the following functions: AddInformationSet, SetSnapshotReportReadEquation, SetOverflowFunction, AddDefaultMetrics, AddStartRegisterSet, CreateMetricsFromPrototypes, and RefreshConfigRegisters.
Added event deadlock detection within the validation layer.
Started logging the full path of loaded libraries in traces for better debugging.
Added result passing to validation checkers at the epilogue stage.
Replaced deprecated i64 calls to llvm.smax, llvm.smin, llvm.umax, and llvm.umin with icmp and select.
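Each min/max intrinsic is equivalent to one integer comparison followed by a select, which is what makes the replacement above possible. The following is a minimal C++ sketch of that equivalence for 64-bit values. The function names are illustrative only and are not taken from the Intel® Graphics Compiler sources.

```cpp
#include <cstdint>

// Each function models an `icmp <predicate>` followed by a `select`.
// The names are hypothetical; they only illustrate the semantics.
constexpr std::int64_t smax_i64(std::int64_t a, std::int64_t b) {
    return (a > b) ? a : b;   // icmp sgt + select
}
constexpr std::int64_t smin_i64(std::int64_t a, std::int64_t b) {
    return (a < b) ? a : b;   // icmp slt + select
}
constexpr std::uint64_t umax_i64(std::uint64_t a, std::uint64_t b) {
    return (a > b) ? a : b;   // icmp ugt + select
}
constexpr std::uint64_t umin_i64(std::uint64_t a, std::uint64_t b) {
    return (a < b) ? a : b;   // icmp ult + select
}
```

The signed and unsigned variants differ only in the comparison predicate, so the same bit pattern can yield different results: the all-ones pattern is -1 for smax_i64 but the largest value for umax_i64.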
Migrated PrivateMemoryUsageAnalysis to opaque pointers. The original pass used getNonOpaquePtrEltTy to get the element type of pointer arguments. The new approach examines the uses of each pointer argument to see if they interact with structure types.
Replaced deprecated std::is_pod with updated type traits.
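The release note does not say which traits replaced std::is_pod. A common substitution, shown here as a sketch under that assumption, combines std::is_trivial and std::is_standard_layout, which together match the standard's definition of a POD type. The is_pod_like helper and both sample structs are hypothetical.

```cpp
#include <type_traits>

// Hypothetical helper: a type is POD when it is both trivial and
// standard-layout, so the deprecated std::is_pod can be expressed
// with these two traits.
template <typename T>
struct is_pod_like
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                       std::is_standard_layout<T>::value> {};

// Plain aggregate: trivial and standard-layout.
struct PlainData {
    int id;
    float weight;
};

// A user-provided default constructor makes the type non-trivial.
struct WithCtor {
    int id;
    WithCtor() : id(0) {}
};

static_assert(is_pod_like<PlainData>::value, "plain aggregate qualifies");
static_assert(!is_pod_like<WithCtor>::value, "user-provided constructor disqualifies");
```

Code that only needs to copy objects with memcpy can often use the narrower std::is_trivially_copyable check instead.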
Restored the fptrunc functionality to the vectorizer.
Adjusted the depth configuration for all Xe2+ platforms and moved the depth limitation from the release helper to image_hw.
Started handling DRM_XE_TOPO_SIMD16_EU_PER_DSS in Xe non-preliminary path.
Simplified CacheRegion reservation tracking by replacing the dynamic unordered_map with a static array, leveraging the small, known maximum number of reservations and unique CacheRegion values. Additionally, added helper code for array-indexing using the CacheRegion enum and started using a level-specific name for CacheInfo instances.
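The idea of replacing a hash map with an enum-indexed static array can be sketched as follows. This is an illustration only: the CacheRegion enumerators, the toIndex helper, and the CacheReservations class are assumptions and do not reproduce the driver's actual types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical enum; the driver's real CacheRegion values may differ.
enum class CacheRegion : std::uint16_t { none = 0, region1, region2, count };

// Helper for array indexing with the enum.
constexpr std::size_t toIndex(CacheRegion region) {
    return static_cast<std::size_t>(region);
}

class CacheReservations {
  public:
    // Returns false for an invalid region, a zero size, or a region
    // that is already reserved.
    bool reserve(CacheRegion region, std::uint64_t bytes) {
        if (region == CacheRegion::none || region >= CacheRegion::count || bytes == 0) {
            return false;
        }
        std::uint64_t &slot = reserved_[toIndex(region)];
        if (slot != 0) {
            return false;
        }
        slot = bytes;
        return true;
    }

    void release(CacheRegion region) {
        assert(region < CacheRegion::count);
        reserved_[toIndex(region)] = 0;
    }

    std::uint64_t sizeOf(CacheRegion region) const {
        return region < CacheRegion::count ? reserved_[toIndex(region)] : 0;
    }

  private:
    // One slot per enum value; zero means "not reserved".
    std::array<std::uint64_t, toIndex(CacheRegion::count)> reserved_{};
};
```

Because the number of regions is small and known at compile time, the array avoids hashing and heap allocation while keeping lookups constant-time.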
Enabled programmable metrics by default.
Started calling flushMonitorFence on the Blitter Command Streamer (BCS) command stream receiver (CSR) and ensured the global fence is always resident. Additionally, enabled Ultra Low Latency Scheduling (ULLS) on the copy engine for Battlemage.
Separated power handle creation from power limit support.
Started reusing staging buffers from other command stream receivers (CSR).
Standardized naming and structure alignment across multiple components, such as DESTINATION_SURFACE_TYPE, CFE_STATE, STATE_BASE_ADDRESS, thread group batch size, DISPATCH_WALKER, and RENDER_SURFACE_STATE, to conform to the latest specifications, with optimizations such as extracted PostSyncType for simplified integration.
Started using sg upper bound for incrementing partial maps.
Serialized Page Table Entry (PTE) updates for BLT offload and enhanced error handling for BLT clear submissions, along with the serialization of hardware fences.
Removed all refresh metric sets for Alchemist G10 and G11.
Discontinued media support for Alchemist.
Removed unsupported media API masks.
Moved platform versions to the platform index map.
Updated Direct Rendering Manager (DRM) headers to version prelim v2.0-rc27.
Improvements
Intel® Graphics Compiler
Introduced the generate_local_id flag for some User Mode Driver (UMD) use cases to resolve performance issues.
Introduced ray tracing and registry flag improvements to enable Bindless Thread Dispatch (BTD) for synchronized dispatch rays.
Fixed a memory leak in BiFManager.
Fixed an issue causing incorrect GPU results after passing a null pointer to the GPU before usage. The fix introduces the BufferBoundsChecking and MinimumValidAddress flags to the release mode and supports handling generic address space in the MinimumValidAddressChecking pass.
Added missing intrinsic functions to the TypesLegalizationPass.
Fixed an issue where fast math flags were unavailable in some scenarios in the ScalarizeFunction pass.
Fixed an issue that caused metadata to be unavailable in certain scenarios in the ScalarizeFunction pass.
Fixed incorrect Intermediate Representation (IR) after rollback application in the vector shuffle rescheduling CodeLoopSinking pass. Additionally, weakened the conditions for creating a candidate to reenable vector shuffle scheduling.
Replaced functions that used joint matrix arguments with updated functions to ensure proper cleanup.
Added scalarization for the fshl operation to fix an issue where the process of swapping the higher and lower 32-bit sections of 64-bit data in vectors was not working correctly.
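To illustrate why a funnel shift swaps the two halves, the sketch below models the documented semantics of LLVM's llvm.fshl intrinsic for 64-bit operands and applies it to each vector element in turn. The function names and the use of std::array as a stand-in for a vector are assumptions for illustration, not compiler code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Models llvm.fshl.i64: concatenate hi:lo into a 128-bit value, shift left
// by (amount mod 64), and keep the upper 64 bits.
constexpr std::uint64_t fshl64(std::uint64_t hi, std::uint64_t lo, std::uint64_t amount) {
    return (amount % 64 == 0)
               ? hi
               : (hi << (amount % 64)) | (lo >> (64 - amount % 64));
}

// Funnel-shifting a value with itself by 32 is a rotate by 32 bits,
// which swaps the upper and lower 32-bit halves.
constexpr std::uint64_t swapHalves(std::uint64_t value) {
    return fshl64(value, value, 32);
}

// Scalarized form: the scalar operation is applied to each element.
template <std::size_t N>
std::array<std::uint64_t, N> swapHalvesPerElement(const std::array<std::uint64_t, N> &values) {
    std::array<std::uint64_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = swapHalves(values[i]);
    }
    return out;
}
```

For example, swapHalves(0x1122334455667788) yields 0x5566778811223344.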
Fixed an issue with an incorrect Virtual Instruction Set Architecture (vISA) preemption option.
Added ray tracing intrinsic payload accessor for updating payload data to optimize ray tracing stack footprint.
Optimized WaveShuffleIndex sinking.
Fixed an issue where CustomLoopOpt did not ensure the floating point type, which caused type mismatch.
Fixed indirect access detection for divergent execution paths.
Fixed a crash that occurred during shader compilation by ensuring the coalescing engine correctly marks the start of the payload's lifetime.
Extended unroll optimization to new core to improve performance.
Started emitting DW_OP_stack_value only at the end of SIMD32 expressions to resolve a nullptr dereference issue. The implementation adapts debugging information to handle SIMD32-to-SIMD16 splits, marking the merge point in IGC::DbgVariable.
Fixed an issue with dst and src overlapping in HWConformity.
Merged multiple consecutive WaveAll operations into a joint reduction tree to optimize joint reduction.
Created a new optimized pattern for WaveShuffle.
Improved getGenISARange in DebugInfoPass to significantly reduce execution time by streamlining instruction iteration and minimizing map lookups.
Extended the capability of wavePrefix to improve performance.
Enabled the EnableGEPLSRMulExpr flag by default to fix performance issues.
Fixed accumulator registers save and restore syntax in Xe2 SIP.
Modified Scalar Evolution (SCEV) analysis responsible for caching ZExt expressions to improve the compilation time and performance.
Introduced a new metadata field for disabling the memory fence optimization that treats untyped global memory (UGM) fences as synchronizing Shared Local Memory (SLM). This change fixes performance issues.
Increased the available per-thread scratch size by removing the unused maxPerThreadScratchSpace method from DriverInfo.
Fixed an issue where the debug registry keys were incorrectly marked.
Added the missing registry key check for the EnableGVN key.
Resolved issues in shader debugging code.
Fixed the kernel argument alignment issues.
Improved user space notification for page faults by enabling event synchronization GPU status checks by default and adding an option to force GPU status checks via a dedicated key.
Fixed an issue with duplicating calculations when printing timestamps was enabled. The fix adds the PrintCalculatedTimestamps debug flag for printing timestamps in Level Zero paths. Additionally, PrintTimestampPacketContents adds logging for Level Zero paths, and ForceUseOnlyGlobalTimestamps forces the use of a global timestamp.
Added a debug flag to override region count.
Started clearing standalone timestamps prior to submission.
Enabled the Tile64 optimization flag to fix functional issues.
Enabled zesInit for new platforms when a legacy path is initialized.
Started calculating stack count for synchronization render target to fix functional issues that blocked SIMD32 compilation of syncRT shaders. Additionally, removed the number of render target stacks from the capability table.
Ensured the appropriate command stream receiver (CSR) is selected for submission when copy offload is not allowed.
Resolved an issue where an incorrect timestamp was returned. Now, when EnableGlobalTimestampViaSubmission is set, zeDeviceGetGlobalTimestamps uses the immediate command submission method to get the correct GPU time.
Added a debug breakpoint to handle eviction failures, preventing devices from entering an error state. Additionally, removed destroyed allocations from eviction lists.
Aligned thread group count to DSS on all platforms.
Stopped including performance counters in timestamp profiling to improve performance.
Fixed an issue where OpenCL did not expose tiles as devices with combined hierarchy.
Added the missing AUB polls on synchronization points.
Started checking standalone counter-based (CB) event completion for profiling to fix an issue with performance checks.
Fixed reporting the number of Xe cores per cluster.
Appended the recorded command list into immediate to resolve relaxed ordering and stalling command checks. Additionally, added an immediate command list append API to the reported extensions list.
Eliminated an overhead caused by using the submission method for zeDeviceGetGlobalTimestamps.
Enabled immediate binding for make resident to fix memory reporting issues.
Fixed 64-bit row and slice pitch for built-ins in the Level Zero heapless mode.
Fixed the discrepancy of implicit arguments buffer allocation and programming in the OpenCL path.
Fixed an issue with mirroring the module debug area write across tiles on Xe.
Adjusted limiting device Unified Shared Memory (USM) reuse and stopped reserving vector for allocation information when reuse is disabled.
Fixed an issue with PCI and memory timestamp units in the Platform Monitoring Technology (PMT) telemetry.
Started tracking Unified Shared Memory (USM) reuse usage when multiple OpenCL contexts are used to fix a memory leak.
Started assigning the command stream receiver (CSR) once for the staging image write, instead of assigning it separately for each sub-copy.
Started using a release helper to get the correct ftrXe2Compression value and disabled the Xe2 compression through the release helper.
Added microsecond resolution for timeout to fix a camera driver compatibility issue.
Started checking for nullptr before dereferencing.
Optimized bind information in the input/output control helper Xe to store only the userptr values and their corresponding GPU virtual addresses.
Fixed an issue where the returned number of VF engine statistics was incorrect. Additionally, implemented a check for local memory removal during the VF handle creation.
Started gracefully handling cases where the SIP version header is greater than 3.
Improved the behavior of notifyNReports.
Started allocating resources by KMD on Battlemage to improve performance.
Introduced staging reads and enabled image writes through staging chunks for improved performance.
Improved Xe2 allocation with KMD, enhanced inOrder counter signaling via pipe control during data cache flush for immediate command lists, and added debug flags for event signal visibility, including AbortHostSyncOnNonHostVisibleEvent and ForceHostSignalScope for host event synchronization management.
Enabled direct submission on Battlemage.
Enabled copy through staging buffers on Xe2 and timestamp reuse.
Resolved the thundering herd problem in ct_receive by waking only the specific receiving process through ct_request. This prevents waking unrelated processes and avoids inefficient iteration, especially during concurrent page faults.
Resolved issues related to map_pages() and iotlb_sync_map() functions.
Implemented changes to ensure that all blocking send operations are awakened and canceled if command transport (CT) fences are disabled during an ongoing send operation.
Fixed an issue causing node hangs when applications were profiled using VTune. The issue was addressed by initializing chunk->policy for shmem allocations.
Changed the intel_fbdev_restore_mode return type from void to int to meet the fbdev client registration API requirement introduced in kernel 6.12.
Fixed a node reboot issue that occurred due to a general protection fault. The issue was addressed by protecting the acquisition of ce->timeline in signal_irq_work.
Deferred ct_receive from the ct_send_nb path to prevent deadlock caused by calling handlers under spinlocks. The patch removes ct_receive from the non-blocking send path to reduce latency, allowing the caller to handle scheduling of ct_receive for backlog clearing.
Enabled backport support for 6.12 kernel.
Added a new media compression mode to resolve assertion issues.
Adjusted the BaseWidth to improve support of the RGB24 format.
Updated the reserved Page Attribute Table (PAT) index of the cache element to optimize cache behavior.
Added missing driver logic for fixed clients to the Unified Extensible Firmware Interface (UEFI).
Intel® Media Driver for VAAPI
Fixed AV1 decoding corruption caused by invalid reference frames.
Fixed an AVC decoding hang issue that occurred when the output surface buffer was insufficient.
Fixed an AV1 Bitrate Control (BRC) encoding mismatch issue by ensuring the correct frame type.
Fixed an AV1 multi-tile group BRC encoding issue.
Fixed a page fault issue in AV1 encoding related to the macroblock coded buffer.
Fixed corruption in UYVY to RGB32 color space conversion (CSC) output.
Corrected an incorrect RGB mask order in video processing.
Fixed an R8G8 resource allocation failure.
Corrected performance capabilities in alignment with i915 performance revision.
Resolved a copy engine count issue.
Fixed truncation of symbol names in equations by enabling dynamic allocation instead of the previous fixed size of 32 characters.
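The difference between a fixed 32-character buffer and dynamic allocation can be illustrated with a short sketch. The FixedSymbol and DynamicSymbol types and the makeFixedSymbol helper are hypothetical and only model the behavior described above; they are not the library's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical record that models the old fixed 32-character limit.
struct FixedSymbol {
    char name[32];
};

// Copies at most 31 characters and keeps the terminator, so longer
// names are silently cut off.
inline FixedSymbol makeFixedSymbol(const char *name) {
    assert(name != nullptr);
    FixedSymbol symbol{};  // zero-initialized, so the buffer stays terminated
    std::strncpy(symbol.name, name, sizeof(symbol.name) - 1);
    return symbol;
}

// Dynamically sized alternative: std::string grows to fit any name.
struct DynamicSymbol {
    std::string name;
};
```

With a name longer than 31 characters, makeFixedSymbol keeps only the first 31, while DynamicSymbol stores the full string, so equation lookups by symbol name no longer see a truncated key.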
Optimized the size of global symbol byte arrays.
Started removing the static result in InitDrivers when the first initialization fails.
Switched to using relative paths for event deadlock detection in third-party headers.
Disconnected zeInitDrivers and zeDriverGet.
Addressed issues with backward compatibility regarding Get*ProcAddrTable usage.
Added the missing zeKernelGetExp API and header updates.
Fixed zeInit compatibility when zeInitDrivers is undefined.
Fixed an AV1 decoding issue that caused frame synchronization errors and corruption.
Fixed a VP9 encoding issue that led to corruption on consecutive key frames.
Enhanced video processing to improve composition output quality.