Conversation

@konflux-internal-p02 konflux-internal-p02 bot commented Nov 7, 2025

This PR contains the following updates:

Package: torch
Change: ==2.7.0 -> ==2.9.1

Release Notes

pytorch/pytorch (torch)

v2.9.1: PyTorch 2.9.1 Release, bug fix release

Compare Source

This release is meant to fix the following issues (regressions / silent correctness):

Tracked Regressions

Significant Memory Regression in F.conv3d with bfloat16 Inputs in PyTorch 2.9.0 (#​166643)
This release provides a workaround for this issue. If you are impacted, please install the nvidia-cudnn package version 9.15+ from PyPI. (#166480) (#167111)

Torch.compile

Fix Inductor bug when compiling Gemma (#​165601)
Fix InternalTorchDynamoError in bytecode_transformation (#​166036)
Fix silent correctness error_on_graph_break bug where non-empty checkpoint results in unwanted graph break resumption (#​166586)
Improve performance by avoiding recompilation with mark_static_address with cudagraphs (#​162208)
Improve performance by caching get_free_symbol_uses in torch inductor (#​166338)
Fix registration design for inductor graph partition for vLLM (#166458) (#165815) (#165514)
Fix warning spamming in torch.compile (#​166993)
Fix exception related to uninitialized tracer_output variable (#​163169)
Fix crash in torch.bmm and torch.compile with PyTorch release 2.9.0 (#​166457)

Other

Fix warning spamming on new APIs to control TF32 behavior (#​166956)
Fix distributed crash with non-contiguous gather inputs (#​166181)
Fix indexing on large tensors causing an invalid configuration argument error (#166974)
Fix numeric issue in CUDNN_ATTENTION (#​166912) (#​166570)
Fix symmetric memory issue with fused_scaled_matmul_reduce_scatter (#​165086)
Improve libtorch stable ABI documentation (#​163899)
Fix image display on pypi project description section (#​166404)

v2.9.0: 2.9 Release Notes

Compare Source

PyTorch 2.9.0 Release Notes

Highlights

Unstable (API-Unstable)
Updates to the stable libtorch ABI for third-party C++/CUDA extensions
Symmetric memory that enables easy programming of multi-GPU kernels
The ability to arbitrarily toggle error or resume on graph breaks in torch.compile
Expanded wheel variant support to include ROCm, XPU and CUDA 13
FlexAttention enablement on Intel GPUs
Flash decoding optimization based on FlexAttention on X86 CPU
ARM Platform improvements and optimizations
Enablement of Linux aarch64 binary wheel builds across all supported CUDA versions

For more details about these highlighted features, you can look at the release blogpost. Below are the full release notes for this release.

Backwards Incompatible Changes

Min supported Python version is now 3.10 (#​162310)

The minimum version of Python required for PyTorch 2.9.0 is 3.10. We also have 3.14 and 3.14t available as preview with this release.

Undefined behavior when an output of a custom operator shares storage with an input

This is a reminder that PyTorch custom operators (registered using the torch.library or TORCH_LIBRARY APIs) are not allowed to return Tensors that share storage with input tensors. Violating this condition leads to undefined behavior: sometimes the result will be correct, sometimes it will be garbage.

After #​163227, custom operators that violated this condition that previously returned correct results under torch.compile may now return silently incorrect results under torch.compile. Because this is changing the behavior of undefined behavior, we do not consider this to be a bug, but we are still documenting it in this section as a "potentially unexpected behavior change".

This is one of the conditions checked for by torch.library.opcheck and is mentioned in The Custom Operators Manual.

More details

PyTorch custom operators are not allowed to return Tensors that share storage with input tensors

For example, the following two custom operators are not valid custom operators:

@torch.library.custom_op("mylib::foo", mutates_args=())
def foo(x: torch.Tensor) -> torch.Tensor:
    # the result of `foo` must not directly be an input to foo.
    return x

@torch.library.custom_op("mylib::bar", mutates_args=())
def bar(x: torch.Tensor) -> torch.Tensor:
    # the result of bar must not be a view of an input of bar
    return x.view(-1)

The easiest workaround is to add an extra .clone() to the outputs:

@​torch.library.custom_op("mylib::foo", mutates_args=())
def foo(x: torch.Tensor) -> torch.Tensor:
    return x.clone()

@​torch.library.custom_op("mylib::bar", mutates_args=())
def bar(x: torch.Tensor) -> torch.Tensor:
    return x.view(-1).clone()

A common way to get into this situation is for a user to want to create a custom operator that sometimes mutates the input in-place and sometimes returns a new Tensor, like in the following example.

@torch.library.custom_op("mylib::baz", mutates_args=["x"])
def baz(x: torch.Tensor, inplace: bool) -> torch.Tensor:
    if inplace:
        x.sin_()
        return x
    else:
        return x.sin()

This dynamism is not supported and leads to undefined behavior. The workaround is to split the custom operator into two custom operators, one that always mutates the input in-place, and another that always returns a new Tensor.

@torch.library.custom_op("mylib::baz_outplace", mutates_args=())
def baz_outplace(x: torch.Tensor) -> torch.Tensor:
    return x.sin()

@torch.library.custom_op("mylib::baz_inplace", mutates_args=["x"])
def baz_inplace(x: torch.Tensor) -> None:
    x.sin_()

def baz(x, inplace):
    if inplace:
        baz_inplace(x)
        return x
    else:
        return baz_outplace(x)
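
As noted above, torch.library.opcheck can be used to test a custom operator for this condition (among others). A minimal sketch, assuming the baz_outplace operator defined above; the fake registration is only there so opcheck's FakeTensor tests can run:

import torch

# register a fake/meta implementation so opcheck's FakeTensor tests can run
@baz_outplace.register_fake
def _(x):
    return torch.empty_like(x)

# baz_outplace returns a fresh tensor, so the aliasing-related checks should pass
torch.library.opcheck(baz_outplace, (torch.randn(3),))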

Build Metal kernels for MacOS-14+ and remove all pre-MacOS-14 specific logic; MacOS-14+ is required going forward (#159733, #159912)

PyTorch MPS is only supported on MacOS-14 or later. If you need to use MPS on MacOS Ventura, please avoid updating to PyTorch 2.9 or above.

Upgrade to DLPack 1.0 (#​145000)

This upgrade makes the same BC-breaking changes as the DLPack 1.0 release. Objects in torch.utils.dlpack have been updated to reflect these changes, such as DLDeviceType.

See the PR for details on the exact changes and how to update your code.
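
For most code, basic DLPack interchange continues to work unchanged; a minimal round-trip sketch (the 1.0 changes mainly affect the enums and capsule/versioning details mentioned above):

import torch
from torch.utils.dlpack import DLDeviceType  # enum updated for DLPack 1.0

x = torch.arange(4)
y = torch.from_dlpack(x)  # zero-copy exchange via the DLPack protocol
assert y.data_ptr() == x.data_ptr()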

Raise appropriate errors in torch.cat (#​158249)

torch.cat now raises ValueError, IndexError or TypeError where appropriate instead of the generic RuntimeError. If your code was catching these errors, you can update it to catch the new error types.
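
A hedged sketch of updating an exception handler; which of the new error types applies depends on the failure mode, so the broad tuple below is purely illustrative:

import torch

try:
    torch.cat([])  # invalid call: empty list of tensors
except (ValueError, IndexError, TypeError) as e:
    print(type(e).__name__, e)
except RuntimeError as e:
    # still raised for cases that keep the generic error, and on older releases
    print("RuntimeError:", e)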

Default to dynamo=True for ONNX exporter (#​159646, #​162726)

Previously, torch.onnx.export(...) used the legacy TorchScript exporter if no arguments were provided. The ONNX exporter now uses the newer torch.export.export pipeline by default (dynamo=True). This change improves graph fidelity and future-proofs exports, but may surface graph capture errors that were previously masked or handled differently.

Previously in torch 2.8.0:

# API calls the legacy exporter with dynamo=False
torch.onnx.export(...)

Now in torch 2.9.0:

# To preserve the original behavior
torch.onnx.export(..., dynamo=False)

# Export the ONNX model through torch.export.export
torch.onnx.export(...)

Recommendation: first try the new default; only fall back if you hit blocking issues and report them upstream.
Long term solution: fix the root cause instead of relying on fallback or TorchScript exporter.

Switch off runtime asserts by default in Export in favor of a shape guards function (#​160111, #​161178, #​161794)

To enable runtime asserts, use export(..., prefer_deferred_runtime_asserts_over_guards=True). This also removes the allow_complex_guards_as_runtime_asserts flag, merging it into the former option.

Additionally, exported_program.module() will generate a call to a _guards_fn submodule that will run additional checks on inputs. Users who do not want this behavior can either remove this call in the graph, or do exported_program.module(check_guards=False) to avoid the generation.
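
A minimal sketch of both knobs described above, using a placeholder module:

import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

# opt back in to runtime asserts instead of the shape-guards function
ep = torch.export.export(
    M(), (torch.randn(4),), prefer_deferred_runtime_asserts_over_guards=True
)

# skip generation of the _guards_fn input-check call when materializing a module
m = ep.module(check_guards=False)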

Set default opset to 20 in ONNX (#​158802)

Opset 20 enables newer operator definitions. If your tooling or downstream runtime only supports opset 18, pin it explicitly. For the latest ONNX operators, you can experiment with opset 23.

Previously in torch 2.8.0:

# Default opset_version=18
torch.onnx.export(...)

Now in torch 2.9.0:

# To preserve the original behavior
torch.onnx.export(..., opset_version=18)

# New default: opset_version=20
torch.onnx.export(...)

# Use the latest supported opset: opset_version=23
torch.onnx.export(..., opset_version=23)

Drop draft_export in exporter API (#​161454, #​162225)

Remove implicit draft tracing from the default exporter path for clearer behavior and faster failures.
The expensive torch.export.draft_export diagnostic path is no longer auto-invoked (which could take hours on large models). You can still opt in for deep diagnostics:

Previously in torch 2.8.0:

# If both torch.export.export(..., strict=False) and
# torch.export.export(..., strict=True) fail to capture
# the model graph, torch.export.draft_export(...) will be triggered
# and uses real tensors to trace/export the model.
#
# Inside export_to_onnx.py:
#   ... torch.onnx.export(..., dynamo=True)
python export_to_onnx.py

Now in torch 2.9.0:

# To trigger torch.export.draft_export once
# torch.export.export with strict=False/True both fail:
TORCH_ONNX_ENABLE_DRAFT_EXPORT=True python export_to_onnx.py

Remove torch.onnx.dynamo_export and the onnxrt torch compile backend (#​158130, #​158258)

torch.onnx.dynamo_export is removed. Please use torch.onnx.export instead.
The experimental ONNX Runtime compile backend (torch.compile(backend="onnxrt")) is no longer supported.
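
A hedged migration sketch with a placeholder module; with dynamo=True, torch.onnx.export takes over the role of the removed API:

import torch

class Tiny(torch.nn.Module):  # placeholder model for illustration
    def forward(self, x):
        return x.relu()

example_inputs = (torch.randn(1, 3),)

# Before (removed in 2.9):
# onnx_program = torch.onnx.dynamo_export(Tiny(), *example_inputs)

# After:
onnx_program = torch.onnx.export(Tiny(), example_inputs, dynamo=True)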

Remove torch.onnx.enable_fake_mode (#​161222)

The dynamo=True mode uses FakeTensors by default, which is memory efficient.

Some public facing ONNX utility APIs for the TorchScript based exporter are now private (#​161323)

Deprecated members in torch.onnx.verification are removed. Previously private torch.onnx.symbolic_opsets* functions will no longer be accessible. Consider making a copy of the source code if you need to access any private functions for compatibility with the TorchScript based exporter.

Remove torch.onnx.symbolic_caffe2 (#​157102)

Support for caffe2 in the ONNX exporter has ended and is removed.

Remove /d2implyavx512upperregs flag that slows build (#​159431)

This re-introduces AVX512 optimizations for Windows VS2022 builds and may cause issues with specific versions of VS2022; see #145702.

Add ScalarType to shim conversion and stable::Tensor.scalar_type (#​160557)

Previously, user extensions could only pass around obfuscated dtypes that appeared as int32_ts. Now, users can confidently use torch::headeronly::ScalarType in their extensions for major scalar types. This PR enables ABI stability by adding a translation layer through the shim, so that even if the ScalarType enum values change in the future, user extensions need not fear.

This change adds ScalarType support for user extensions and is only narrowly BC breaking for unpopular dtypes: quint*s, qint*s, Bits*, dummy_uint*s, dummy_int*s, Float8_e8m0fnu, and Float4_e2m1fn_x2 in the use case where an extension retrieves a Tensor dtype of the above and passes it into aoti_torch_call_dispatcher.

Deprecations

Deprecate pin_memory_device param in torch.utils.data.DataLoader (#​158323)

Enabling pin_memory has moved back inside BaseDataLoaderIter. This is required for StatefulDataLoader, which leveraged BaseDataLoaderIter directly rather than the DataLoader class init.
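
A hedged sketch of moving off the deprecated parameter (the dataset here is a placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(8, 4))  # placeholder dataset

# Deprecated: selecting the pinning device through the DataLoader argument
# loader = DataLoader(ds, batch_size=2, pin_memory=True, pin_memory_device="cuda")

# Going forward: enable pin_memory and let the iterator handle the device
loader = DataLoader(ds, batch_size=2, pin_memory=True)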

Deprecate torch.export.export_for_training API in favor of equivalent torch.export.export API (#​158203)

torch.export.export_for_training exists because we couldn't migrate internal usages of export to the final IR. Now that the migration is complete, this API is deprecated; use torch.export.export instead.
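
A minimal migration sketch with a placeholder module:

import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

args = (torch.randn(2),)

# Deprecated:
# ep = torch.export.export_for_training(M(), args)

# Equivalent replacement:
ep = torch.export.export(M(), args)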

New Features

Python Frontend

  • Add utility to get the kernel currently registered on the dispatcher (#​158393)
  • Extend __torch_function__ handler to be triggered by elements within a list (#​160256)
  • Add torch.hash_tensor reduction function (#​154149)
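
A hedged usage sketch of the new torch.hash_tensor reduction; the exact semantics and any optional arguments should be checked against its documentation:

import torch

x = torch.arange(8)
h = torch.hash_tensor(x)  # reduces the whole tensor to a single hash value
print(h)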

FX

  • Extend torch function support to ALL arguments instead of just scalar type (but not inside of list, #​145089)
  • Add is_fx_symbolic_tracing flag (#​161385)

Dynamo

  • Experimental API for ahead-of-time compiling models in fullgraph mode (#​161383)
  • Add a hook for recompilations (#​157961)
  • DynamicInts prototype (#​162194)

Introduces an API for annotating dynamic integer inputs & attributes for torch.compile, by wrapping plain ints with DynamicInt().
DynamicInt objects also work in eager mode, acting as their underlying values when passed as scalar inputs.

a = DynamicInt(4)
y = a + 2  # DynamicInt(6)
z = torch.ones(a)  # torch.ones(4)

fn = torch.compile(torch.ones)
fn(a)  # compiled fn takes a dynamic integer input
fn(2)  # returns torch.ones(2) without recompiling

Optimizer

  • Introduce Muon optimizer to PyTorch (#​160213)

Profiler

  • Add GC Events to Python Stack Tracer (#​161209)
  • Add a custom profiler configuration option (#​151656)

Inductor

  • Allow user to pass in custom partitioner function (#​157580)

Export

  • Add support for param mutation under inference mode (#​159661)

AOTDispatcher

  • Add AOTDispatcher config to set backward autocast behavior (#​156356)

Quantization

ONNX

C++ Extensions

Build Frontend

  • Add transpose to torch/csrc/stable (#​158160)
  • Add zero_() and empty_like(t) to torch/csrc/stable/ops.h (#​158866)

Release Engineering

CUDA

  • Add getter for CUDA graph exec to allow mutation of captured kernel params (#​161294)
  • Implement support for cudnn_batch_norm_out kernel to replace the autogen approach (#​123020)

CPU

MPS

ROCm

  • OCP Micro-scaling Format (mx-fp8/mx-fp4) Support (#​151360)

XPU

Improvements

Python Frontend

  • Speed up torch.load under FakeTensorMode by reducing random reads (#​157931)
  • Make torch.utils.benchmark.utils.timer accelerator agnostic (#​157131)
  • Improve error message for weight-only load errors (#​159935)

torch.nn

  • Allow register_buffer with Tensor-like objects (#​159455)
  • Improve error message for unsupported padding configurations (#​160866)
  • Validate target is 0D when input is 1D in NLLLoss (#​161412)

Optimizer

  • Resolve warning in LBFGS when converting a tensor with requires_grad=True to a scalar (#​160389)
  • Resolve SequentialLR deprecation warning about invoking step(epoch) (#​149392)

Autograd

  • Support deterministic torch.nn.Upsample mode="trilinear" backward (#​154239)

Distributed

c10d
  • Add improvements to eager init of ProcessGroupNCCL (#​156748)
  • Simplify unique hash management of ProcessGroupNCCL (#​156790)
  • Support per operation timeouts in ProcessGroupGloo (#​158128)
  • Allow ping to be retried in TCPStore (#​159165)
  • Support scalar tensor for functional all_gather (#​149913)
  • Expose unsafe_get_ptr for dist.ProcessGroupNCCL.NCCLConfig (#161136)
  • Add batch option for send/recv_object_list (#​160342)
  • Make FakeStore optional to be passed into fake backend (#​162164)
  • Enable complex datatype support in ProcessGroupGloo (#​156633)
  • Move thread-local capture mode guard to include work.isStarted (#​160398)
DistributedDataParallel (DDP)
DTensor
  • Relax device_mesh argument constraint in local_map (#​157049)
  • Support complex numbers in DTensor redistribute (#​157329)
  • Rework partial propagation in point-wise op and support mul (#​157340)
  • Allow dynamic shapes for DTensor slice (#​157953)
  • Implement histc op (#​158298)
  • Made dispatch to sharding prop over decomps (#​159324)
  • Support user-supplied Generator for random ops (#​159933)
  • Add propagate_tensor_meta function that skips cache if _are_we_tracing (#​161334)
  • Support local_map as a decorator (#​161353)
Device Mesh
  • Enable the use of user set backend and pg option even for the global mesh (#​157501)
  • Enable slicing a submesh with warnings (#​158899)
  • Allow controlling PG backend and options via init_device_mesh (#​159371)
FullyShardedDataParallel2 (FSDP2)
  • Support custom all_gather and reduce_scatter comms (#​155189)
  • Make set_allocate_memory_from_process_group fail if used together with custom comm hooks (#157487)
  • Use reduceOpSum when world size is 1 (#​157529)
  • Skip allgather when world size is 1 (#160135)
  • Use post_reduce_stream.record_event() on hsdp+cpuoffload (#​160481)
Tensor Parallel (TP)
  • Improve parallelize_module API to support more cases (#​157182)
TensorPipe
  • Update TensorPipe pinned dependency version (#​159834)
TorchElastic
Pipeline Parallelism (PP)
  • Add eval() API to schedule (#​157795)
  • Allow intermediate nodes in zero bubble to have multiple grads (#​159084)
  • Support OVERLAP_F_B computation type (#​158978)
  • Initialize P2P communicators on first step (#160210)
  • Add DualPipeV schedule (#​159591)

Linear Algebra Frontend

  • Use rocSOLVER for Cholesky inversion on AMD. (#​157154)
  • Add option for using TF32 as fp32 internal precision for matmul/linear/conv on MKLDNN (#​157520)
  • Make einsum produce contiguous outputs in more cases (#​161755)

Profiler

  • Add more CUDA API for kernel launcher (#​156016)
  • Allow Custom Time Unit When Printing Profiler Table (#​157913)
  • Update CUDA runtime kernel identification logic (#​157890)

FX

  • Fix DCE eliminating random operations by improving is_impure() (#​151524, #​157981)
  • Support converting a float32 tensor to a scalar in FX trace. (#​158216)
  • Correctly copy self.module_stack in ModuleStackTracer (#​159956)
  • Add tool to track events in graph split (#​159795)
  • Add node_name_match to subgraph rewriter (#​157574)

Dynamo

Inductor

  • Add Inductor support for MTIA backend (#​159211)
  • Share default device context when all graph partitions and cudagraph-unsafe ops are on the same device (#162873)

Ahead-Of-Time Inductor (AOTI)

  • Enable AOTI for CPU on Windows (#​158915)
  • Re-enable TMA templates w/ AOTI (#​157819)
  • Don't allow int32 indices if {non-inf, > int32_max} upper bound is provided (#​159433)
  • Add RecordFunction to C shim so that profiling works with AOTI (#​159842)
  • Add AOTI C shim functions for collective ops (#​154492)
  • Add missing ops to set of C-shim ops which can have nullptr returns (#​158073)

Export

  • Handle None & ellipsis slicing/select in non-strict (#​157821)
  • Extend FP8 types in serialization (#​158430)
  • Improve error messages for deserialization (#​159881)
  • Support serialization for triton_kernel_wrapper_functional HOP (#​161314)
  • Support serialization for complex constants (#​161517)
  • Add runtime asserts to while_loop HOP subgraphs (#​158467)
  • Warn on side-effectful code in strict mode (#​160060)
  • Support for vmap in pre-dispatch export (#​154650)
  • Support vmap and custom autograd function/improve DTensor constructor inefficiency (#​162240)

AOTDispatcher

  • Skip logging in fp8 activation quantization if there are no nodes to be quantized (#​158129)
  • Add aot_export_joint_with_descriptors and aot_compile_joint_with_descriptors (#​158715)
  • Extract out prepare_aot_module_simplified for use in next PR (#​158319)
  • Rename modules in AOTAutograd (#​158449)
  • Track descriptors for all inputs/outputs of AOTAutograd traced graph (#​158624)
  • Improve graph output alias with subclass error message (#​159619)
  • Pass fw/bw compilers to aot_export_joint_with_descriptors (#​159814)

Composability

Quantization

  • Avoid getting model device once per node for pt2e quantization flow (#​159901)
  • Fixes bug in implementation of HistogramObserver (#​156457)
  • Support bias=None for fbgemm_linear_fp16_weight CPU op (#​158535)
  • Add Static Dispatch Kernel for wrapped_fbgemm_linear_fp16_weight for Sigmoid (#​160451)

Nested Tensor (NJT)

  • Added initial log_softmax() support (#​159662)

Foreach

  • Invoke vector.reserve() consistently for non-inplace foreach operations (#​161128)
  • Faster and safer lambda expression capture in has_integral_tensor() (#​161042)

ONNX

  • Support symbolic arguments in ONNX exporter (#​157734)
  • Fix torch.tensor warning in ONNX symbolic_opset10 export (#​158835)

C++ Frontend

  • Generalized AllocatorConfig to be device-agnostic via new AcceleratorAllocatorConfig (#​149601, #​150312)
  • Added Scalar::isUnsigned() method (#​159877)
  • Exposed ModelRunner from nativert as public (#​159989)
  • Improve error message for torch.binomial enforcing float inputs (#​157658)

Build Frontend

Release Engineering

CUDA

  • Make cublaslt/hipblaslt workspaces persistent (#​156495)
  • Remove unnecessary warnings during the ATen compilation process (#​157703)
  • Slightly improve error message from repeat_interleave kernel (#​157996)
  • Add framework for explanations for common CUDA errors (#​158395)
  • Upgrade KernelLauncher kernelLaunchCheck to print help string (#​158896)
  • Prep for cutlass upgrade by ignoring Wunused-but-set-variable (#​159276)
  • Workaround ATen SFINAE under libc++ (#​161101)
  • Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen (#​153373)
  • Add maybe unused flag to remove warning (#​157655)
  • Use new CCCL API in v2.8 (#​160554)
  • Improve cupy device placement when device is provided with explicit index (#​158529)

CPU (AArch64)

  • Made PyTorch compilable with gcc-14 on ARM (#​157867)

MPS

ROCm

  • Additional hipify mappings (#​158056, #​158352, #​161992)
  • Refactor composable_kernel (CK) backend user interface to improve user experience (#​152951)
  • Allow use of rocSOLVER for Cholesky inversion. (#​157154)
  • AOT Inductor enable gfx950 for max autotune using CK (#​159195)
  • Add flag torch.backends.miopen.immediate to toggle MIOpen Immediate Mode instead of relying on deterministic=True and benchmark=False (#158951); see the sketch after this list
  • MIOpen convolutions no longer call reshape_ or unexpectedly change memory formats (#​161687)
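
A hedged sketch of the new MIOpen toggle:

import torch

# enable MIOpen Immediate Mode directly, instead of implying it via
# deterministic=True and benchmark=False
torch.backends.miopen.immediate = True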

XPU

  • Support Intel GPU quantization ops in AOTInductor (#​156572)
  • Add device_id to Intel GPU properties to distinguish iGPUs with identical names (#​156481)

Bug Fixes

Python Frontend

  • Add option in torch.utils.cpp_extension.load_inline to override gencode (#​156850)
  • Fix max_width computation in Tensor printing (#​126859)
  • Improve pin_memory error message on CPU-only systems (#​159994)
  • Make batching rule for F.embedding DTensor-aware (#162117)

Autograd

  • Fix torch.autograd.Function memory leak due to torch.utils.checkpoint early stopping (#161171)
  • Fix torch.autograd.graph.GradientEdge for torch.autograd.Function (#​160098)
  • Match 0-dim gradients device type regardless of subclass-ness (#​160165)

Distributed

c10d
  • Fix slow init due to repeated dns resolution failure in socket (#​159596)
  • Fix setGroupName and setGroupDesc in group_split and merge_remote_group (#​159429)
  • Fix a bug of distributed 'gather' with noncontiguous tensors on the Gloo backend (#​158903)
  • Fix a bug of distributed 'gather' with noncontiguous tensors on the NCCL backend (#​159549)
  • Fix data inconsistencies when using batch_isend_irecv with 2D tensor views by making P2P tensors dense (#​163719)
  • Handle discontiguous allgather/reducescatter inputs (#​163712)
Device Mesh
  • Fix strings being incorrectly chained as iterables (#160709)
DistributedDataParallel (DDP)
  • Fix incorrect interaction between DDPOptimizer and donated buffers (#​160745)
DTensor
  • Fix DTensor handling of conjugate bit (#​158030)
  • Fix OpSchema equality check (#​161231)
  • Fix grouped_mm strategy for invalid stride cases (#​158245)
  • Fix F.one_hot in DTensor (#​162307)
  • Always disable ShardingPropagation cache if compiling (#156868)
FullyShardedDataParallel (FSDP)
  • Fix the bug in FSDP offload pin_memory (#​157147)
  • Fix to ensure writeback handles NO_SHARD correctly by flattening tensors before copying (#​154369)
FullyShardedDataParallel2 (FSDP2)
  • Fix error message for fsdp_pre_all_gather (#​160817)
  • Fix the issue with set_reduce_scatter_divide_factor errors and MixedPrecisionPolicy (#​155964)
Pipeline Parallelism (PP)
TensorPipe
  • Fix import torch if compiled without TensorPipe (#​159461)
TorchElastic
  • Fix wrong log file name in the docs of torch.distributed.elastic.multiprocessing.start_processes() (#​160396)

Linear Algebra Frontend

  • Avoid downcasts for fp16 matmul on the BLAS backend (#​161999)

Profiler

  • Fix Linter for Global Annotations flag in Snapshot (#​157858)

FX

  • Fix split_module with symint (#​160093)
  • Fix getattr_recursive with ModuleList (#​161204)
  • Skip const folding with symbolic expression (#​161437)
  • Fix qualified name for methods of torch.Tensor (#​162224)

Dynamo

  • Fix segfault due to interaction between Dynamo backends and torch.compiler.reset() (#​156527)
  • Fix crash due to bad interaction with recompilations and with blocks in Python 3.11+ (#​162318)

torch.nn

  • Fix silent correctness w/ backpropping grads for FlexAttention (#​163677)
  • Fix return_lse warning message in FlexAttention (#​163578)
  • Fix FlexAttention head broadcast (#​163426)

Inductor

  • Fix wrong meta function for constant_pad_nd (#​159878)
  • Fix learnable bias assertion error in Inductor (#​161170)
  • Fix int64 from MutationOutput Buffer (#​162020)
  • Fix Inductor CUDA sort NaN behavior (#​159308)
  • Fix layout for local buf in outer loop fusion (#​160857)
  • Fix slice scatter dtype consistency (#​160851)
  • Fix 3d tiled online softmax (#​162341)
  • Fix unsafe collective reorder past wait in Inductor (#​157489)
  • Fix FallbackKernel alias function to avoid incorrect aliasing for custom ops (#​163227)

Ahead-Of-Time Inductor (AOTI)

  • Fix a bug from load_constants (#​161887)
  • Fix wrong propagation of fallback_ops_dict in gen_aoti_c_shim (#​159904)
  • Fix unbacked symint and memory leak in Inductor memory planning (#​159839)
  • Fix memory leak in AOTI when calling aoti_torch_as_strided (#​162118)
  • Explicitly delete wait_tensor returned tensor (#​159502)
  • Fix memory leak from all_reduce (#​159818)


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

To execute skipped test pipelines, write the comment /ok-to-test.


Documentation

Find out how to configure dependency updates in MintMaker documentation or see all available configuration options in Renovate documentation.

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
@konflux-internal-p02 konflux-internal-p02 bot force-pushed the konflux/mintmaker/rhoai-3.2/torch-2.x branch from 3d468b5 to 5826b14 on November 12, 2025 20:59
@konflux-internal-p02 konflux-internal-p02 bot changed the title from "Update dependency torch to v2.9.0" to "Update dependency torch to v2.9.1" on Nov 12, 2025