
memory planner to allocate element-wise output buffer in place of input (#19067)

Open
rezaasjd wants to merge 1 commit into pytorch:main from rezaasjd:export-D100371295

Conversation

Contributor

@rezaasjd rezaasjd commented Apr 23, 2026

Summary:

Adds a pass, InPlaceElemWiseLikeOpsPass, which finds element-wise ops in the graph whose input has no skip connection (i.e., no other users of the input).
The pass then annotates the output with a new alloc type called memory.alloc_inplace.
During memory planning, nodes whose output spec type is alloc_inplace get their output allocated in place of the same node's input.

Differential Revision: D100371295
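The eligibility rule in the summary above (an element-wise op may write its output over its input only when nothing else reads that input) can be sketched with a toy graph structure. This is a minimal illustration, not the actual exir pass; `Node`, `add_edge`, and the op names are stand-ins:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    op: str
    args: List["Node"] = field(default_factory=list)
    users: List["Node"] = field(default_factory=list)

def add_edge(src: Node, dst: Node) -> None:
    dst.args.append(src)
    src.users.append(dst)

# Assumed default set of element-wise ops for the sketch.
ELEMENT_WISE_OPS = {"add", "mul", "relu", "sigmoid"}

def can_alloc_in_place(node: Node) -> bool:
    if node.op not in ELEMENT_WISE_OPS:
        return False
    input_node = node.args[0]  # assumes args[0] is the tensor input
    # Any other user means the input is still live (a skip connection).
    other_users = [u for u in input_node.users if u is not node]
    return len(other_users) == 0

x = Node("x", "placeholder")
r = Node("relu", "relu")
add_edge(x, r)
print(can_alloc_in_place(r))  # True: x has no other users

a = Node("add", "add")
add_edge(x, a)
print(can_alloc_in_place(r))  # False: x also feeds the add (skip connection)
```

The second check fails because `x` now has a second reader, which is exactly the "no skip connection from input" condition the summary describes.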

@pytorch-bot

pytorch-bot Bot commented Apr 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19067

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 4 Awaiting Approval

As of commit f1fa6fc with merge base d858cd9:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Apr 23, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Apr 23, 2026

@rezaasjd has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100371295.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Comment thread exir/capture/_config.py Outdated
# EdgeProgramManager or can be defined per program.
memory_planning_pass: Union[PassType, Dict[str, PassType]] = MemoryPlanningPass()
to_out_var_pass: PassType = ToOutVarPass(ignore_to_out_var_failure=False)
inplace_elem_wise_like_ops_pass: Optional[PassType] = None
Contributor

Can you just use a custom memory plan that does this rather than updating the config?

Contributor Author

Yes that would be better. Fix is coming.

Comment thread exir/passes/__init__.py Outdated
"""

def __init__(
self, eligible_ops: Optional[Set[Callable[..., Any]]] = None
Contributor

Hmm, it's a bit strange that the pass itself doesn't define this set. The main concern would be if this exposes or relies on implementation details around read-before-write, but for element-wise ops I think it should be fine.

Contributor Author

added a default set, but left the option to be overridden by the user
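The pattern described in this reply (a built-in default set that the caller can still override) might look like the following sketch. The class shell and op-name strings are illustrative stand-ins, not the actual exir pass or aten overload objects:

```python
from typing import Optional, Set

# Hypothetical default set of element-wise out-variant ops; the real pass
# would reference actual aten op overloads rather than strings.
DEFAULT_ELIGIBLE_OPS: Set[str] = {
    "aten.add.out",
    "aten.mul.out",
    "aten.relu.out",
    "aten.sigmoid.out",
}

class InPlaceElemWiseLikeOpsPassSketch:
    """Illustrative shell: stores the eligible-op set, defaulting sensibly."""

    def __init__(self, eligible_ops: Optional[Set[str]] = None) -> None:
        # None means "use the defaults"; an explicit set overrides them.
        self.eligible_ops = (
            DEFAULT_ELIGIBLE_OPS if eligible_ops is None else eligible_ops
        )

p = InPlaceElemWiseLikeOpsPassSketch()
print("aten.relu.out" in p.eligible_ops)  # True: defaults apply

q = InPlaceElemWiseLikeOpsPassSketch(eligible_ops={"aten.tanh.out"})
print("aten.relu.out" in q.eligible_ops)  # False: user set replaces defaults
```

Defaulting on `None` rather than on an empty set keeps an explicit empty override (disable the pass entirely) distinguishable from "use the defaults".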

@JacobSzwejbka
Contributor

@claude review this PR

@claude

claude Bot commented Apr 28, 2026

Claude finished @JacobSzwejbka's task in 2m 46s.


Review of PR #19067: In-place element-wise memory allocation


Overview

This PR adds in-place buffer reuse for element-wise ops: an InPlaceElemWiseLikeOpsPass identifies eligible element-wise operations, replaces memory.alloc with memory.alloc_inplace, and the greedy memory planner assigns the output the same offset as the input. The design is clean and the approach is sound — placing the output in-place over a dead input for element-wise ops is a well-known optimization.

Issues

1. No tests

This is the most significant gap. There are zero tests for any of the new functionality:

  • No unit tests for InPlaceElemWiseLikeOpsPass
  • No tests for the greedy planner's deferred_inplace handling
  • No tests for the verifier's inplace_base exception
  • No tests for alloc_inplace in the emitter
  • No edge case tests (e.g., multi-user input, output > input size, chained in-place ops)

The existing exir/tests/test_memory_planning.py would be the natural place for these.

2. Naive algorithm not updated (exir/memory_planning.py)

The greedy algorithm (line 849) has full support for deferred_inplace, but the naive algorithm has no awareness of inplace_base at all. If someone uses naive with this pass enabled, in-place specs will be allocated independently, defeating the optimization and potentially causing silent correctness issues if the verifier expects shared storage.
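The deferred handling this issue refers to can be sketched roughly as follows: specs carrying an `inplace_base` are skipped during normal placement and later given their base's offset. The field and function names here are assumptions inferred from the review text, not the actual planner code:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Spec:
    name: str
    size: int
    inplace_base: Optional["Spec"] = None

def plan(specs: List[Spec]) -> Dict[str, int]:
    """Toy bump allocator: place ordinary specs, then resolve in-place ones."""
    offsets: Dict[str, int] = {}
    cursor = 0
    deferred: List[Spec] = []
    for s in specs:
        if s.inplace_base is not None:
            deferred.append(s)  # defer until the base has an offset
            continue
        offsets[s.name] = cursor
        cursor += s.size
    for s in deferred:
        # Share storage with the base instead of allocating fresh space.
        offsets[s.name] = offsets[s.inplace_base.name]
    return offsets

a = Spec("a", 16)
b = Spec("b", 16)
out = Spec("out", 16, inplace_base=a)
print(plan([a, b, out]))  # {'a': 0, 'b': 16, 'out': 0}
```

A planner without the second loop (the situation described for the naive algorithm) would place `out` at its own offset, silently losing the sharing the pass promised.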

3. alloc_inplace return type annotation is incorrect (exir/memory.py:36)

def alloc_inplace(base: torch.Tensor, spec: AllocSpec) -> torch.Tensor:

When spec is a list, this returns List[torch.Tensor], not torch.Tensor. Compare with alloc which correctly returns pytree.PyTree. Should be -> pytree.PyTree or -> Union[torch.Tensor, List[torch.Tensor]].
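The annotation concern can be illustrated with a stand-in type; `FakeTensor` and the spec shapes below are placeholders, not the exir or torch types:

```python
from typing import List, Tuple, Union

class FakeTensor:
    """Stand-in for torch.Tensor, just to illustrate the annotation."""

Spec = Tuple[int, ...]
AllocSpec = Union[Spec, List[Spec]]

def alloc_inplace(
    base: FakeTensor, spec: AllocSpec
) -> Union[FakeTensor, List[FakeTensor]]:
    # List of specs in -> list of tensors out; an annotation of plain
    # `-> FakeTensor` would miss this branch.
    if isinstance(spec, list):
        return [FakeTensor() for _ in spec]
    return FakeTensor()

print(isinstance(alloc_inplace(FakeTensor(), (2, 3)), FakeTensor))    # True
print(isinstance(alloc_inplace(FakeTensor(), [(2, 3), (4,)]), list))  # True
```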

4. User-check doesn't account for memory.free nodes (exir/passes/__init__.py:512)

input_users = [u for u in input_node.users if u != node]
if len(input_users) > 0:
    continue

This filters out only the current op node. If a memory.free call was already inserted for the input (or any other bookkeeping node), it would count as a user and block the optimization. Right now this works because InPlaceElemWiseLikeOpsPass runs before free-insertion, but this ordering dependency is implicit and fragile. Consider also filtering out memory.free nodes, or adding a comment documenting the ordering requirement.
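One way to make the check robust, as suggested, is to ignore bookkeeping nodes when counting users. A sketch with a toy node structure (the attribute names are assumptions, not the real fx node API):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyNode:
    target: str
    users: List["ToyNode"] = field(default_factory=list)

def has_blocking_users(input_node: ToyNode, op_node: ToyNode) -> bool:
    # Ignore the op itself and any memory.free bookkeeping nodes; only
    # real readers of the input should block the in-place optimization.
    blocking = [
        u for u in input_node.users
        if u is not op_node and u.target != "memory.free"
    ]
    return len(blocking) > 0

inp = ToyNode("placeholder")
op = ToyNode("aten.relu.out")
free = ToyNode("memory.free")
inp.users = [op, free]
print(has_blocking_users(inp, op))  # False: only a free node besides the op

inp.users.append(ToyNode("aten.add.out"))
print(has_blocking_users(inp, op))  # True: a real reader remains
```

This removes the implicit dependency on the pass running before free-insertion, at the cost of one extra target comparison per user.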

5. Hardcoded args[0] as input tensor (exir/passes/__init__.py:491)

input_node = node.args[0]

This assumes the first positional argument is always the tensor to share memory with. While this holds for common element-wise ops (add.out, mul.out, relu.out), it's a silent assumption. Since eligible_ops is user-provided, a mis-configured op where args[0] isn't the right tensor would cause incorrect memory sharing with no error. Consider adding a validation or documenting this contract prominently.

6. inplace_base uses getattr instead of attribute access (exir/memory_planning.py:860)

if getattr(spec, "inplace_base", None) is not None:

Since inplace_base is now always initialized in init_mem_planning_fields (tensor.py:221), you can use spec.inplace_base is not None directly. Same for the verifier at line 191-193. Using getattr suggests the field might not exist, which is misleading now that it's always present.

Design Feedback

7. Agree with Jacob's comment on config placement (exir/capture/_config.py:64)

Adding inplace_elem_wise_like_ops_pass to ExecutorchBackendConfig feels heavyweight for what is essentially a memory planning strategy. As Jacob suggested, this could be a custom memory planning pass (or composed with one) rather than a new top-level config field. This would keep the config surface small and make the feature more discoverable as part of memory planning.

8. Eligible ops should have a default set or be defined inside the pass (exir/passes/__init__.py:465-468)

As Jacob noted, it's odd that the pass doesn't define its own default set of eligible ops. Currently eligible_ops defaults to an empty set, meaning the pass is a no-op unless the caller knows exactly which ops to pass. A reasonable default set of element-wise ops (e.g., aten.add.out, aten.mul.out, aten.relu.out, aten.sigmoid.out, etc.) would make this much more usable out of the box.

Minor Nits

  • exir/emit/_emitter.py:1758: The alloc_inplace branch doesn't assert argument count like the alloc branch does (assert len(args) == 1 on line 1755). Consider adding assert len(args) == 2 for consistency/safety.
  • exir/passes/__init__.py:34: The TensorSpec import is added but never used directly in this file — the pass works with node.meta["spec"] through MemoryPlanningPass, not directly with TensorSpec.

Summary

The core algorithm is correct and the approach is well-motivated. The main blockers are the lack of tests and the design questions Jacob raised about config placement and default eligible ops. The naive algorithm gap and type annotation issue should also be addressed before merge.

Contributor

@JacobSzwejbka JacobSzwejbka left a comment


.

@meta-codesync meta-codesync Bot changed the title from "memory planner to allocate element-wise output buffer in place of input" to "memory planner to allocate element-wise output buffer in place of input (#19067)" on Apr 30, 2026
@rezaasjd rezaasjd force-pushed the export-D100371295 branch from 30c6c9e to 3878611 on April 30, 2026 17:46
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request Apr 30, 2026
…ut (pytorch#19067)
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request Apr 30, 2026
…ut (pytorch#19067)
@rezaasjd rezaasjd force-pushed the export-D100371295 branch from 3878611 to 7ffba86 on April 30, 2026 23:11
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request Apr 30, 2026
…ut (pytorch#19067)
@rezaasjd rezaasjd force-pushed the export-D100371295 branch 2 times, most recently from e9f3546 to 7eac5d4 on May 1, 2026 23:54
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request May 1, 2026
…ut (pytorch#19067)
@rezaasjd
Contributor Author

rezaasjd commented May 1, 2026

@JacobSzwejbka I have a new rev now.

@rezaasjd rezaasjd requested a review from JacobSzwejbka May 4, 2026 21:00
@rezaasjd rezaasjd force-pushed the export-D100371295 branch from 7eac5d4 to f1fa6fc on May 7, 2026 20:39

Labels

CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, meta-exported


2 participants