
memory planner to allocate element-wise output buffer in place of input (#19067)

Open
rezaasjd wants to merge 1 commit into pytorch:main from rezaasjd:export-D100371295

Conversation

Contributor

@rezaasjd rezaasjd commented Apr 23, 2026

Summary:

Adds a pass, InPlaceElemWiseLikeOpsPass, which finds element-wise ops in the graph whose input has no skip connection (i.e., no other users of the input).
The pass then annotates the output with a new alloc type called memory.alloc_inplace.
During memory planning, nodes whose output spec type is alloc_inplace get their output allocated in place of the same node's input.

Differential Revision: D100371295
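The eligibility rule in the summary above (an element-wise op may write its output over its input only when nothing else reads that input) can be sketched with a toy graph structure. This is a minimal illustration, not the actual exir pass; `Node`, `add_edge`, and the op names are stand-ins:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    op: str
    args: List["Node"] = field(default_factory=list)
    users: List["Node"] = field(default_factory=list)

def add_edge(src: Node, dst: Node) -> None:
    dst.args.append(src)
    src.users.append(dst)

# Assumed default set of element-wise ops for the sketch.
ELEMENT_WISE_OPS = {"add", "mul", "relu", "sigmoid"}

def can_alloc_in_place(node: Node) -> bool:
    if node.op not in ELEMENT_WISE_OPS:
        return False
    input_node = node.args[0]  # assumes args[0] is the tensor input
    # Any other user means the input is still live (a skip connection).
    other_users = [u for u in input_node.users if u is not node]
    return len(other_users) == 0

x = Node("x", "placeholder")
r = Node("relu", "relu")
add_edge(x, r)
print(can_alloc_in_place(r))  # True: x has no other users

a = Node("add", "add")
add_edge(x, a)
print(can_alloc_in_place(r))  # False: x also feeds the add (skip connection)
```

The second check fails because `x` now has a second reader, which is exactly the "no skip connection from input" condition the summary describes.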

@pytorch-bot

pytorch-bot Bot commented Apr 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19067

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 4 Awaiting Approval

As of commit f1fa6fc with merge base d858cd9:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Apr 23, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Apr 23, 2026

@rezaasjd has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100371295.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Comment thread exir/capture/_config.py Outdated
# EdgeProgramManager or can be defined per program.
memory_planning_pass: Union[PassType, Dict[str, PassType]] = MemoryPlanningPass()
to_out_var_pass: PassType = ToOutVarPass(ignore_to_out_var_failure=False)
inplace_elem_wise_like_ops_pass: Optional[PassType] = None
Contributor

Can you just use a custom memory plan that does this rather than updating the config?

Contributor Author

Yes that would be better. Fix is coming.

Comment thread exir/passes/__init__.py Outdated
"""

def __init__(
self, eligible_ops: Optional[Set[Callable[..., Any]]] = None
Contributor

Hmm, it's a bit strange that the pass itself doesn't define this set. The main concern would be if this exposes or relies on implementation details around read-before-write, but for element-wise ops I think it should be fine.

Contributor Author

added a default set, but left the option to be overridden by the user
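The pattern described in this reply (a built-in default set that the caller can still override) might look like the following sketch. The class shell and op-name strings are illustrative stand-ins, not the actual exir pass or aten overload objects:

```python
from typing import Optional, Set

# Hypothetical default set of element-wise out-variant ops; the real pass
# would reference actual aten op overloads rather than strings.
DEFAULT_ELIGIBLE_OPS: Set[str] = {
    "aten.add.out",
    "aten.mul.out",
    "aten.relu.out",
    "aten.sigmoid.out",
}

class InPlaceElemWiseLikeOpsPassSketch:
    """Illustrative shell: stores the eligible-op set, defaulting sensibly."""

    def __init__(self, eligible_ops: Optional[Set[str]] = None) -> None:
        # None means "use the defaults"; an explicit set overrides them.
        self.eligible_ops = (
            DEFAULT_ELIGIBLE_OPS if eligible_ops is None else eligible_ops
        )

p = InPlaceElemWiseLikeOpsPassSketch()
print("aten.relu.out" in p.eligible_ops)  # True: defaults apply

q = InPlaceElemWiseLikeOpsPassSketch(eligible_ops={"aten.tanh.out"})
print("aten.relu.out" in q.eligible_ops)  # False: user set replaces defaults
```

Defaulting on `None` rather than on an empty set keeps an explicit empty override (disable the pass entirely) distinguishable from "use the defaults".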

@JacobSzwejbka
Contributor

@claude review this PR

@claude

claude Bot commented Apr 28, 2026

Claude finished @JacobSzwejbka's task in 2m 46s.


Review of PR #19067: In-place element-wise memory allocation


Overview

This PR adds in-place buffer reuse for element-wise ops: an InPlaceElemWiseLikeOpsPass identifies eligible element-wise operations, replaces memory.alloc with memory.alloc_inplace, and the greedy memory planner assigns the output the same offset as the input. The design is clean and the approach is sound — placing the output in-place over a dead input for element-wise ops is a well-known optimization.

Issues

1. No tests

This is the most significant gap. There are zero tests for any of the new functionality:

  • No unit tests for InPlaceElemWiseLikeOpsPass
  • No tests for the greedy planner's deferred_inplace handling
  • No tests for the verifier's inplace_base exception
  • No tests for alloc_inplace in the emitter
  • No edge case tests (e.g., multi-user input, output > input size, chained in-place ops)

The existing exir/tests/test_memory_planning.py would be the natural place for these.

2. Naive algorithm not updated (exir/memory_planning.py)

The greedy algorithm (line 849) has full support for deferred_inplace, but the naive algorithm has no awareness of inplace_base at all. If someone uses naive with this pass enabled, in-place specs will be allocated independently, defeating the optimization and potentially causing silent correctness issues if the verifier expects shared storage.
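The deferred handling this issue refers to can be sketched roughly as follows: specs carrying an `inplace_base` are skipped during normal placement and later given their base's offset. The field and function names here are assumptions inferred from the review text, not the actual planner code:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Spec:
    name: str
    size: int
    inplace_base: Optional["Spec"] = None

def plan(specs: List[Spec]) -> Dict[str, int]:
    """Toy bump allocator: place ordinary specs, then resolve in-place ones."""
    offsets: Dict[str, int] = {}
    cursor = 0
    deferred: List[Spec] = []
    for s in specs:
        if s.inplace_base is not None:
            deferred.append(s)  # defer until the base has an offset
            continue
        offsets[s.name] = cursor
        cursor += s.size
    for s in deferred:
        # Share storage with the base instead of allocating fresh space.
        offsets[s.name] = offsets[s.inplace_base.name]
    return offsets

a = Spec("a", 16)
b = Spec("b", 16)
out = Spec("out", 16, inplace_base=a)
print(plan([a, b, out]))  # {'a': 0, 'b': 16, 'out': 0}
```

A planner without the second loop (the situation described for the naive algorithm) would place `out` at its own offset, silently losing the sharing the pass promised.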

3. alloc_inplace return type annotation is incorrect (exir/memory.py:36)

def alloc_inplace(base: torch.Tensor, spec: AllocSpec) -> torch.Tensor:

When spec is a list, this returns List[torch.Tensor], not torch.Tensor. Compare with alloc which correctly returns pytree.PyTree. Should be -> pytree.PyTree or -> Union[torch.Tensor, List[torch.Tensor]].
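The annotation concern can be illustrated with a stand-in type; `FakeTensor` and the spec shapes below are placeholders, not the exir or torch types:

```python
from typing import List, Tuple, Union

class FakeTensor:
    """Stand-in for torch.Tensor, just to illustrate the annotation."""

Spec = Tuple[int, ...]
AllocSpec = Union[Spec, List[Spec]]

def alloc_inplace(
    base: FakeTensor, spec: AllocSpec
) -> Union[FakeTensor, List[FakeTensor]]:
    # List of specs in -> list of tensors out; an annotation of plain
    # `-> FakeTensor` would miss this branch.
    if isinstance(spec, list):
        return [FakeTensor() for _ in spec]
    return FakeTensor()

print(isinstance(alloc_inplace(FakeTensor(), (2, 3)), FakeTensor))    # True
print(isinstance(alloc_inplace(FakeTensor(), [(2, 3), (4,)]), list))  # True
```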

4. User-check doesn't account for memory.free nodes (exir/passes/__init__.py:512)

input_users = [u for u in input_node.users if u != node]
if len(input_users) > 0:
    continue

This filters out only the current op node. If a memory.free call was already inserted for the input (or any other bookkeeping node), it would count as a user and block the optimization. Right now this works because InPlaceElemWiseLikeOpsPass runs before free-insertion, but this ordering dependency is implicit and fragile. Consider also filtering out memory.free nodes, or adding a comment documenting the ordering requirement.
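One way to make the check robust, as suggested, is to ignore bookkeeping nodes when counting users. A sketch with a toy node structure (the attribute names are assumptions, not the real fx node API):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyNode:
    target: str
    users: List["ToyNode"] = field(default_factory=list)

def has_blocking_users(input_node: ToyNode, op_node: ToyNode) -> bool:
    # Ignore the op itself and any memory.free bookkeeping nodes; only
    # real readers of the input should block the in-place optimization.
    blocking = [
        u for u in input_node.users
        if u is not op_node and u.target != "memory.free"
    ]
    return len(blocking) > 0

inp = ToyNode("placeholder")
op = ToyNode("aten.relu.out")
free = ToyNode("memory.free")
inp.users = [op, free]
print(has_blocking_users(inp, op))  # False: only a free node besides the op

inp.users.append(ToyNode("aten.add.out"))
print(has_blocking_users(inp, op))  # True: a real reader remains
```

This removes the implicit dependency on the pass running before free-insertion, at the cost of one extra target comparison per user.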

5. Hardcoded args[0] as input tensor (exir/passes/__init__.py:491)

input_node = node.args[0]

This assumes the first positional argument is always the tensor to share memory with. While this holds for common element-wise ops (add.out, mul.out, relu.out), it's a silent assumption. Since eligible_ops is user-provided, a mis-configured op where args[0] isn't the right tensor would cause incorrect memory sharing with no error. Consider adding a validation or documenting this contract prominently.

6. inplace_base uses getattr instead of attribute access (exir/memory_planning.py:860)

if getattr(spec, "inplace_base", None) is not None:

Since inplace_base is now always initialized in init_mem_planning_fields (tensor.py:221), you can use spec.inplace_base is not None directly. Same for the verifier at line 191-193. Using getattr suggests the field might not exist, which is misleading now that it's always present.

Design Feedback

7. Agree with Jacob's comment on config placement (exir/capture/_config.py:64)

Adding inplace_elem_wise_like_ops_pass to ExecutorchBackendConfig feels heavyweight for what is essentially a memory planning strategy. As Jacob suggested, this could be a custom memory planning pass (or composed with one) rather than a new top-level config field. This would keep the config surface small and make the feature more discoverable as part of memory planning.

8. Eligible ops should have a default set or be defined inside the pass (exir/passes/__init__.py:465-468)

As Jacob noted, it's odd that the pass doesn't define its own default set of eligible ops. Currently eligible_ops defaults to an empty set, meaning the pass is a no-op unless the caller knows exactly which ops to pass. A reasonable default set of element-wise ops (e.g., aten.add.out, aten.mul.out, aten.relu.out, aten.sigmoid.out, etc.) would make this much more usable out of the box.

Minor Nits

  • exir/emit/_emitter.py:1758: The alloc_inplace branch doesn't assert argument count like the alloc branch does (assert len(args) == 1 on line 1755). Consider adding assert len(args) == 2 for consistency/safety.
  • exir/passes/__init__.py:34: The TensorSpec import is added but never used directly in this file — the pass works with node.meta["spec"] through MemoryPlanningPass, not directly with TensorSpec.

Summary

The core algorithm is correct and the approach is well-motivated. The main blockers are the lack of tests and the design questions Jacob raised about config placement and default eligible ops. The naive algorithm gap and type annotation issue should also be addressed before merge.

Contributor

@JacobSzwejbka JacobSzwejbka left a comment


.

@meta-codesync meta-codesync Bot changed the title from "memory planner to allocate element-wise output buffer in place of input" to "memory planner to allocate element-wise output buffer in place of input (#19067)" on Apr 30, 2026
@rezaasjd rezaasjd force-pushed the export-D100371295 branch from 30c6c9e to 3878611 on April 30, 2026 17:46
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request Apr 30, 2026
…ut (pytorch#19067)
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request Apr 30, 2026
…ut (pytorch#19067)
@rezaasjd rezaasjd force-pushed the export-D100371295 branch from 3878611 to 7ffba86 on April 30, 2026 23:11
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request Apr 30, 2026
…ut (pytorch#19067)
@rezaasjd rezaasjd force-pushed the export-D100371295 branch 2 times, most recently from e9f3546 to 7eac5d4 on May 1, 2026 23:54
rezaasjd pushed a commit to rezaasjd/executorch that referenced this pull request May 1, 2026
…ut (pytorch#19067)
@rezaasjd
Contributor Author

rezaasjd commented May 1, 2026

@JacobSzwejbka I have a new rev now.

@rezaasjd rezaasjd requested a review from JacobSzwejbka May 4, 2026 21:00
@rezaasjd rezaasjd force-pushed the export-D100371295 branch from 7eac5d4 to f1fa6fc on May 7, 2026 20:39

Labels

CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, meta-exported


2 participants