Skip to content

PLUGINS/UCX: Do not allow emulated RMA protocols#1610

Open
tvegas1 wants to merge 1 commit intoai-dynamo:mainfrom
tvegas1:proto_emulation_disabled
Open

PLUGINS/UCX: Do not allow emulated RMA protocols#1610
tvegas1 wants to merge 1 commit intoai-dynamo:mainfrom
tvegas1:proto_emulation_disabled

Conversation

@tvegas1
Copy link
Copy Markdown
Contributor

@tvegas1 tvegas1 commented May 4, 2026

What?

Disable emulated RMA protocols in UCX plugin. An explicit error will be triggered if some non-zcopy protocols would have to be used.

Why?

We want obvious failure when zcopy protocols are not usable.

How?

Leverage added option start in UCX 1.21.

Summary by CodeRabbit

  • Chores
    • Adjusted UCX communication settings for newer UCX versions: protocol emulation disabled and inline response behavior refined, along with channel tuning — improving stability and network performance for distributed workloads.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

👋 Hi tvegas1! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

Comment thread src/plugins/ucx/ucx_utils.cpp Outdated
@tvegas1 tvegas1 force-pushed the proto_emulation_disabled branch from 8072d05 to 27d94bc Compare May 4, 2026 14:32
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Walkthrough

During nixlUcxContext construction, the UCX configuration now sets IB_TX_INLINE_RESP = "0" and PROTO_EMULATION_ENABLE = "n" inside the if (ucpVersion_ >= UCP_VERSION(1, 21)) block alongside the existing RC_GDA_NUM_CHANNELS modification.

Changes

UCX Context Configuration

Layer / File(s) Summary
Device Restriction (pre-existing)
src/plugins/ucx/ucx_utils.cpp
Existing logic applies NET_DEVICES when devs is provided.
Core Configuration
src/plugins/ucx/ucx_utils.cpp
Adds config.modify("IB_TX_INLINE_RESP", "0"); and config.modify("PROTO_EMULATION_ENABLE", "n"); inside nixlUcxContext constructor for ucpVersion_ >= UCP_VERSION(1, 21), alongside existing RC_GDA_NUM_CHANNELS modification.
Wiring / Initialization Order
src/plugins/ucx/ucx_utils.cpp
New settings are applied during context initialization within the same UCX-version conditional block.
Tests / Documentation
(none changed)
No test or documentation files modified in this diff.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A tiny tweak, two lines well-placed,
Quiet the echoes, speed embraced.
Inline replies trimmed to none,
Emulation sleeps, work gets done. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: disabling emulated RMA protocols in the UCX plugin, which aligns with the code modification that sets PROTO_EMULATION_ENABLE to disable emulated protocols.
Description check ✅ Passed The description follows the template with all required sections (What, Why, How) properly completed. It clearly explains the change, justification, and implementation approach.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/plugins/ucx/ucx_utils.cpp (1)

421-453: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforcement gap: emulation can still be re-enabled despite disable intent

Line 421 uses config.modify(...), which is skipped when UCX_PROTO_EMULATION_ENABLE is set, and the later engine_config loop (Lines 447-453) can reapply PROTO_EMULATION_ENABLE. That means emulated RMA is still user-overridable, which conflicts with “do not allow emulated RMA protocols.”

Consider enforcing this as non-overridable (e.g., reject/ignore PROTO_EMULATION_ENABLE in engine_config and set it with modifyAlways after parsing overrides).

Suggested hard-enforcement pattern
-    config.modify("PROTO_EMULATION_ENABLE", "n");
+    // Default now; hard-enforced again after engine_config parsing.
+    config.modifyAlways("PROTO_EMULATION_ENABLE", "n");
@@
     while (std::getline(stream, elem, ',')) {
         std::string_view elem_view = elem;
         size_t pos = elem_view.find('=');

         if (pos != std::string::npos) {
-            config.modify(elem_view.substr(0, pos), elem_view.substr(pos + 1));
+            const auto key = elem_view.substr(0, pos);
+            if (key == "PROTO_EMULATION_ENABLE") {
+                NIXL_WARN << "Ignoring engine_config override for PROTO_EMULATION_ENABLE";
+                continue;
+            }
+            config.modify(key, elem_view.substr(pos + 1));
         }
     }
+
+    // Keep emulated RMA disabled regardless of external overrides.
+    config.modifyAlways("PROTO_EMULATION_ENABLE", "n");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugins/ucx/ucx_utils.cpp` around lines 421 - 453, The current code sets
PROTO_EMULATION_ENABLE via config.modify but later parses engine_config and can
re-enable it; ensure PROTO_EMULATION_ENABLE cannot be overridden by: when
parsing engine_config in the engine_config loop (the std::getline loop that
produces elem/elem_view and calls config.modify), detect keys equal to
"PROTO_EMULATION_ENABLE" (or equivalent constant) and ignore or reject them, and
after the loop call config.modifyAlways("PROTO_EMULATION_ENABLE", "n") (or the
config API's always-write method) to hard-enforce the disabled state; reference
the config.modify calls, the engine_config parsing loop, and use modifyAlways
(or equivalent) to set the final non-overridable value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/plugins/ucx/ucx_utils.cpp`:
- Around line 421-453: The current code sets PROTO_EMULATION_ENABLE via
config.modify but later parses engine_config and can re-enable it; ensure
PROTO_EMULATION_ENABLE cannot be overridden by: when parsing engine_config in
the engine_config loop (the std::getline loop that produces elem/elem_view and
calls config.modify), detect keys equal to "PROTO_EMULATION_ENABLE" (or
equivalent constant) and ignore or reject them, and after the loop call
config.modifyAlways("PROTO_EMULATION_ENABLE", "n") (or the config API's
always-write method) to hard-enforce the disabled state; reference the
config.modify calls, the engine_config parsing loop, and use modifyAlways (or
equivalent) to set the final non-overridable value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: debaef6b-60f6-42b6-8e20-760296d08c78

📥 Commits

Reviewing files that changed from the base of the PR and between 78fb297 and 8072d05.

📒 Files selected for processing (1)
  • src/plugins/ucx/ucx_utils.cpp

@tvegas1 tvegas1 force-pushed the proto_emulation_disabled branch from 27d94bc to 1fbe3e9 Compare May 4, 2026 14:36
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/plugins/ucx/ucx_utils.cpp (1)

421-421: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify if PROTO_EMULATION_ENABLE requires a UCX version guard (similar to RC_GDA_NUM_CHANNELS).

A previous reviewer flagged that this should potentially be placed inside the if (ucpVersion_ >= UCP_VERSION(1, 21)) block at Line 427. The current code has no version guard. If PROTO_EMULATION_ENABLE is only recognized by UCX 1.21+, then ucp_config_modify() will silently ignore it on older versions, meaning emulation would remain enabled without any indication — directly contradicting the PR's stated intent of never allowing emulated protocols.

Confirm in the UCX source/changelog which version first introduced PROTO_EMULATION_ENABLE:

#!/bin/bash
# Check for PROTO_EMULATION_ENABLE in UCX-related files in the repo
rg -n "PROTO_EMULATION" --type cpp
rg -n "PROTO_EMULATION" --type cmake
rg -n "PROTO_EMULATION" --type txt
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugins/ucx/ucx_utils.cpp` at line 421, The code unconditionally sets
PROTO_EMULATION_ENABLE via config.modify which may be unknown to older UCX
versions and therefore ignored; wrap the config.modify("PROTO_EMULATION_ENABLE",
"n") call in the same version guard used for RC_GDA_NUM_CHANNELS so it only runs
when ucpVersion_ >= UCP_VERSION(1,21) (i.e., inside the if (ucpVersion_ >=
UCP_VERSION(1, 21)) block), and ensure any call path that applies config.modify
uses the same ucpVersion_ check before calling ucp_config_modify to guarantee
older UCX won’t silently ignore the flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/plugins/ucx/ucx_utils.cpp`:
- Line 421: The code unconditionally sets PROTO_EMULATION_ENABLE via
config.modify which may be unknown to older UCX versions and therefore ignored;
wrap the config.modify("PROTO_EMULATION_ENABLE", "n") call in the same version
guard used for RC_GDA_NUM_CHANNELS so it only runs when ucpVersion_ >=
UCP_VERSION(1,21) (i.e., inside the if (ucpVersion_ >= UCP_VERSION(1, 21))
block), and ensure any call path that applies config.modify uses the same
ucpVersion_ check before calling ucp_config_modify to guarantee older UCX won’t
silently ignore the flag.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9786496c-7212-4517-bbda-4306addfc8b3

📥 Commits

Reviewing files that changed from the base of the PR and between 8072d05 and 27d94bc.

📒 Files selected for processing (1)
  • src/plugins/ucx/ucx_utils.cpp

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugins/ucx/ucx_utils.cpp`:
- Line 427: The current call config.modify("PROTO_EMULATION_ENABLE", "n") is a
soft enforcement and can be overridden by the UCX_PROTO_EMULATION_ENABLE
environment variable; update the code in ucx_utils.cpp so the enforcement cannot
be bypassed by either (a) replace config.modify(...) with
config.modifyAlways("PROTO_EMULATION_ENABLE", "n") to force the setting, or (b)
before modifying, read getenv("UCX_PROTO_EMULATION_ENABLE") and if it exists and
is not "n" log an error and fail initialization (or explicitly clear it),
ensuring the UCX_PROTO_EMULATION_ENABLE env var cannot re-enable emulation;
reference the existing config object and the
PROTO_EMULATION_ENABLE/UCX_PROTO_EMULATION_ENABLE symbols when making the
change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e217ca1-2e6d-4a23-8aea-543862d53d29

📥 Commits

Reviewing files that changed from the base of the PR and between 27d94bc and 1fbe3e9.

📒 Files selected for processing (1)
  • src/plugins/ucx/ucx_utils.cpp

Comment thread src/plugins/ucx/ucx_utils.cpp
brminich
brminich previously approved these changes May 4, 2026
iyastreb
iyastreb previously approved these changes May 4, 2026
Signed-off-by: Thomas Vegas <tvegas@nvidia.com>
@tvegas1 tvegas1 dismissed stale reviews from iyastreb and brminich via 7b405bf May 4, 2026 18:38
@tvegas1 tvegas1 force-pushed the proto_emulation_disabled branch from 1fbe3e9 to 7b405bf Compare May 4, 2026 18:38
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/plugins/ucx/ucx_utils.cpp (1)

426-430: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

PROTO_EMULATION_ENABLE (and IB_TX_INLINE_RESP) use soft enforcement, defeating the PR's stated goal

Both new settings at lines 427–428 go through config.modify(), which (per config.cpp:28–31) silently no-ops if the corresponding UCX_* environment variable is already set. For PROTO_EMULATION_ENABLE this directly conflicts with the PR objective — "do not allow emulated RMA protocols" — because any pre-set UCX_PROTO_EMULATION_ENABLE=y environment variable will cause the call to be ignored without warning. The same argument applies to IB_TX_INLINE_RESP if setting it to "0" is required for correct zero-copy behaviour.

Additionally, the engine-config parsing loop (lines 449–455) calls config.modify() for each key=value pair, so passing PROTO_EMULATION_ENABLE=y in engine_config would re-enable emulation even without the environment variable.

NET_DEVICES (line 418) uses config.modifyAlways() precisely because it must not be silently bypassed; the same rationale applies here.

💡 Proposed fix
     if (ucpVersion_ >= UCP_VERSION(1, 21)) {
-        config.modify("IB_TX_INLINE_RESP", "0");
-        config.modify("PROTO_EMULATION_ENABLE", "n");
+        config.modifyAlways("IB_TX_INLINE_RESP", "0");
+        config.modifyAlways("PROTO_EMULATION_ENABLE", "n");

         config.modify("RC_GDA_NUM_CHANNELS", std::to_string(num_device_channels));
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/ucx/ucx_utils.cpp` around lines 426 - 430, The calls to
config.modify("IB_TX_INLINE_RESP", "0") and
config.modify("PROTO_EMULATION_ENABLE", "n") silently no-op when UCX_* env vars
or engine_config override is present, defeating the goal of forbidding emulated
RMA; change these to use config.modifyAlways("IB_TX_INLINE_RESP", "0") and
config.modifyAlways("PROTO_EMULATION_ENABLE", "n") in ucx_utils.cpp, and update
the engine_config parsing loop so that if a key equals "PROTO_EMULATION_ENABLE"
or "IB_TX_INLINE_RESP" it calls config.modifyAlways(...) (not
config.modify(...)) to prevent re-enabling via engine_config or env vars while
leaving other keys unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/plugins/ucx/ucx_utils.cpp`:
- Around line 426-430: The calls to config.modify("IB_TX_INLINE_RESP", "0") and
config.modify("PROTO_EMULATION_ENABLE", "n") silently no-op when UCX_* env vars
or engine_config override is present, defeating the goal of forbidding emulated
RMA; change these to use config.modifyAlways("IB_TX_INLINE_RESP", "0") and
config.modifyAlways("PROTO_EMULATION_ENABLE", "n") in ucx_utils.cpp, and update
the engine_config parsing loop so that if a key equals "PROTO_EMULATION_ENABLE"
or "IB_TX_INLINE_RESP" it calls config.modifyAlways(...) (not
config.modify(...)) to prevent re-enabling via engine_config or env vars while
leaving other keys unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 66173add-09ca-4de1-bdf5-e7e76a4bef69

📥 Commits

Reviewing files that changed from the base of the PR and between 1fbe3e9 and 7b405bf.

📒 Files selected for processing (1)
  • src/plugins/ucx/ucx_utils.cpp

@brminich
Copy link
Copy Markdown
Contributor

brminich commented May 4, 2026

/build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants