Skip to content

Shrink the inline storage in connect_awaitable#1974

Open
ispeters wants to merge 3 commits intoNVIDIA:mainfrom
ispeters:inline_coroutine_allocation
Open

Shrink the inline storage in connect_awaitable#1974
ispeters wants to merge 3 commits intoNVIDIA:mainfrom
ispeters:inline_coroutine_allocation

Conversation

@ispeters
Copy link
Copy Markdown
Contributor

This diff adapts an idea originally due to @lewissbaker, originally shared at https://godbolt.org/z/zGG9fsPrz.

Changes to Lewis's original include:

  • never invoke coro.destroy() since it's a no-op anyway
  • hard-code the integration with connect_awaitable since that's all we're using it for anyway
  • integrate with stdexec's support for various ways of making senders awaitable
  • MOAR std::unreachable()
  • support unhandled_stopped()

This diff adapts an idea originally due to @lewissbaker, originally
shared at https://godbolt.org/z/zGG9fsPrz.

Changes to Lewis's original include:
 * never invoke `coro.destroy()` since it's a no-op anyway
 * hard-code the integration with `connect_awaitable` since that's all
   we're using it for anyway
 * integrate with stdexec's support for various ways of making senders
   awaitable
 * MOAR `std::unreachable()`
 * support `unhandled_stopped()`
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment on lines +125 to +127
// the first implementation of storing the coroutine frame inline in __opstate using the
// technique in this file is due to Lewis Baker <lewissbaker@gmail.com>, and was first
// shared at https://godbolt.org/z/zGG9fsPrz
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lewissbaker, do have any feedback on this? Placement, content, whatever?

STDEXEC_PRAGMA_OPTIMIZE_BEGIN()

static constexpr std::size_t __storage_size = 8 * sizeof(void*);
static constexpr std::size_t __storage_size = 5 * sizeof(void*);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this needs to be 5 * sizeof(void*); I'd expect 4 * sizeof(void*).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Staring at some disassembly, it looks like we need 5 * sizeof(void*) because the coroutine frame has three function pointers in it: … (.resume), …(.cleanup), and …(.destroy). Lewis's original Godbolt didn't have …(.cleanup); Gemini suggests …(.cleanup) gets emitted when the compiler applies the HALO optimization—it differs from …(.destroy) in that it doesn't deallocate memory. It seems we need to deceive the compiler into not doing HALO to be able to shrink the inline storage further.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the compiler stores in the destroy-slot of the coroutine frame either .cleanup or .destroy depending on whether the allocation was optimised out with HALO. i.e. if it did elide the allocation then it sets the destroy function pointer to point at the impl that does not call operator delete to free the memory, otherwise if it didn't elide the allocation then it sets the destroy function-pointer to point to the impl that does call operator delete to free the memory.

The existence of both .destroy and .cleanup methods probably just means that in some places this function is being HALO'd and in other places it is not.

Comment on lines 171 to 173
// Returning noop_coroutine here causes the __connect_awaitable
// coroutine to never resume past the point where it co_await's
// the awaitable.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should clean this up…

Copy link
Copy Markdown
Collaborator

@ericniebler ericniebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

{
static_assert(__std::convertible_to<__suspend_result_t, __std::coroutine_handle<>>);
auto __resume_target = __awaiter_.await_suspend(__coro);
__resume_target.resume();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this is allowed to throw, and it's never marked noexcept so we can't determine statically if it will or not. The code as-is will swallow any such exception and never complete the operation represented by __opstate, which is UB. We should do something about that—choices seem to include catching the exception and completing with set_error, or making the UB explicit by invoking unreachable() here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the idea of using unreachable() for this

* `s/try/STDEXEC_TRY/g`
* `s/catch (...)/STDEXEC_CATCH_ALL/g`
* `s/__coro/&_/g`
@ericniebler
Copy link
Copy Markdown
Collaborator

/ok to test 4f59b3c

@ericniebler
Copy link
Copy Markdown
Collaborator

so, it looks like CI has been vacuously green for a couple of weeks, and in that time i managed to break coroutine support for msvc. i can't fix the CI issue until i get the coro tests passing on msvc again. i'll try to get to that soon.

sorry for the delay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants