Shrink the inline storage in connect_awaitable#1974

Open

ispeters wants to merge 3 commits intoNVIDIA:mainfrom

ispeters:inline_coroutine_allocation

Contributor

ispeters commented Mar 30, 2026

This diff adapts an idea originally due to @lewissbaker, originally shared at https://godbolt.org/z/zGG9fsPrz.

Changes to Lewis's original include:

never invoke coro.destroy() since it's a no-op anyway
hard-code the integration with connect_awaitable since that's all we're using it for anyway
integrate with stdexec's support for various ways of making senders awaitable
MOAR std::unreachable()
support unhandled_stopped()


          Shrink the inline storage in connect_awaitable

ccc58c4

This diff adapts an idea originally due to @lewissbaker, originally
shared at https://godbolt.org/z/zGG9fsPrz.

Changes to Lewis's original include:
 * never invoke `coro.destroy()` since it's a no-op anyway
 * hard-code the integration with `connect_awaitable` since that's all
   we're using it for anyway
 * integrate with stdexec's support for various ways of making senders
   awaitable
 * MOAR `std::unreachable()`
 * support `unhandled_stopped()`

copy-pr-bot bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


          Credit Lewis in the source, too

9421cd5

ispeters commented

View reviewed changes

include/stdexec/__detail/__connect_awaitable.hpp

Comment on lines +125 to +127

+                      // the first implementation of storing the coroutine frame inline in __opstate using the
+                      // technique in this file is due to Lewis Baker <lewissbaker@gmail.com>, and was first
+                      // shared at https://godbolt.org/z/zGG9fsPrz

Contributor Author

ispeters Mar 30, 2026

@lewissbaker, do have any feedback on this? Placement, content, whatever?

ispeters commented

View reviewed changes

include/stdexec/__detail/__connect_awaitable.hpp

-                  STDEXEC_PRAGMA_OPTIMIZE_BEGIN()
-                  static constexpr std::size_t __storage_size  = 8 * sizeof(void*);
+                  static constexpr std::size_t __storage_size  = 5 * sizeof(void*);

Contributor Author

ispeters Mar 30, 2026

I don't know why this needs to be 5 * sizeof(void*); I'd expect 4 * sizeof(void*).

Contributor Author

ispeters Mar 31, 2026

Staring at some disassembly, it looks like we need 5 * sizeof(void*) because the coroutine frame has three function pointers in it: … (.resume), …(.cleanup), and …(.destroy). Lewis's original Godbolt didn't have …(.cleanup); Gemini suggests …(.cleanup) gets emitted when the compiler applies the HALO optimization—it differs from …(.destroy) in that it doesn't deallocate memory. It seems we need to deceive the compiler into not doing HALO to be able to shrink the inline storage further.

Contributor

lewissbaker Mar 31, 2026

Usually the compiler stores in the destroy-slot of the coroutine frame either .cleanup or .destroy depending on whether the allocation was optimised out with HALO. i.e. if it did elide the allocation then it sets the destroy function pointer to point at the impl that does not call operator delete to free the memory, otherwise if it didn't elide the allocation then it sets the destroy function-pointer to point to the impl that does call operator delete to free the memory.

The existence of both .destroy and .cleanup methods probably just means that in some places this function is being HALO'd and in other places it is not.

ispeters commented

View reviewed changes

include/stdexec/__detail/__connect_awaitable.hpp

Comment on lines 171 to 173

                       // Returning noop_coroutine here causes the __connect_awaitable
                       // coroutine to never resume past the point where it co_await's
                       // the awaitable.

Contributor Author

ispeters Mar 30, 2026

I should clean this up…

ericniebler reviewed

View reviewed changes

Collaborator

ericniebler left a comment

looks great!

include/stdexec/__detail/__connect_awaitable.hpp Outdated Show resolved Hide resolved

include/stdexec/__detail/__connect_awaitable.hpp Outdated Show resolved Hide resolved

include/stdexec/__detail/__connect_awaitable.hpp Outdated Show resolved Hide resolved

ispeters commented

View reviewed changes

include/stdexec/__detail/__connect_awaitable.hpp

+                          {
+                            static_assert(__std::convertible_to<__suspend_result_t, __std::coroutine_handle<>>);
+                            auto __resume_target = __awaiter_.await_suspend(__coro);
+                            __resume_target.resume();

Contributor Author

ispeters Mar 30, 2026

Technically, this is allowed to throw, and it's never marked noexcept so we can't determine statically if it will or not. The code as-is will swallow any such exception and never complete the operation represented by __opstate, which is UB. We should do something about that—choices seem to include catching the exception and completing with set_error, or making the UB explicit by invoking unreachable() here.

Collaborator

ericniebler Mar 31, 2026

i like the idea of using unreachable() for this


          Code review feedback

4f59b3c

* `s/try/STDEXEC_TRY/g`
* `s/catch (...)/STDEXEC_CATCH_ALL/g`
* `s/__coro/&_/g`

Collaborator

ericniebler commented Mar 30, 2026

/ok to test 4f59b3c

Collaborator

ericniebler commented Mar 31, 2026

so, it looks like CI has been vacuously green for a couple of weeks, and in that time i managed to break coroutine support for msvc. i can't fix the CI issue until i get the coro tests passing on msvc again. i'll try to get to that soon.

sorry for the delay.

Ari4ka approved these changes

View reviewed changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet