Shrink the inline storage in connect_awaitable#1974
Shrink the inline storage in connect_awaitable#1974ispeters wants to merge 3 commits intoNVIDIA:mainfrom
Conversation
This diff adapts an idea originally due to @lewissbaker, originally shared at https://godbolt.org/z/zGG9fsPrz. Changes to Lewis's original include: * never invoke `coro.destroy()` since it's a no-op anyway * hard-code the integration with `connect_awaitable` since that's all we're using it for anyway * integrate with stdexec's support for various ways of making senders awaitable * MOAR `std::unreachable()` * support `unhandled_stopped()`
| // the first implementation of storing the coroutine frame inline in __opstate using the | ||
| // technique in this file is due to Lewis Baker <lewissbaker@gmail.com>, and was first | ||
| // shared at https://godbolt.org/z/zGG9fsPrz |
There was a problem hiding this comment.
@lewissbaker, do have any feedback on this? Placement, content, whatever?
| STDEXEC_PRAGMA_OPTIMIZE_BEGIN() | ||
|
|
||
| static constexpr std::size_t __storage_size = 8 * sizeof(void*); | ||
| static constexpr std::size_t __storage_size = 5 * sizeof(void*); |
There was a problem hiding this comment.
I don't know why this needs to be 5 * sizeof(void*); I'd expect 4 * sizeof(void*).
There was a problem hiding this comment.
Staring at some disassembly, it looks like we need 5 * sizeof(void*) because the coroutine frame has three function pointers in it: … (.resume), …(.cleanup), and …(.destroy). Lewis's original Godbolt didn't have …(.cleanup); Gemini suggests …(.cleanup) gets emitted when the compiler applies the HALO optimization—it differs from …(.destroy) in that it doesn't deallocate memory. It seems we need to deceive the compiler into not doing HALO to be able to shrink the inline storage further.
There was a problem hiding this comment.
Usually the compiler stores in the destroy-slot of the coroutine frame either .cleanup or .destroy depending on whether the allocation was optimised out with HALO. i.e. if it did elide the allocation then it sets the destroy function pointer to point at the impl that does not call operator delete to free the memory, otherwise if it didn't elide the allocation then it sets the destroy function-pointer to point to the impl that does call operator delete to free the memory.
The existence of both .destroy and .cleanup methods probably just means that in some places this function is being HALO'd and in other places it is not.
| // Returning noop_coroutine here causes the __connect_awaitable | ||
| // coroutine to never resume past the point where it co_await's | ||
| // the awaitable. |
There was a problem hiding this comment.
I should clean this up…
| { | ||
| static_assert(__std::convertible_to<__suspend_result_t, __std::coroutine_handle<>>); | ||
| auto __resume_target = __awaiter_.await_suspend(__coro); | ||
| __resume_target.resume(); |
There was a problem hiding this comment.
Technically, this is allowed to throw, and it's never marked noexcept so we can't determine statically if it will or not. The code as-is will swallow any such exception and never complete the operation represented by __opstate, which is UB. We should do something about that—choices seem to include catching the exception and completing with set_error, or making the UB explicit by invoking unreachable() here.
There was a problem hiding this comment.
i like the idea of using unreachable() for this
* `s/try/STDEXEC_TRY/g` * `s/catch (...)/STDEXEC_CATCH_ALL/g` * `s/__coro/&_/g`
|
/ok to test 4f59b3c |
|
so, it looks like CI has been vacuously green for a couple of weeks, and in that time i managed to break coroutine support for msvc. i can't fix the CI issue until i get the coro tests passing on msvc again. i'll try to get to that soon. sorry for the delay. |
This diff adapts an idea originally due to @lewissbaker, originally shared at https://godbolt.org/z/zGG9fsPrz.
Changes to Lewis's original include:
coro.destroy()since it's a no-op anywayconnect_awaitablesince that's all we're using it for anywaystd::unreachable()unhandled_stopped()