I'm not sure why the CLA bot was happy with your other PR but not this one?

Thanks for your contribution!

Hi @BillyONeal, it's a great fix! We are also encountering the std::terminate issue on Windows. Do you think there may be other cases that cause the call to m_thread.join() to be missed? If so, would it be a good idea to add a guard in ~wspp_callback_client() to check whether m_thread is joinable?

Hi @bin0101. Notice that a thread not being joined is not the only reason that can trigger std::terminate. A common scenario when (mis)using pplx tasks is to have unhandled exceptions. Something like this could help to find the reason for the std::terminate call:

#include <cstdlib>
#include <exception>
#include <iostream>

void handleTerminate() {
    try {
        if (!std::current_exception()) {
            throw int(); // throw anything other than a std::exception here
        }
        std::rethrow_exception(std::current_exception());
    } catch (const std::exception& ex) {
        std::cerr << "Unhandled exception: " << ex.what() << std::endl;
    } catch (...) {
        std::cerr << "A thread might not be joined" << std::endl;
    }
    std::quick_exit(EXIT_FAILURE);
}
std::set_terminate(handleTerminate); // call this early, e.g. at the start of main()

@whoan thanks for your explanation, but my point is that the unjoined-thread problem may have different causes. Your current PR fixes one of those cases, but I am afraid there are others. So I suggest that we just add a thread joinable check in ~wspp_callback_client() to avoid crashes in such cases. Sorry, I mentioned the wrong person.
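
For illustration, the kind of guard suggested above could look roughly like the sketch below. It is a minimal sketch only: guarded_client, start and worker_joinable are hypothetical names, and nothing here is taken from the real wspp_callback_client. It only shows the general idea that a destructor can join a still-joinable std::thread instead of letting its destruction call std::terminate.

```cpp
#include <thread>
#include <utility>

// Hypothetical stand-in for a client that owns a worker thread.
// This is NOT the real wspp_callback_client from the library.
class guarded_client {
public:
    // Starts the worker. Assumption: called at most once per object,
    // because assigning to a joinable std::thread also calls std::terminate.
    template <typename Fn>
    void start(Fn&& work) {
        m_thread = std::thread(std::forward<Fn>(work));
    }

    bool worker_joinable() const { return m_thread.joinable(); }

    ~guarded_client() {
        // Destroying a joinable std::thread calls std::terminate,
        // so join it first if nobody else has done so.
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

private:
    std::thread m_thread;
};
```

A guard like this avoids the crash, but it blocks the destructor until the worker finishes, and it must not run on the worker thread itself, since a thread cannot join itself.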
I ran into a std::terminate issue on Linux and found a race condition between the creation of the thread that starts the ASIO io_service and the actual assignment of the thread object to m_thread. The new thread can try to connect and call fail_handler before the thread object is assigned. In this scenario, fail_handler will not join m_thread, as it is not yet joinable.

I modified the template method connect_impl in src/websockets/client/ws_client_wspp.cpp to add some delay and reproduce the issue instead of using gdb, but note that you may find it easier to do with said debugger. These are the modifications:

And with the following snippet I could reproduce the unjoined thread:

To test the changes in this PR, you can use the custom lib code (with the delay) but with the corresponding locks. If you run the snippet again, you will see the problem is gone.
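
The race and the locking idea can be modelled in isolation. The following is a hedged sketch, not the diff from this PR: locked_client, connect, close and on_fail are hypothetical names, and the 50 ms sleep merely stands in for the artificial delay described above. It assumes the fix amounts to holding one mutex both while the thread object is created and assigned and while the failure handler inspects it.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model of the race; none of these names come from the library.
class locked_client {
public:
    // Single-use: returns a future reporting whether the failure handler
    // saw m_thread as joinable.
    std::future<bool> connect() {
        std::future<bool> seen = m_seen.get_future();
        std::lock_guard<std::mutex> lock(m_mutex);
        // The worker starts running here, before m_thread is assigned.
        std::thread worker([this] { on_fail(); });
        // Artificial delay that widens the window between creation and assignment.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        m_thread = std::move(worker);
        return seen;
    }

    void close() {
        std::thread worker;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            worker = std::move(m_thread);
        }
        // Join outside the lock so a worker still waiting in on_fail() cannot deadlock.
        if (worker.joinable()) {
            worker.join();
        }
    }

    ~locked_client() { close(); }

private:
    // Stand-in for a failure handler that runs on the worker thread.
    void on_fail() {
        bool joinable;
        {
            // Blocks until connect() has finished assigning m_thread.
            std::lock_guard<std::mutex> lock(m_mutex);
            joinable = m_thread.joinable();
        }
        m_seen.set_value(joinable);
    }

    std::mutex m_mutex;
    std::thread m_thread;
    std::promise<bool> m_seen;
};
```

Without the lock in connect(), on_fail() could read m_thread during the delay and find it not joinable, which is the situation described above. With the lock, the handler cannot look at m_thread until the assignment has completed.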
If you want a more realistic scenario, I reproduced the problem in an application which creates lots of websocket_callback_clients at the same time, all of which fail to connect to the endpoint.

This PR might fix #32 and #701