Skip to content

Fix thread not joined#1080

Merged
BillyONeal merged 3 commits intomicrosoft:masterfrom
whoan:fix-unhandled-exception
Mar 26, 2019
Merged

Fix thread not joined#1080
BillyONeal merged 3 commits intomicrosoft:masterfrom
whoan:fix-unhandled-exception

Conversation

@whoan
Copy link
Copy Markdown
Contributor

@whoan whoan commented Mar 25, 2019

I run into a std::terminate issue on Linux and I found there's a race condition between the creation of the thread that starts ASIO io_service, and the actual assignation of the thread object to m_thread. The new thread can try to connect and call fail_handler before the thread object is assigned. In this scenario, fail_handler will not join m_thread as it is still not joinable.

I modified the template method connect_impl in src/websockets/client/ws_client_wspp.cpp to add some delay and reproduce the issue instead of using gdb but notice you can find easier to do it with said debugger. These are the modifications:

        auto tmpThread = std::thread([&client]() {
#if defined(__ANDROID__)
            crossplat::get_jvm_env();
#endif
            client.run();
#if defined(__ANDROID__)
            crossplat::JVM.load()->DetachCurrentThread();
#endif

#if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER)
            // OpenSSL stores some per thread state that never will be cleaned up until
            // the dll is unloaded. If static linking, like we do, the state isn't cleaned up
            // at all and will be reported as leaks.
            // See http://www.openssl.org/support/faq.html#PROG13
            ERR_remove_thread_state(nullptr);
#endif
        });
        // my delay HERE
        std::this_thread::sleep_for(std::chrono::seconds(1));
        m_thread = std::move(tmpThread);

And with the following snippet I could reproduce the not joined thread:

#include <iostream>
#include <thread>
#include <pplx/pplxtasks.h>
#include <cpprest/ws_client.h>

// this is to avoid a data race in the scheduler creation reported by Google's thread sanitizer
auto scheduler = pplx::get_ambient_scheduler();

void handleTerminate() {
  std::cerr << "Unhandled exception or thread not joined\n";
  quick_exit(EXIT_FAILURE);
}

int main() {
  std::set_terminate(handleTerminate);
  auto ws = web::websockets::client::websocket_callback_client();
  try {
    ws.connect("ws://an_invalid_host").wait();
  } catch (const std::exception& ex) {
    std::cerr << "Ws connection error: " << ex.what() << std::endl;
  }
  return 0;
}

To test the changes in this PR, you can use the custom lib code (with the delay) but with the corresponding locks. If you run the snippet again, you will see the problem is gone.

If you want a more realistic scenario, I reproduced the problem in an application which creates lots of websocket_callback_clients at the same time and all of them fail at connecting to the endpoint.


This PR might fix #32 and #701

@whoan whoan changed the title Fix unhandled exception Fix thread not joined Mar 25, 2019
@msftclas
Copy link
Copy Markdown

msftclas commented Mar 26, 2019

CLA assistant check
All CLA requirements met.

@BillyONeal
Copy link
Copy Markdown
Member

I'm not sure why the CLA bot was happy with your other PR but not this one?

@BillyONeal
Copy link
Copy Markdown
Member

Thanks for your contribution!

@BillyONeal BillyONeal merged commit 275b9c1 into microsoft:master Mar 26, 2019
@whoan whoan deleted the fix-unhandled-exception branch March 26, 2019 19:18
@bin0101
Copy link
Copy Markdown

bin0101 commented Mar 27, 2019

Hi @BillyONeal, it's a great fix! We are also encountering the std::terminate issue on windows. Do you think there may be other case which cause missing calling the m_thread::join()? If so, will it be good idea to add a guard in ~wspp_callback_client(), to check m_thread joinable?

@whoan
Copy link
Copy Markdown
Contributor Author

whoan commented Mar 27, 2019

Hi @bin0101. Notice that a thread not joined is not the only reason to trigger std::terminate. Check the link to see possible causes.

A common scenario (mis)using pplx tasks is to have unhandled exceptions. Something like this could help to find the reason of the std::terminate call:

void handleTerminate() {
  try {
    if (!std::current_exception()) {
      throw int();  // throw anything different than std::exception here
    }
    std::rethrow_exception(std::current_exception());
  } catch (const std::exception& ex) {
    std::cerr << "Unhandled exception: " << ex.what() << std::endl;
  } catch (...) {
    std::cerr << "A thread might not be joined" << std::endl;
  }
  quick_exit(EXIT_FAILURE);
}

std::set_terminate(handleTerminate);

@bin0101
Copy link
Copy Markdown

bin0101 commented Mar 27, 2019

@whoan thanks for your explanation, but my point is the problem that the thread not joined may have different case. Your current PR fix one of the cases, but I am afraid there are other case. So I suggest that we just add a thread joinable check in ~wspp_callback_client() to avoid crash due to such case.

Sorry, I mentioned the wrong person.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Double delete in wspp_callback_client destructor?

4 participants