Reader/writer model locking #315

Open
garethsb opened this issue Jan 11, 2023 · 0 comments

garethsb commented Jan 11, 2023

I hit a crash in nmos-cpp-registry when load testing...

Config:

  • Windows 10, VS 2019
  • Boost 1.80.0

I launched 400 Nodes with nmos-multi-node on the same host as the Registry, along with an NMOS Controller using the Query API WebSocket subscriptions. I thought the Registration API request processing would be throttled by the size of the thread pools for http_listener and pplx, but possibly not (needs more investigation), and I ended up hitting the exclusive locker limit of boost::shared_mutex. That results in an uncaught boost::lock_error exception, and it also leaves the mutex state incorrect, so just catching the exception isn't going to help.

https://stackoverflow.com/questions/69580941/why-boostshared-mutex-cannot-block-more-than-128-threads

Before we even hit the uncaught boost::lock_error exception, in nmos::send_query_ws_events_thread, there are several non-crash errors caused by the same issue, but in places where the exception is caught and logged as an "Implementation error". These cause "HTTP error: An operation was attempted on a nonexistent network connection. [windows:1229]", presumably because a registration request never got a response.

Possible workarounds for the crash itself involve replacing the current boost::shared_mutex with an alternative...

  • std::shared_mutex (but that requires C++17)
  • std::shared_timed_mutex (that requires C++14)
  • BOOST_THREAD_PROVIDES_GENERIC_SHARED_MUTEX_ON_WIN and (probably) BOOST_THREAD_V2_SHARED_MUTEX (requires rebuild/nonstandard boost config)
  • std::mutex (means no reader locks, so needs further testing to prove no deadlocks before even getting to performance profiling, though this is exactly what we had before the switch to use shared_mutex, and the fact that the crash is happening suggests that the profile of model reads and writes that we're seeing may not actually benefit from shared_mutex)
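For anyone wanting to try the standard library options, here is a minimal sketch of how the choice could be made at compile time, plus a small contention check. It is not the nmos-cpp code: the `sketch` namespace, `model_mutex`, `read_lock`, `write_lock` and `stress_exclusive` are all hypothetical names for illustration. The Boost macro options are not covered since they need a rebuilt Boost.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// MSVC only reports the real language level in __cplusplus with /Zc:__cplusplus,
// so prefer _MSVC_LANG when it is defined
#if defined(_MSVC_LANG)
#define SKETCH_CPLUSPLUS _MSVC_LANG
#else
#define SKETCH_CPLUSPLUS __cplusplus
#endif

#if SKETCH_CPLUSPLUS >= 201402L
#include <shared_mutex>
#endif

namespace sketch // hypothetical namespace, not part of nmos-cpp
{
#if SKETCH_CPLUSPLUS >= 201703L
    typedef std::shared_mutex model_mutex;            // C++17
    typedef std::shared_lock<model_mutex> read_lock;
#elif SKETCH_CPLUSPLUS >= 201402L
    typedef std::shared_timed_mutex model_mutex;      // C++14
    typedef std::shared_lock<model_mutex> read_lock;
#else
    typedef std::mutex model_mutex;                   // C++11: no reader locks
    typedef std::unique_lock<model_mutex> read_lock;  // readers are exclusive too
#endif
    typedef std::unique_lock<model_mutex> write_lock;

    // Launch `writers` threads that each take the exclusive lock `iterations`
    // times, and return the final value of the counter they all increment
    inline std::size_t stress_exclusive(std::size_t writers, std::size_t iterations)
    {
        model_mutex mutex;
        std::size_t counter = 0;

        std::vector<std::thread> threads;
        threads.reserve(writers);
        for (std::size_t i = 0; i < writers; ++i)
        {
            threads.emplace_back([&mutex, &counter, iterations]
            {
                for (std::size_t j = 0; j < iterations; ++j)
                {
                    write_lock lock(mutex);
                    ++counter;
                }
            });
        }
        for (auto& thread : threads) thread.join();

        read_lock lock(mutex);
        return counter;
    }
}
```

Calling `sketch::stress_exclusive(400, 100)` mirrors the 400 Node load in a very crude way: every thread is a writer, so it only checks that the chosen mutex copes with that many exclusive lockers, and says nothing about read/write performance.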

FWIW, this issue has presumably been present since 2018... 9bca76a

However, after experimenting with the different ways to resolve this, with limited success, I realised I'd only been trying Debug builds of the Registry... all the workarounds I tried performed perfectly in Release builds... but then so does the current shared_mutex approach. So for now, I'm just leaving this issue here as a reminder, in case boost::lock_error rears its ugly head again...

(Tangentially relevant: a useful post which explains clearly why it only makes sense to have one upgradable lock at a time... https://stackoverflow.com/questions/59710699/upgrade-shared-lock-to-unique-lock-usage-timing-and-design
And the paper which also describes it... https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3568.html#OverviewUpgrade)
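The reason for the single upgradable lock is that two threads which both hold a shared lock and both wait to upgrade would each be waiting for the other to release, which is a deadlock. The standard library shared mutexes have no upgrade operation at all, so if one of them replaced boost::shared_mutex, any upgrade would presumably have to become "release the read lock, take the write lock, re-check". A minimal sketch of that pattern follows, assuming C++14; the `registry` class and `insert_if_absent` are hypothetical and not taken from nmos-cpp.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

namespace sketch // hypothetical namespace, not part of nmos-cpp
{
    class registry
    {
    public:
        // Returns true if the resource was inserted, false if it already existed
        bool insert_if_absent(const std::string& id, int version)
        {
            {
                // cheap check under a shared lock, released at the end of this scope
                std::shared_lock<std::shared_timed_mutex> read(mutex_);
                if (resources_.count(id) != 0) return false;
            }

            // another writer may have run between the two locks, so the state
            // must be re-checked; emplace does that by refusing duplicate keys
            std::unique_lock<std::shared_timed_mutex> write(mutex_);
            return resources_.emplace(id, version).second;
        }

        std::size_t size() const
        {
            std::shared_lock<std::shared_timed_mutex> read(mutex_);
            return resources_.size();
        }

    private:
        mutable std::shared_timed_mutex mutex_;
        std::map<std::string, int> resources_;
    };
}
```

The trade-off compared with an upgradable lock is that the state seen under the read lock is no longer guaranteed once the write lock is held, hence the re-check.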
