Environment details
Any OS, any Python version, google-api-core==1.9.0.
Steps to reproduce
No actual steps, spotted this in bidi.py while working on a different issue. This is how the "Am I paused?" snippet in BackgroundConsumer._thread_main() looks like:
with self._wake:
if self._paused:
_LOGGER.debug("paused, waiting for waking.")
self._wake.wait()
_LOGGER.debug("woken.")
_LOGGER.debug("waiting for recv.")
response = self._bidi_rpc.recv()
The _paused state can be set / cleared by the pause() and resume() methods, respectively.
If paused, the code snippet from above blocks at the _wake.wait() call, and is unblocked some time after _wake.notifyAll() is invoked in the resume() method. When resume() method notifies the waiting threads and releases the internal lock held by the self._wake condition, _wake.wait() tries to re-obtain that lock.
Now, suppose that some other thread invokes the pause() method in the meantime, and the latter obtains the self._wake's lock before _wake.wait() can grab it. The _paused flag will again be set to True, and when _wake.wait() finally acquires the lock and resumes, the code will invoke _bidi_rpc.recv() in the paused state.
For this reason the self._paused condition must be checked in a loop, and not in a single if statement, as the docs on threading.Condition state:
The while loop checking for the application’s condition is necessary because wait() can return after an arbitrary long time, and the condition which prompted the notify() call may no longer hold true. This is inherent to multi-threaded programming.
How much of a problem this is, i.e. invoking recv() in a paused state?
Edit:
The same can happen when exiting the with self._wake block. The lock gets released, pause() can acquire it and set _self._paused to True, but _bidi_rpc.recv() will nevertheless be called, because it is placed outside the with block.
Code example
I created a demo script that demonstrates why if <condition> is not sufficient, and subject to race conditions (requires Python 3.6+). The "worker" thread waits until a shared global number is set to 10, while five "changer threads" randomly change that number, and notify other threads if they change it to 10.
Sometimes the "worker" is invoked too late, and another "changer" thread changes 10 to something else, causing the "worker" to continue running when the target condition is no longer true. Replacing if not <condition> with while not <condition> gets rid of the bug.
...
INFO [2019-04-29 23:09:20,327] Thread-Changer-0: Changing number from 9 to 10
INFO [2019-04-29 23:09:20,327] Thread-Changer-0: New number == 10, notifying others
INFO [2019-04-29 23:09:20,327] Thread-Worker : I see the number 10
INFO [2019-04-29 23:09:20,327] Thread-Changer-1: Changing number from 10 to 4
INFO [2019-04-29 23:09:20,327] Thread-Changer-4: Changing number from 4 to 10
INFO [2019-04-29 23:09:20,327] Thread-Changer-4: New number == 10, notifying others
INFO [2019-04-29 23:09:20,328] Thread-Changer-2: Changing number from 10 to 3
INFO [2019-04-29 23:09:21,328] Thread-Changer-3: Changing number from 3 to 9
INFO [2019-04-29 23:09:21,328] Thread-Changer-1: Changing number from 9 to 6
INFO [2019-04-29 23:09:21,328] Thread-Changer-4: Changing number from 6 to 4
INFO [2019-04-29 23:09:21,328] Thread-Worker : the number is not 10, will wait for condition
INFO [2019-04-29 23:09:21,328] Thread-Changer-0: Changing number from 4 to 6
INFO [2019-04-29 23:09:21,329] Thread-Changer-2: Changing number from 6 to 9
INFO [2019-04-29 23:09:22,329] Thread-Changer-3: Changing number from 9 to 10
INFO [2019-04-29 23:09:22,329] Thread-Changer-3: New number == 10, notifying others
INFO [2019-04-29 23:09:22,329] Thread-Changer-1: Changing number from 10 to 5
INFO [2019-04-29 23:09:22,329] Thread-Worker : I see the number 5
...
The expected behavior is that the worker thread always prints out "I see the number 10" (the desired condition), and never "I see <non 10>". In other words, it should only proceed when the condition number == 10 is currently fulfilled.
In the bidi.py case, this would translate to BackgroundConsumer._thread_main() only resuming when self._paused == False, and never resuming when self._paused == True.
Environment details
Any OS, any Python version, google-api-core==1.9.0.
Steps to reproduce
No actual steps, spotted this in bidi.py while working on a different issue. This is how the "Am I paused?" snippet in
BackgroundConsumer._thread_main()looks like:The
_pausedstate can be set / cleared by the pause() and resume() methods, respectively.If paused, the code snippet from above blocks at the
_wake.wait()call, and is unblocked some time after _wake.notifyAll() is invoked in theresume()method. Whenresume()method notifies the waiting threads and releases the internal lock held by theself._wakecondition,_wake.wait()tries to re-obtain that lock.Now, suppose that some other thread invokes the
pause()method in the meantime, and the latter obtains theself._wake's lock before_wake.wait()can grab it. The_pausedflag will again be set toTrue, and when_wake.wait()finally acquires the lock and resumes, the code will invoke_bidi_rpc.recv()in the paused state.For this reason the
self._pausedcondition must be checked in a loop, and not in a singleifstatement, as the docs on threading.Condition state:How much of a problem this is, i.e. invoking
recv()in a paused state?Edit:
The same can happen when exiting the
with self._wakeblock. The lock gets released,pause()can acquire it and set_self._pausedtoTrue, but_bidi_rpc.recv()will nevertheless be called, because it is placed outside thewithblock.Code example
I created a demo script that demonstrates why
if <condition>is not sufficient, and subject to race conditions (requires Python 3.6+). The "worker" thread waits until a shared global number is set to 10, while five "changer threads" randomly change that number, and notify other threads if they change it to 10.Sometimes the "worker" is invoked too late, and another "changer" thread changes 10 to something else, causing the "worker" to continue running when the target condition is no longer true. Replacing
if not <condition>withwhile not <condition>gets rid of the bug.The expected behavior is that the worker thread always prints out "I see the number 10" (the desired condition), and never "I see <non 10>". In other words, it should only proceed when the condition
number == 10is currently fulfilled.In the
bidi.pycase, this would translate toBackgroundConsumer._thread_main()only resuming whenself._paused == False, and never resuming whenself._paused == True.