Synopsis
pthread_cond_wait(3)
may unexpectedly return for what is called
a "spurious wakeup." A program must therefore recheck the condition to
verify that pthread_cond_wait
returned for legitimate reasons
and to call it again if otherwise. Besides the given
pthread_cond_t
, pthread_cond_wait also waits until the provided
mutex is released.
Disambiguating pthread_cond_wait(3)
This article does not aim to explain in detail the POSIX thread condition API, but instead clear up some ambiguity which lingered in my mind when I was learning about this API, namely; What are "spurious wakeups," and how can a condition signaller returns a mutex to a waiting thread?
Let's briefly overview the mutex and the condition API with a simple music player:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
char *playlist_next;
pthread_cond_t playlisy_ready;
pthread_mutex_t playlist_mutex;
void *playlist_push_song (void *arg);
void *playlist_play_next (void *arg);
int main(int argc, char **argv)
{
pthread_t play_thread, push_thread;
playlist_ready = (pthread_cond_t) PTHREAD_COND_INITIALIZER;
playlist_mutex = (pthread_mutex_t) PTHREAD_MUTEX_INITIALIZER;
playlist_next = NULL;
pthread_create(&play_thread, NULL, playlist_play_next, NULL);
pthread_create(&play_thread, NULL, playlist_push_song, "Max Frost - Withdrawal");
pthread_join(play_thread, NULL);
pthread_join(push_thread, NULL);
return EXIT_SUCCESS;
}
void *playlist_push_song(void *arg)
{
char *song = *arg;
pthread_mutex_lock(&playlist_mutex);
playlist_next = song;
printf("Song added: %s\n", song);
pthread_mutex_unlock(&playlist_mutex);
pthread_cond_signal(&playlist_ready);
return NULL;
}
void *playlist_play_next(void *arg)
{
pthread_mutex_lock(&playlist_mutex);
if (playlist_next == NULL) {
puts("Playlist empty, waiting");
pthread_cond_wait(&playlist_ready, &playlist_mutex);
}
printf("Now playing: %s\n", playlist_next);
playlist_next = NULL;
pthread_mutex_unlock(&playlist_mutex);
return NULL;
}
This program sets and plays a song stored in playlist_next
. Since
most music players are interfaced using a GUI, these actions must be capable of
running asynchronously, meaning in any order, maybe even at the same time. Hence
the use of POSIX threads to emulate such concurrency.
Unfortunately, with concurrency comes the risk of race condition; occurrences that may occur at any time, in orders that may lead to dire consequences (think of reading a variable while another thread is in the middle setting it up. That's what we'd call an undefined behaviour).
Here's why we need mutexes. Mutexes are like gas station bathroom keys;
only zero or one customer (thread) may hold a key at any time. The customer must
also return said key before any other customer can access the bathroom again.
This is achieved by calling pthread_mutex_lock
, or, in our analogy,
waiting for the key if necessary before taking it for yourself, and then calling
pthread_mutex_unlock
to return the key to the clerk.
For this reason, both playlist_play_next
and
playlist_push_song
are wrapped in a mutex locking context to
prevent one and the other from occupying the bathr-- modifying and using the
playlist simultaneously.
However, since these functions runs asynchronously, the program may eventually
attempt to play a song before one is available, as is currently the case in
our program when we create a thread for playlist_play_next
before
playlist_push_song
. This is where conditions come into play.
In our example, playlist_play_next
went into the bathroom and saw
that there was no toilet paper left (I'll just stop explaining the analogies
from now on). For this reason, playlist_play_next
temporarily
relinquished the key back to the clerk so that playlist_push_song
can enter and change the TP roll. That's exactly what
pthread_cond_wait
does; it atomically unlocks a given mutex and
blocks the thread until a condition is signalled.
Once a song is available, playlist_push_song returns the key and tells the
customers that their throne is ready by calling
pthread_cond_signal(&playlist_ready)
(we will see later on that
doing these things in that chosen order may be suboptimal, until then feel free
to guess why).
All seems good, but something is lurking in our code...
Spurious Wakeups
Here's an excerpt from pthread_cond_wait(3);
Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.
What the hell? Why would POSIX do that? What's the point of unreliably waiting for a condition? Thankfully, Wikipedia has some answers;
To allow for implementation flexibility in dealing with error conditions and races inside the operating system, condition variables may also be allowed to return from a wait even if not signalled, though it is not clear how many implementations actually do that. [...] The Linux p-thread implementation of condition variables guarantees it will not do that.
Spurious wakeups are a platform-specific issue, and an example of Postel's law
in action: be conservative in what you do, and be liberal in what you accept
from others.
From what I gathered, spurious wakeups allows implementation
of the POSIX scheduler to be lousy without breaking the convention. You can't
break a broken convention, ain't that right, Postel?
So, how can we fix that? Easy; just replace the if clause with a while loop:
void *playlist_play_next(void *arg)
{
pthread_mutex_lock(&playlist_mutex);
if (playlist_next == NULL) {
puts("Playlist empty, waiting");
do pthread_cond_wait(&playlist_ready, &playlist_mutex);
while(playlist_next == NULL);
}
printf("Now playing: %s\n", playlist_next);
playlist_next = NULL;
pthread_mutex_unlock(&playlist_mutex);
return NULL;
}
And we're done. Now, any return-to-context is checked to be legitimate, and the function will keep waiting otherwise.
No, we're not done.
Hold on, if a waiting thread atomically unlocks his mutex before waiting for
a condition, how can it atomically retrieve control over the mutex after the
condition is signaled, if we can't unlock a mutex and signal a condition
atomically? The answer turned out to be simple; after a condition is met,
the thread simply waits to get its mutex back, as if
pthread_cond_wait
had turned into pthread_mutex_lock
.
Now, remember when I said that unlocking a mutex before signalling a condition wasn't optimal? Here's why; in between unlocking the mutex and signalling the condition, another thread waiting for that mutex may step in and invalidate the condition. When our patient thread is brought back to life, the situation won't resemble anything but, you've guessed it, a spurious wakeup! Therefore, inverting these two steps makes the waiting thread more likely to receive the mutex first;
void *playlist_push_song(void *arg)
{
char *song = *arg;
pthread_mutex_lock(&playlist_mutex);
playlist_next = song;
printf("Song added: %s\n", song);
pthread_cond_signal(&playlist_ready);
pthread_mutex_unlock(&playlist_mutex);
return NULL;
}