Synopsis

pthread_cond_wait(3) may unexpectedly return for what is called a "spurious wakeup." A program must therefore recheck the condition to verify that pthread_cond_wait returned for legitimate reasons and to call it again if otherwise. Besides the given pthread_cond_t, pthread_cond_wait also waits until the provided mutex is released.

Disambiguating pthread_cond_wait(3)

This article does not aim to explain in detail the POSIX thread condition API, but instead clear up some ambiguity which lingered in my mind when I was learning about this API, namely; What are "spurious wakeups," and how can a condition signaller returns a mutex to a waiting thread?

Let's briefly overview the mutex and the condition API with a simple music player:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

char            *playlist_next;
pthread_cond_t   playlisy_ready;
pthread_mutex_t  playlist_mutex;

void *playlist_push_song (void *arg);
void *playlist_play_next (void *arg);

int main(int argc, char **argv)
{
	pthread_t play_thread, push_thread;
	playlist_ready = (pthread_cond_t) PTHREAD_COND_INITIALIZER;
	playlist_mutex = (pthread_mutex_t) PTHREAD_MUTEX_INITIALIZER;
	playlist_next  = NULL;

	pthread_create(&play_thread, NULL, playlist_play_next, NULL);
	pthread_create(&play_thread, NULL, playlist_push_song, "Max Frost - Withdrawal");
	pthread_join(play_thread, NULL);
	pthread_join(push_thread, NULL);
	return EXIT_SUCCESS;
}

void *playlist_push_song(void *arg)
{
	char *song = *arg;
	pthread_mutex_lock(&playlist_mutex);
	playlist_next = song;
	printf("Song added: %s\n", song);
	pthread_mutex_unlock(&playlist_mutex);
	pthread_cond_signal(&playlist_ready);
	return NULL;
}

void *playlist_play_next(void *arg)
{
	pthread_mutex_lock(&playlist_mutex);
	if (playlist_next == NULL) {
		puts("Playlist empty, waiting");
		pthread_cond_wait(&playlist_ready, &playlist_mutex);
	}

	printf("Now playing: %s\n", playlist_next);
	playlist_next = NULL;
	pthread_mutex_unlock(&playlist_mutex);
	return NULL;
}

This program sets and plays a song stored in playlist_next. Since most music players are interfaced using a GUI, these actions must be capable of running asynchronously, meaning in any order, maybe even at the same time. Hence the use of POSIX threads to emulate such concurrency.

Unfortunately, with concurrency comes the risk of race condition; occurrences that may occur at any time, in orders that may lead to dire consequences (think of reading a variable while another thread is in the middle setting it up. That's what we'd call an undefined behaviour).

Here's why we need mutexes. Mutexes are like gas station bathroom keys; only zero or one customer (thread) may hold a key at any time. The customer must also return said key before any other customer can access the bathroom again. This is achieved by calling pthread_mutex_lock, or, in our analogy, waiting for the key if necessary before taking it for yourself, and then calling pthread_mutex_unlock to return the key to the clerk.

For this reason, both playlist_play_next and playlist_push_song are wrapped in a mutex locking context to prevent one and the other from occupying the bathr-- modifying and using the playlist simultaneously.

However, since these functions runs asynchronously, the program may eventually attempt to play a song before one is available, as is currently the case in our program when we create a thread for playlist_play_next before playlist_push_song. This is where conditions come into play.

In our example, playlist_play_next went into the bathroom and saw that there was no toilet paper left (I'll just stop explaining the analogies from now on). For this reason, playlist_play_next temporarily relinquished the key back to the clerk so that playlist_push_song can enter and change the TP roll. That's exactly what pthread_cond_wait does; it atomically unlocks a given mutex and blocks the thread until a condition is signalled.

Once a song is available, playlist_push_song returns the key and tells the customers that their throne is ready by calling pthread_cond_signal(&playlist_ready) (we will see later on that doing these things in that chosen order may be suboptimal, until then feel free to guess why).

All seems good, but something is lurking in our code...

Spurious Wakeups

Here's an excerpt from pthread_cond_wait(3);

Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.

What the hell? Why would POSIX do that? What's the point of unreliably waiting for a condition? Thankfully, Wikipedia has some answers;

To allow for implementation flexibility in dealing with error conditions and races inside the operating system, condition variables may also be allowed to return from a wait even if not signalled, though it is not clear how many implementations actually do that. [...] The Linux p-thread implementation of condition variables guarantees it will not do that.

Spurious wakeups are a platform-specific issue, and an example of Postel's law in action: be conservative in what you do, and be liberal in what you accept from others. From what I gathered, spurious wakeups allows implementation of the POSIX scheduler to be lousy without breaking the convention. You can't break a broken convention, ain't that right, Postel?

So, how can we fix that? Easy; just replace the if clause with a while loop:

void *playlist_play_next(void *arg)
{
	pthread_mutex_lock(&playlist_mutex);

	if (playlist_next == NULL) {
		puts("Playlist empty, waiting");
		do pthread_cond_wait(&playlist_ready, &playlist_mutex);
		while(playlist_next == NULL);
	}

	printf("Now playing: %s\n", playlist_next);
	playlist_next = NULL;
	pthread_mutex_unlock(&playlist_mutex);
	return NULL;
}

And we're done. Now, any return-to-context is checked to be legitimate, and the function will keep waiting otherwise.

No, we're not done.

Hold on, if a waiting thread atomically unlocks his mutex before waiting for a condition, how can it atomically retrieve control over the mutex after the condition is signaled, if we can't unlock a mutex and signal a condition atomically? The answer turned out to be simple; after a condition is met, the thread simply waits to get its mutex back, as if pthread_cond_wait had turned into pthread_mutex_lock.

Now, remember when I said that unlocking a mutex before signalling a condition wasn't optimal? Here's why; in between unlocking the mutex and signalling the condition, another thread waiting for that mutex may step in and invalidate the condition. When our patient thread is brought back to life, the situation won't resemble anything but, you've guessed it, a spurious wakeup! Therefore, inverting these two steps makes the waiting thread more likely to receive the mutex first;

void *playlist_push_song(void *arg)
{
	char *song = *arg;
	pthread_mutex_lock(&playlist_mutex);
	playlist_next = song;
	printf("Song added: %s\n", song);
	pthread_cond_signal(&playlist_ready);
	pthread_mutex_unlock(&playlist_mutex);
	return NULL;
}