patternMinor

Mutex implementation on top of a minimalistic preemptive scheduler

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

synchronization operating-systems threads cs stackoverflow concurrency mutual-exclusion

schedulerminimalisticimplementationmutextoppreemptive

Problem

In this question, I'm implementing some synchronization primitives in the standard library of an operating system. Specifically, I want to implement mutexes and condition variables. This is on top of a microkernel with a preemptive scheduler (a thread has no way to prevent the scheduler from preempting it). I'm exploring how minimalistic the system can get while remaining powerful and efficient enough for my needs.

From the hardware, I get a basic atomic primitive such as compare-and-swap or load-link/store-conditional. You can assume that the word size for these instructions is large enough to store a pointer or a thread identifier. Furthermore all threads see the same memory.

From the kernel, I get three primitives:

sleep(): suspend the current thread until a thread has called wake(t) where t is the current thread. If the current thread's wake-up flag is already set then sleep() returns immediately. Returning from sleep clears the wake-up flag.

wake(t): wake thread t if it's asleep. If t isn't asleep then its wake-up flag is set and t's next call to sleep will return immediately. Note that non-participating threads may call wake — as far as the locking mechanism is concerned, it's thus possible for a thread's wake flag to become set non-deterministically.

yield(): give all other threads that are not asleep a chance to run.

How can I implement a mutex and condition variable API on top of these primitives?

lock(m): if m is taken, sleep until it is free. Mark m as taken.

unlock(m): mark m as free. If some other thread is waiting for it to become free, that thread is woken up.

wait(c, m): wait for the event c to happen. If c has already happened, don't block. When wait returns, it atomically locks the mutex m. When a thread notices c, the event is cleared.

signal(c): signal an event on the condition c. This wakes up one thread that's waiting on c.

A good solution must avoid starvation: if a t

Solution

Mutex

For the mutex I'm going to propose a modification of the MCS spinlock that uses your sleep/wake primitives to block instead of spin while waiting to acquire the mutex. The MCS lock was introduced in

Mellor-Crummey, John M; Scott, Michael L: Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, ACM Trans. on Comp. Sys. (TOCS), 9(1):21-65, 1991.

The MCS lock was specifically designed to eliminate the undesirable coherence traffic associated with spin locks in large-scale cache-coherent shared memory multiprocessors (which is not your concern) but I'm suggesting it because of the analogy between inter-processor coherence messaging in multiprocessors and your sleep and wake primitives.

Essentially we are keeping a linked-list of waiters for the lock. The lock itself is a pointer to the tail of the list. The head of the list is the thread that currently has the lock. The tricky part is that there are two pointers to the tail so you need to be careful about the lack of atomicity when adding or removing the tail node.

I'm going to use the same pseudo-code notation used in the paper, which is Pascal-based (it was 1991, give them a break), so the pointer notation may be unfamiliar to many readers. I: ^qnode means, "the variable I has the type 'pointer to qnode'," and I^ means, "that which I points to."

compare_and_swap(addr, old, new) is defined as (atomically) if addr^ != old return false; addr^ := new; return true. atomic_swap(addr, new) is defined as (atomically) old := addr^; addr^ := new; return old.

type QNode = record
  owner    : Boolean
  threadId : ThreadIdentifier
  next     : ^QNode

type lock = ^QNode    // locks start initialized to "nil" (the null pointer)

// The qnode I needs to be allocated in an outer scope

procedure acquire_lock (L : ^lock, I : ^qnode)
  I->threadId := myThreadId
  I->next     := nil
  predecessor : ^qnode := atomic_swap(L, I)
  if predecessor != nil                    // queue was non-empty
     I->owner := false
     predecessor->next := I                // put myself on the tail
     repeat sleep() while not I->owner

procedure release_lock (L : ^lock, I : ^qnode)
  if I->next == nil                        // no known successor
     if compare_and_swap(L, I, nil) return true  // L agrees that there's no successor
     else repeat (spin) while I->next == nil     // else spin wait for race to clear itself
  I->next->owner := true                   // there is a successor, wake it
  wake(I->next->threadId)

The only spinning is (1) in the (hopefully extremely rare) condition that the queue is empty except for the lock holder, and the lock holder tries to release at exactly the same time some other thread is trying to add itself to the queue or (2) when a thread waiting to acquire the lock receives a spurious wake up.

Condition Variable

Given a working mutex implementation with lock(m) and unlock(m), it is possible to do a relatively naïve but working implementation of a condition variable. This design is based on

Birrell, Andrew D; Impliementing Condition Variables with Semaphores, in Computer Systems, Theory, Technology, and Applications, Andrew Herbert and Karen Spärck Jones, editors, pp. 29-37, Springer Monographs in Computer Science, 2004.

and in particular, Birrell's advice that the condition variable should do the queueing itself, rather than relying on some other primitive. Birrell's version additionally makes use of the observation that each thread can be waiting for at most one condition variable, so each qnode can be allocated as part of the thread control block when the thread is created. (Which seems a bad idea to me, because then you have a dangling pointer when a thread gets killed while waiting for a condition variable.)

Start with a non-atomic queue of integers data type with push_tail(q, t) and pop_head(q) and is_empty(q).

type CVar = record
  q : Queue
  qm : Mutex   // private mutex to protect the queue

// precondition: caller holds mutex m    
procedure wait(cv : ^CVar, m : ^Mutex)
  lock(cv->qm)
  push_tail(cv->q, myThreadId)   // atomically add caller to the queue
  unlock(cv->qm)
  unlock(m)
  sleep()
  lock(m)

procedure signal(cv : ^CVar)
  lock(cv->qm)
  if not is_empty(cv->q)
    wake(pop_head(cv->q))
  unlock(cv->qm)

This implementation does suffer from spurious wakeups, but that's standard for condition variable implementations. It is assumed that you are using condition variables to wait for some precondition (described by predicate P()) to become true and that you are thus using the condition variable like this:

m  : Mutex
cv : CVar

lock(m)
while not P()
  wait(cv, m)
// at this point P() is true and m is acquired

Code Snippets

type QNode = record
  owner    : Boolean
  threadId : ThreadIdentifier
  next     : ^QNode

type lock = ^QNode    // locks start initialized to "nil" (the null pointer)

// The qnode I needs to be allocated in an outer scope

procedure acquire_lock (L : ^lock, I : ^qnode)
  I->threadId := myThreadId
  I->next     := nil
  predecessor : ^qnode := atomic_swap(L, I)
  if predecessor != nil                    // queue was non-empty
     I->owner := false
     predecessor->next := I                // put myself on the tail
     repeat sleep() while not I->owner

procedure release_lock (L : ^lock, I : ^qnode)
  if I->next == nil                        // no known successor
     if compare_and_swap(L, I, nil) return true  // L agrees that there's no successor
     else repeat (spin) while I->next == nil     // else spin wait for race to clear itself
  I->next->owner := true                   // there is a successor, wake it
  wake(I->next->threadId)

type CVar = record
  q : Queue
  qm : Mutex   // private mutex to protect the queue

// precondition: caller holds mutex m    
procedure wait(cv : ^CVar, m : ^Mutex)
  lock(cv->qm)
  push_tail(cv->q, myThreadId)   // atomically add caller to the queue
  unlock(cv->qm)
  unlock(m)
  sleep()
  lock(m)

procedure signal(cv : ^CVar)
  lock(cv->qm)
  if not is_empty(cv->q)
    wake(pop_head(cv->q))
  unlock(cv->qm)

m  : Mutex
cv : CVar

lock(m)
while not P()
  wait(cv, m)
// at this point P() is true and m is acquired

Context

StackExchange Computer Science Q#38274, answer score: 3

Revisions (0)

No revisions yet.