patternMinor
Mutex implementation on top of a minimalistic preemptive scheduler
Viewed 0 times
schedulerminimalisticimplementationmutextoppreemptive
Problem
In this question, I'm implementing some synchronization primitives in the standard library of an operating system. Specifically, I want to implement mutexes and condition variables. This is on top of a microkernel with a preemptive scheduler (a thread has no way to prevent the scheduler from preempting it). I'm exploring how minimalistic the system can get while remaining powerful and efficient enough for my needs.
From the hardware, I get a basic atomic primitive such as compare-and-swap or load-link/store-conditional. You can assume that the word size for these instructions is large enough to store a pointer or a thread identifier. Furthermore all threads see the same memory.
From the kernel, I get three primitives:
How can I implement a mutex and condition variable API on top of these primitives?
A good solution must avoid starvation: if a t
From the hardware, I get a basic atomic primitive such as compare-and-swap or load-link/store-conditional. You can assume that the word size for these instructions is large enough to store a pointer or a thread identifier. Furthermore all threads see the same memory.
From the kernel, I get three primitives:
sleep(): suspend the current thread until a thread has calledwake(t)wheretis the current thread. If the current thread's wake-up flag is already set thensleep()returns immediately. Returning fromsleepclears the wake-up flag.
wake(t): wake threadtif it's asleep. Iftisn't asleep then its wake-up flag is set andt's next call tosleepwill return immediately. Note that non-participating threads may callwake— as far as the locking mechanism is concerned, it's thus possible for a thread's wake flag to become set non-deterministically.
yield(): give all other threads that are not asleep a chance to run.
How can I implement a mutex and condition variable API on top of these primitives?
lock(m): ifmis taken, sleep until it is free. Markmas taken.
unlock(m): markmas free. If some other thread is waiting for it to become free, that thread is woken up.
wait(c, m): wait for the eventcto happen. Ifchas already happened, don't block. Whenwaitreturns, it atomically locks the mutexm. When a thread noticesc, the event is cleared.
signal(c): signal an event on the conditionc. This wakes up one thread that's waiting onc.
A good solution must avoid starvation: if a t
Solution
Mutex
For the mutex I'm going to propose a modification of the MCS spinlock that uses your
Mellor-Crummey, John M; Scott, Michael L: Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, ACM Trans. on Comp. Sys. (TOCS), 9(1):21-65, 1991.
The MCS lock was specifically designed to eliminate the undesirable coherence traffic associated with spin locks in large-scale cache-coherent shared memory multiprocessors (which is not your concern) but I'm suggesting it because of the analogy between inter-processor coherence messaging in multiprocessors and your
Essentially we are keeping a linked-list of waiters for the lock. The lock itself is a pointer to the tail of the list. The head of the list is the thread that currently has the lock. The tricky part is that there are two pointers to the tail so you need to be careful about the lack of atomicity when adding or removing the tail node.
I'm going to use the same pseudo-code notation used in the paper, which is Pascal-based (it was 1991, give them a break), so the pointer notation may be unfamiliar to many readers.
The only spinning is (1) in the (hopefully extremely rare) condition that the queue is empty except for the lock holder, and the lock holder tries to release at exactly the same time some other thread is trying to add itself to the queue or (2) when a thread waiting to acquire the lock receives a spurious wake up.
Condition Variable
Given a working mutex implementation with
Birrell, Andrew D; Impliementing Condition Variables with Semaphores, in Computer Systems, Theory, Technology, and Applications, Andrew Herbert and Karen Spärck Jones, editors, pp. 29-37, Springer Monographs in Computer Science, 2004.
and in particular, Birrell's advice that the condition variable should do the queueing itself, rather than relying on some other primitive. Birrell's version additionally makes use of the observation that each thread can be waiting for at most one condition variable, so each qnode can be allocated as part of the thread control block when the thread is created. (Which seems a bad idea to me, because then you have a dangling pointer when a thread gets killed while waiting for a condition variable.)
Start with a non-atomic queue of integers data type with
This implementation does suffer from spurious wakeups, but that's standard for condition variable implementations. It is assumed that you are using condition variables to wait for some precondition (described by predicate
For the mutex I'm going to propose a modification of the MCS spinlock that uses your
sleep/wake primitives to block instead of spin while waiting to acquire the mutex. The MCS lock was introduced inMellor-Crummey, John M; Scott, Michael L: Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, ACM Trans. on Comp. Sys. (TOCS), 9(1):21-65, 1991.
The MCS lock was specifically designed to eliminate the undesirable coherence traffic associated with spin locks in large-scale cache-coherent shared memory multiprocessors (which is not your concern) but I'm suggesting it because of the analogy between inter-processor coherence messaging in multiprocessors and your
sleep and wake primitives. Essentially we are keeping a linked-list of waiters for the lock. The lock itself is a pointer to the tail of the list. The head of the list is the thread that currently has the lock. The tricky part is that there are two pointers to the tail so you need to be careful about the lack of atomicity when adding or removing the tail node.
I'm going to use the same pseudo-code notation used in the paper, which is Pascal-based (it was 1991, give them a break), so the pointer notation may be unfamiliar to many readers.
I: ^qnode means, "the variable I has the type 'pointer to qnode'," and I^ means, "that which I points to."compare_and_swap(addr, old, new) is defined as (atomically) if addr^ != old return false; addr^ := new; return true. atomic_swap(addr, new) is defined as (atomically) old := addr^; addr^ := new; return old.type QNode = record
owner : Boolean
threadId : ThreadIdentifier
next : ^QNode
type lock = ^QNode // locks start initialized to "nil" (the null pointer)
// The qnode I needs to be allocated in an outer scope
procedure acquire_lock (L : ^lock, I : ^qnode)
I->threadId := myThreadId
I->next := nil
predecessor : ^qnode := atomic_swap(L, I)
if predecessor != nil // queue was non-empty
I->owner := false
predecessor->next := I // put myself on the tail
repeat sleep() while not I->owner
procedure release_lock (L : ^lock, I : ^qnode)
if I->next == nil // no known successor
if compare_and_swap(L, I, nil) return true // L agrees that there's no successor
else repeat (spin) while I->next == nil // else spin wait for race to clear itself
I->next->owner := true // there is a successor, wake it
wake(I->next->threadId)The only spinning is (1) in the (hopefully extremely rare) condition that the queue is empty except for the lock holder, and the lock holder tries to release at exactly the same time some other thread is trying to add itself to the queue or (2) when a thread waiting to acquire the lock receives a spurious wake up.
Condition Variable
Given a working mutex implementation with
lock(m) and unlock(m), it is possible to do a relatively naïve but working implementation of a condition variable. This design is based on Birrell, Andrew D; Impliementing Condition Variables with Semaphores, in Computer Systems, Theory, Technology, and Applications, Andrew Herbert and Karen Spärck Jones, editors, pp. 29-37, Springer Monographs in Computer Science, 2004.
and in particular, Birrell's advice that the condition variable should do the queueing itself, rather than relying on some other primitive. Birrell's version additionally makes use of the observation that each thread can be waiting for at most one condition variable, so each qnode can be allocated as part of the thread control block when the thread is created. (Which seems a bad idea to me, because then you have a dangling pointer when a thread gets killed while waiting for a condition variable.)
Start with a non-atomic queue of integers data type with
push_tail(q, t) and pop_head(q) and is_empty(q).type CVar = record
q : Queue
qm : Mutex // private mutex to protect the queue
// precondition: caller holds mutex m
procedure wait(cv : ^CVar, m : ^Mutex)
lock(cv->qm)
push_tail(cv->q, myThreadId) // atomically add caller to the queue
unlock(cv->qm)
unlock(m)
sleep()
lock(m)
procedure signal(cv : ^CVar)
lock(cv->qm)
if not is_empty(cv->q)
wake(pop_head(cv->q))
unlock(cv->qm)This implementation does suffer from spurious wakeups, but that's standard for condition variable implementations. It is assumed that you are using condition variables to wait for some precondition (described by predicate
P()) to become true and that you are thus using the condition variable like this:m : Mutex
cv : CVar
lock(m)
while not P()
wait(cv, m)
// at this point P() is true and m is acquiredCode Snippets
type QNode = record
owner : Boolean
threadId : ThreadIdentifier
next : ^QNode
type lock = ^QNode // locks start initialized to "nil" (the null pointer)
// The qnode I needs to be allocated in an outer scope
procedure acquire_lock (L : ^lock, I : ^qnode)
I->threadId := myThreadId
I->next := nil
predecessor : ^qnode := atomic_swap(L, I)
if predecessor != nil // queue was non-empty
I->owner := false
predecessor->next := I // put myself on the tail
repeat sleep() while not I->owner
procedure release_lock (L : ^lock, I : ^qnode)
if I->next == nil // no known successor
if compare_and_swap(L, I, nil) return true // L agrees that there's no successor
else repeat (spin) while I->next == nil // else spin wait for race to clear itself
I->next->owner := true // there is a successor, wake it
wake(I->next->threadId)type CVar = record
q : Queue
qm : Mutex // private mutex to protect the queue
// precondition: caller holds mutex m
procedure wait(cv : ^CVar, m : ^Mutex)
lock(cv->qm)
push_tail(cv->q, myThreadId) // atomically add caller to the queue
unlock(cv->qm)
unlock(m)
sleep()
lock(m)
procedure signal(cv : ^CVar)
lock(cv->qm)
if not is_empty(cv->q)
wake(pop_head(cv->q))
unlock(cv->qm)m : Mutex
cv : CVar
lock(m)
while not P()
wait(cv, m)
// at this point P() is true and m is acquiredContext
StackExchange Computer Science Q#38274, answer score: 3
Revisions (0)
No revisions yet.