HiveBrain v1.2.0

Why do OS locks require hardware support?

Submitted by: @import:stackexchange-cs

Problem

I just read the chapter on locks in the book Operating Systems: Three Easy Pieces, which discusses a lock implementation (see Figure 28.9) built on a test-and-set hardware instruction.

The textbook emphasizes that both hardware support and OS support are necessary to build an efficient locking mechanism. Wikipedia echoes this sentiment:

Locks typically require hardware support for efficient implementation. This support usually takes the form of one or more atomic instructions such as "test-and-set", "fetch-and-add" or "compare-and-swap". These instructions allow a single process to test if the lock is free, and if free, acquire the lock in a single atomic operation.

I think I've come up with an alternative lock implementation that doesn't require hardware support. It seems to have comparable performance overhead while still guaranteeing mutual exclusion, so I'm making this post to find out if I'm missing something.
Textbook Lock Implementation

I'll first summarize the textbook implementation. For setup, we need to:

  • Maintain a single flag (set to 0 or 1) and a queue (of thread IDs).

  • Maintain a separate flag for a "guard" lock, which will be a simple spin lock. Because the guard lock only protects a small critical section (the flag and queue manipulation), the overhead of spinning stays small.

When a thread wants to acquire the lock, it does the following:

  1. Spin with the test-and-set instruction until you acquire the guard lock.

  2. Once acquired, check whether the flag (of the main lock) is set to 0. If it is, simply set it to 1, release the guard, and the lock has been acquired! If the flag is set to 1, proceed to step 3.

  3. Push the current thread ID onto the queue.

  4. Release the guard, then "park" the current thread. (park() is a Solaris call that puts the calling thread to sleep.)

On the other hand, when a thread wants to release the lock, it does the following:

  1. Spin until you acquire the guard lock.

  2. If the queue is empty, set the flag to 0; otherwise, pop a thread ID off the queue and unpark that thread. (The flag stays at 1, so ownership passes directly to the woken thread.) Then release the guard.
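The acquire/release procedure above can be sketched in C. This is my illustration, not code from the book: the lock_t layout, the fixed-size queue, and the park()/unpark() emulation (a per-thread atomic token that spins instead of sleeping) are all my assumptions — real Solaris park() blocks inside the kernel.

```c
/* Sketch of the OSTEP Figure 28.9-style queue lock. Names and the
 * park/unpark emulation are mine; real park() sleeps in the kernel. */
#include <stdatomic.h>

#define MAXTHREADS 8

static atomic_int park_flag[MAXTHREADS];   /* 1 = wakeup pending for that tid */

static void park(int tid) {
    /* Emulated: spin until unparked; consuming the token with an atomic
     * exchange means an unpark() that arrives first is not lost. */
    while (atomic_exchange(&park_flag[tid], 0) == 0)
        ;
}

static void unpark(int tid) {
    atomic_store(&park_flag[tid], 1);
}

typedef struct {
    atomic_flag guard;          /* spin lock (test-and-set) guarding flag+queue */
    int flag;                   /* 0 = free, 1 = held */
    int queue[MAXTHREADS];      /* FIFO of waiting thread IDs */
    int head, tail, count;
} lock_t;

static void lock_init(lock_t *l) {
    atomic_flag_clear(&l->guard);
    l->flag = l->head = l->tail = l->count = 0;
}

static void lock_acquire(lock_t *l, int tid) {
    while (atomic_flag_test_and_set(&l->guard))   /* step 1: spin for guard */
        ;
    if (l->flag == 0) {
        l->flag = 1;                              /* step 2: free, so take it */
        atomic_flag_clear(&l->guard);
    } else {
        l->queue[l->tail] = tid;                  /* step 3: enqueue ourselves */
        l->tail = (l->tail + 1) % MAXTHREADS;
        l->count++;
        atomic_flag_clear(&l->guard);             /* step 4: drop guard, sleep */
        park(tid);
    }
}

static void lock_release(lock_t *l) {
    while (atomic_flag_test_and_set(&l->guard))   /* spin for guard */
        ;
    if (l->count == 0) {
        l->flag = 0;                              /* nobody waiting: free it */
    } else {
        int next = l->queue[l->head];             /* hand off: flag stays 1, */
        l->head = (l->head + 1) % MAXTHREADS;     /* ownership passes directly */
        l->count--;
        unpark(next);
    }
    atomic_flag_clear(&l->guard);
}
```

Note the hand-off in lock_release: the main flag is never cleared when waiters exist, so the woken thread owns the lock without re-checking anything.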

Solution

Since the OS manages scheduling, we guarantee mutual exclusion for the following steps.

But how do you guarantee mutual exclusion inside the OS?

On a single-processor machine¹, you know that only the kernel code that serves the lock system call is executing. All other threads are in a known state. However, the kernel code can be interrupted at any time by a hardware interrupt. This could be the timer interrupt that tells the kernel that it's supposed to schedule a different thread. You can implement locks just by manipulating the bits in the lock table, but only if interrupt code won't manipulate the lock table. So it's possible to implement a lock this way, but it imposes some design constraints on the kernel.
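To make that constraint concrete, here is a kernel-style pseudocode sketch (not runnable in user space): local_irq_disable/local_irq_enable stand for whatever interrupt-masking primitive the platform provides (e.g. cli/sti on x86), and NLOCKS and sys_trylock are hypothetical names of mine.

```
/* Kernel pseudocode, single processor only: with interrupts masked,
 * nothing else can run, so a plain read-then-write is effectively atomic. */
int lock_table[NLOCKS];     /* 0 = free, 1 = held */

int sys_trylock(int id) {
    local_irq_disable();    /* no timer tick, no interrupt handler can run */
    int was_free = (lock_table[id] == 0);
    if (was_free)
        lock_table[id] = 1; /* the "test" and the "set" cannot be separated */
    local_irq_enable();
    return was_free;
}
```

This is exactly the design constraint the paragraph above describes: it only works if every code path that touches the lock table, interrupt handlers included, respects the same masking discipline — and it stops working the moment a second processor exists.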

On a multi-processor machine, all you know is that this processor is serving the lock system call. The other processors could be manipulating the lock table at the same time. At step 3, you read an entry from the lock table and then possibly write to that entry. Another processor could have read or written the entry between your read and your write². Avoiding this race condition is precisely why processors have instructions such as compare-and-swap that combine a read and a write.
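The read-then-write race described above is exactly what compare-and-swap closes: the test ("is the lock free?") and the set ("mark it held") happen as one atomic step that no other processor can interleave with. A minimal spinlock using C11 atomics (my sketch, not from the answer):

```c
/* A minimal spinlock built on compare-and-swap via C11 <stdatomic.h>.
 * atomic_compare_exchange_weak compiles down to the processor's CAS
 * instruction (e.g. LOCK CMPXCHG on x86). */
#include <stdatomic.h>

typedef atomic_int caslock_t;           /* 0 = free, 1 = held */

static void caslock_acquire(caslock_t *l) {
    int expected = 0;
    /* Atomically: if *l == 0, set it to 1. On failure (another processor
     * holds the lock, or a spurious failure of the weak form), the call
     * overwrites `expected`, so reset it and retry. */
    while (!atomic_compare_exchange_weak(l, &expected, 1))
        expected = 0;
}

static void caslock_release(caslock_t *l) {
    atomic_store(l, 0);                 /* hand the lock back */
}
```

The CAS also carries the memory-ordering guarantees that footnote ² alludes to: the default sequentially consistent ordering prevents the two processors from each seeing the lock as free.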

Even on a single-processor machine, making a system call is a heavyweight operation compared to a single instruction such as compare-and-swap. It's no good for high-speed concurrency.

¹ By single-processor, I mean a machine that executes a single thread at a time. In this context, a machine with a single piece of silicon that contains multiple logical cores is a multi-processor machine.

² And in the real world, it can get worse, because processors don't always have a consistent view of memory, so two processors could for example take the same lock independently even if their read/write sequences don't overlap. But even with consistent memory, interleaving read/write sequences spells doom for a naive lock implementation.

Context

StackExchange Computer Science Q#132115, answer score: 5
