patterncppMinor

Spinlock for C++ kernel (with x86 ASM)

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview cpp stackoverflow c++c++11 assembly locking

spinlockwithx86asmforkernel

Problem

This is some prototype code for a spin-lock for my toy operating system. Bear in mind this means there are no standard library routines to fall back on. It's also for C++11 compilation and will only ever be compiled by the GNU G++ compiler.

The primary use for this is in the scheduler and other code that accesses the process table.

I think there are lots of ways of getting this wrong, and some of them aren't obvious. So it'd be great to have a second pair of eyes on it.

Firstly the locking code itself in NASM x86 asm:

; *******************************
; void spin_lock(uint32 * p);
;
global __kspin_lock
__kspin_lock:
    push ebp
    mov ebp, esp        ; we might as well the the stack frame right?

    mov eax, [ebp + 8]  ; eax contains address of uint32 used for the lock
    mov ecx, 1

.tryacquire             ; try and get the lock.
    xchg ecx, [eax]
    test ecx, ecx
    jz .acquired        ; if the lock wasn't 0 (unlocked) repeat.

.pauseloop
    pause               ; see 'About PAUSE' below
    test dword [eax], 1
    jne .pauseloop
    jmp .tryacquire

.acquired
    pop ebp
    ret

And secondly a class that wraps it a little. (I will eventually then wrap this with an RAII class to release the lock automatically when it goes out of scope).

```
#pragma once
#include

extern "C" {
/** These routines probably shouldn't be used directly. */
void __kspin_lock(uint32 *p);
};

/**
* SpinLock provides a crude locking mechanism based around an atomic
* exchange of a variable in memory.
*
* The SpinLock loops, swapping the value 1 for the value in lock.
* Each time it checks to see if the value retrieved from lock is now 0,
* if it is it knows that lock now contains 1, and it holds the lock.
*
* To ensure all spinlocks reside in a unique cache line the spinlock
* is aligned to 64 bytes and is 64 bytes in size. This avoids two
* processes competing for ownership of the same 64 byte cache line,
* which could be bad for performance.

Solution

This is largely a duplicate of https://stackoverflow.com/questions/6935442/x86-spinlock-using-cmpxchg, https://stackoverflow.com/questions/11959374/fastest-inline-assembly-spinlock, etc.

(1) Your code works only on 32-bit x86. If you're using 64-bit (x86-64), the calling convention is totally wrong (you need to save and restore %rbp rather than %ebp, and you're looking for the argument in the wrong place). If you're implementing your own OS, I suppose you know what you're doing... but in general I think it's weird to see 32-bit x86 code these days.

(2) Useless trivia of the day: The two instructions test ecx, ecx; jz .acquired could be replaced with the single instruction jecxz .acquired. I don't know whether this is an optimization or a pessimization on your particular hardware. It would be a size optimization, at least.

(2.5) Really, consider using cmpxchg instead of xchg; test. The cmpxchg mnemonic maps more directly onto the "Compare And Swap" idiom that you're using. It will also help you when it comes time to implement "is the lock held by this thread"; you can set the word to current-thread-id instead of just 1.

(3) See 'About PAUSE' below: Unfortunately there is no "About PAUSE" below. However, it seems like you're doing the right thing there. You could definitely tighten up the code; for example I don't know why you're wasting time with that test dword [eax], 1. You should just retry the xchg as soon as the pause is over.

(4) Consider removing the stack frame; it's not doing anything useful (unless it's necessary for your debugger of choice), just slowing things down.

Context

StackExchange Code Review Q#84148, answer score: 5

Revisions (0)

No revisions yet.