i always got the sense that spinlocks were about maximum portability and reliability in the face of unreliable event driven approaches. the dumb inefficient thing that makes the heads of the inexperienced explode, but actually just works and makes the world go 'round.
The basic rule of writing your own cross-thread data structures like mutexes or condition variables is... don't, unless you have a very good reason to. If you're in that rare circumstance where you know the library you're using isn't viable for some reason, then the next best rule is to use your OS's version of a futex as the atomic primitive, since it's going to solve most of the pitfalls for you automatically.
The only time I've manually written my own spinlock was when I had to coordinate between two threads, one of which was running 16-bit code. Using any library was out of the question, and even relying on syscalls was sketchy, because making sure the 16-bit code is in the right state to make a syscall is itself tricky. In this case, since I didn't need to care about things like fairness (only two threads were involved), the spinlock core ended up being simple:
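The commenter's snippet is not reproduced here. As a stand-in, here is a minimal sketch of what a test-and-set spinlock core for two threads can look like in portable C++. It is not the original code: `SpinLock` and `run_two_thread_demo` are hypothetical names, and the `yield()` call assumes a normal OS thread, which the 16-bit side in the story would not have had.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal test-and-set spinlock (illustration only).
class SpinLock {
public:
    void lock() noexcept {
        // exchange() returns the previous value: true means the lock was
        // already held, so keep trying until we observe false.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Assumption: we are an ordinary OS thread and may yield.
            std::this_thread::yield();
        }
    }

    void unlock() noexcept {
        // Release ordering publishes the critical section's writes
        // to the next thread that acquires the lock.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: two threads each increment a plain (non-atomic)
// counter per_thread times; only the lock keeps the result correct.
long run_two_thread_demo(long per_thread) {
    assert(per_thread >= 0);
    SpinLock lock;
    long counter = 0;
    auto work = [&] {
        for (long i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```

Nothing here provides fairness or backoff, which matches the "only two threads" caveat above; with more contenders those omissions start to matter.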
As always: use standard libraries first, profile, then write your own if the data indicate that it's necessary. To your point, the standard library probably already uses the OS primitives under the hood, which themselves do a short userspace spin-wait and then fall back to a kernel wait queue on contention. If low latency is a priority, the latter might be unacceptable.
The following is an interesting talk where the author used a custom spinlock to significantly speed up a real-time physics solver.
Dennis Gustafsson – Parallelizing the physics solver – BSC 2025 https://www.youtube.com/watch?v=Kvsvd67XUKw
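Separately from the talk, the "short spin, then fall back to a kernel wait" policy mentioned above can be imitated at application level. The sketch below is an assumption-laden illustration, not how any particular standard library implements its mutex: `lock_spin_then_block`, `run_hybrid_demo`, and the `max_spins` knob are all hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: poll the mutex a bounded number of times without
// blocking, then give up and block in lock(). Returns true if the lock was
// obtained during the polling phase. The caller must unlock() either way.
bool lock_spin_then_block(std::mutex& m, int max_spins) {
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; ++i) {
        if (m.try_lock()) {
            return true;  // fast path: no blocking wait was needed
        }
        // A production spin loop would usually issue a CPU pause hint here.
    }
    m.lock();  // slow path: may put the thread to sleep until the lock is free
    return false;
}

// Hypothetical demo: two threads contend for one mutex through the helper.
long run_hybrid_demo(int max_spins, long per_thread) {
    std::mutex m;
    long counter = 0;
    auto work = [&] {
        for (long i = 0; i < per_thread; ++i) {
            lock_spin_then_block(m, max_spins);
            ++counter;
            m.unlock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```

Raising `max_spins` trades CPU time for a lower chance of paying the sleep and wake-up cost, which is the latency concern raised above. The right value is workload-specific and worth profiling.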
Another time when writing a quick and dirty spinlock is reasonable is inside a logging library. A logging library would normally use a full-featured mutex, but what if we want the mutex implementation itself to be able to log? Say the mutex wants to log that it is non-recursive yet the same thread is acquiring it twice, or that it has detected a deadlock. The solution is to introduce a special subset of the logging library that uses a spinlock instead.
I'm not sure how a spinlock solves this problem. Wouldn't that just cause the process to hang busy?
Only until the other thread leaves the logger
"Unfair" paragraph is way too short. This is the main problem! The outlier starvation you get from contended spinlocks is extraordinary and, hypothetically, unbounded.
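For contrast with a plain test-and-set lock, one well-known way to get FIFO hand-off among spinning threads is a ticket lock. The sketch below is a hypothetical illustration (`TicketLock` and `run_ticket_demo` are made-up names), and it only orders the waiters: the OS can still preempt the thread whose turn it is, so it does not remove every source of long waits.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical ticket lock: threads are served in the order they arrive.
class TicketLock {
public:
    void lock() noexcept {
        // Take the next ticket; fetch_add hands out unique, increasing numbers.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Wait until our number is being served.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }

    void unlock() noexcept {
        // Only the lock holder advances the counter, passing the lock
        // to the holder of the next ticket.
        now_serving_.fetch_add(1, std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Hypothetical demo: num_threads threads increment a plain counter.
long run_ticket_demo(int num_threads, long per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);
    TicketLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

With a test-and-set lock, whichever thread wins the next atomic exchange gets in, so one unlucky thread can keep losing. Here a waiter is overtaken only by threads that took a ticket before it.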
Sheesh. Can something this complicated ever truly be said to work?
The OS kernel's runqueue uses a spinlock to schedule everything. So it works. Should you ever use a spinlock in application code? No. Let the OS handle it, via the synchronization primitives in whatever language your app is written in.
Yes, if you're careful. Actually careful, not pretend careful. Which is pretty normal in C and C++.
You can limit yourself to the performance of a 1 MHz 6502 with no OS if you don't like it. Even MS-DOS on an 8086 with 640K of RAM allows for things that require complexity of this type (not spinlocks, but the tricks needed to make "terminate and stay resident" work are evil in a similar way).
I don't think that's fair. You can go fast, just not more than one task at a time.
Modern CPUs (since around 2000) go faster in large part because they have multiple cores that can do more than one thing at a time. If your program needs to go faster, using more cores is often your best answer, and then you will need these tricks. (SIMD or the GPU are also common answers that might or might not be better for your problem.)
Modern CPUs can do 4-5 GHz single-threaded. (Sometimes you can even get a higher clock speed by disabling other cores.) This somewhat outpaces "a 1 MHz 6502" even without parallelization.
They can, but nobody runs a single process on such CPUs. They run some form of OS which implements spinlocks, mutexes, and all these other complex things.
I suppose someplace someone is running an embedded system without an OS on such a processor, but I'd expect they are still using extra cores and so have all of the above tricks someplace.
Isn't it the opposite? The complication is evidence of function. The simple code doesn't work.
That assertion feels suspiciously like a logical fallacy.
Not really. If the solution has less complexity than is inherent in the problem, it can't possibly work. If the solution has complexity equal to or greater than the complexity inherent in the problem, it may work. So if you see complex code handling many different edge cases, you can take that as an indicator the author understood the problem. That doesn't mean they do understand or that the solution does work; only that you have more confidence than you did initially.
It's a weak signal but the reasoning is sound.
Everything should be made as simple as possible, but not simpler.
Code has a minimum complexity to solve the problem
Great article! Thanks for posting this.