From a quick-ish glance at `RingBuffer`, it looks like `head` and `tail` could possibly be falsely shared.
However, I tried padding the atomics so they'd sit on different cache lines, wrote a threaded benchmark, and failed to produce anything faster than the current implementation (in fact, my version was usually slightly slower).
So, I guess this issue is either:
- How is the current implementation getting around false sharing?
- Or, if you haven't explicitly thought about false sharing before, perhaps look into it and see if you can make `ringbuf` even faster :)
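
For reference, here is a minimal sketch of the kind of padding I mean. It is not the crate's actual layout: `CachePadded`, `PaddedIndices`, and `head_tail_distance` are names made up for illustration, and the 64-byte line size is an assumption (some CPUs use 128-byte lines). The `crossbeam-utils` crate ships a `CachePadded` type that does this more carefully.

```rust
use std::sync::atomic::AtomicUsize;

/// Hypothetical wrapper: gives its contents 64-byte alignment, so the
/// wrapped value occupies its own (assumed 64-byte) cache line.
#[repr(align(64))]
pub struct CachePadded<T>(pub T);

/// Hypothetical stand-in for the index fields of a ring buffer.
/// This is NOT how ringbuf actually defines them.
pub struct PaddedIndices {
    pub head: CachePadded<AtomicUsize>,
    pub tail: CachePadded<AtomicUsize>,
}

impl PaddedIndices {
    pub fn new() -> Self {
        PaddedIndices {
            head: CachePadded(AtomicUsize::new(0)),
            tail: CachePadded(AtomicUsize::new(0)),
        }
    }

    /// Byte distance between the two index fields, to check that they
    /// cannot land on the same 64-byte line.
    pub fn head_tail_distance(&self) -> usize {
        let h = &self.head as *const CachePadded<AtomicUsize> as usize;
        let t = &self.tail as *const CachePadded<AtomicUsize> as usize;
        h.abs_diff(t)
    }
}
```

With this layout each index takes a full 64 bytes instead of 8, so the producer's writes to `tail` should no longer invalidate the line the consumer reads `head` from. That is the theory at least; as noted above, it did not translate into a speedup in my benchmark.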