ck_ec: event count with OS-assisted blocking #133
Benchmark run (pkhuong@2delilah, pwd && make check in ~/ck/r/c/benchmark, at ck_ec@10ce30…) comparing MP ec32, SP ec64, and MP ec64.
Packed the predicate's argument list in a struct for more ABI stability.
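A minimal sketch of why packing the arguments helps, using hypothetical names (wait_state, wait_pred_fn, run_predicate, and budget_pred are assumptions, not ck_ec's actual API). The predicate receives one pointer to a struct that the library allocates and fills, so new fields can later be appended without changing the function-pointer type that callers compile against. This only holds as long as callers never allocate the struct or depend on its size.

```c
#include <time.h>

/* Hypothetical names; not ck_ec's actual API. The library passes the
 * predicate a single pointer to this struct instead of a list of arguments.
 * Because the library allocates and fills it, new fields can be appended at
 * the end without changing wait_pred_fn, so predicates compiled against an
 * older header still have a matching signature. */
struct wait_state {
	struct timespec start;	/* when the wait began */
	struct timespec now;	/* current time, refreshed by the library */
	void *data;		/* opaque pointer owned by the caller */
};

/* A predicate returns nonzero to abort the wait; it may also adjust *deadline. */
typedef int wait_pred_fn(const struct wait_state *state, struct timespec *deadline);

/* Library side: build the state on the stack and hand it to the predicate. */
static int
run_predicate(wait_pred_fn *pred, void *data, struct timespec start,
    struct timespec now, struct timespec *deadline)
{
	struct wait_state state = { .start = start, .now = now, .data = data };

	return pred(&state, deadline);
}

/* Caller side: abort once a retry budget stored behind `data` runs out. */
static int
budget_pred(const struct wait_state *state, struct timespec *deadline)
{
	int *budget = state->data;

	(void)deadline;	/* this predicate never moves the deadline */
	return --*budget <= 0;
}
```

The cost is one extra indirection per field access, in exchange for a predicate signature that does not have to change when the library grows new context.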
@pkhuong This is great work! I look forward to merging it in. I see a few issues when building:
Less important, but putting it on the radar just in case it matters:
Fixed the type mismatch. I tried to make inlining smarter with respect to dispatching, but I think we want to let the compiler keep ck_ec_add out of line when it makes sense.
@pkhuong - Any other follow-up commits, or should I start the process of merging down?
ck_ec implements 32-bit and (on 64-bit platforms) 64-bit event counts. Event counts make it easy to integrate OS-level blocking (e.g., futexes) into lock-free protocols. Waking up waiters takes locks only inside the OS kernel, and does not happen at all when no waiter is blocked. Waiters block only conditionally, when the event count's value is still equal to some prior value.

ck_ec supports multiple producers (wakers) and consumers (waiters), and, on x86-TSO, has a more efficient specialisation for single-producer mode. In the latter mode, the fast-path overhead compared to a version counter is on the order of 2-3 cycles and 1-2 instructions. The slow path, when there are threads blocked on the event count, consists of one additional atomic instruction and a futex syscall.

Similarly, the fast path for consumers, when an update comes quickly, has no overhead compared to spinning on a read-only counter. After a few thousand cycles, consumers (waiters) enter the slow path with one atomic instruction and a few blocking syscalls.

The single-producer specialisation requires the x86-TSO memory model, x86's non-atomic read-modify-write instructions, and, ideally, a futex-like OS abstraction. On !x86/x86_64 platforms, single-producer increments fall back to the multiple-producer code path.

Fixes #79
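A minimal, Linux-only sketch of the multiple-producer protocol described above. It assumes a hypothetical layout in which the top bit of a 32-bit word is a "waiter may be sleeping" flag and the low 31 bits hold the counter; ck_ec's actual layout may differ, and ec32, ec32_inc, ec32_wait, EC_WAITER_FLAG, and the spin budget are all assumptions. The producer updates the word with a compare-and-swap and calls FUTEX_WAKE only when the flag was set; the waiter spins first, then sets the flag and calls FUTEX_WAIT, which the kernel turns into an immediate EAGAIN if the word no longer matches. The x86-only single-producer specialisation is not shown.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical layout: top bit = "a waiter may be sleeping", low 31 bits = counter. */
#define EC_WAITER_FLAG	0x80000000u
#define EC_VALUE_MASK	0x7fffffffu
#define EC_SPIN_LIMIT	1000	/* arbitrary spin budget before blocking */

struct ec32 {
	_Atomic uint32_t word;
};

/* Thin wrapper over the raw futex syscall. */
static long
futex_op(_Atomic uint32_t *addr, int op, uint32_t val)
{
	return syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

static uint32_t
ec32_value(struct ec32 *ec)
{
	return atomic_load_explicit(&ec->word, memory_order_acquire) & EC_VALUE_MASK;
}

/* Producer: bump the 31-bit counter and clear the flag in one CAS; the wake
 * syscall (where the kernel takes its locks) runs only if the flag was set. */
static void
ec32_inc(struct ec32 *ec)
{
	uint32_t old = atomic_load_explicit(&ec->word, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(&ec->word, &old,
	    (old + 1) & EC_VALUE_MASK, memory_order_seq_cst, memory_order_relaxed))
		;	/* the failed CAS reloaded `old`; retry */
	if (old & EC_WAITER_FLAG)
		futex_op(&ec->word, FUTEX_WAKE_PRIVATE, INT_MAX);
}

/* Consumer: return once the counter differs from old_value (a value read
 * with ec32_value). Spin first; then set the flag and sleep, but only while
 * the word still equals old_value | EC_WAITER_FLAG. */
static void
ec32_wait(struct ec32 *ec, uint32_t old_value)
{
	for (int i = 0; i < EC_SPIN_LIMIT; i++) {
		if (ec32_value(ec) != old_value)
			return;
	}

	for (;;) {
		uint32_t cur = atomic_load_explicit(&ec->word, memory_order_acquire);

		if ((cur & EC_VALUE_MASK) != old_value)
			return;
		/* Announce the waiter; a failed CAS means the word changed, so re-check. */
		if (!(cur & EC_WAITER_FLAG) &&
		    !atomic_compare_exchange_strong(&ec->word, &cur, cur | EC_WAITER_FLAG))
			continue;
		/* The kernel compares the word atomically before sleeping. */
		futex_op(&ec->word, FUTEX_WAIT_PRIVATE, old_value | EC_WAITER_FLAG);
	}
}
```

A consumer would typically read v = ec32_value(&ec), check its own condition, and call ec32_wait(&ec, v) only if the condition is still false. A producer that increments before the waiter sets the flag makes the waiter's CAS fail; one that increments afterwards observes the flag and issues the wake, so no wakeup is lost.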
Merging in. I expect additional iteration in master; treating it as experimental. Thanks, Paul!
I finally convinced myself that this thing works, and will always do so reliably.
It's x86-TSO only, so I had to introduce some weirdness in the makefiles.
TODO: update docs for new vtable
TODO: test the timeval internal utils
TODO: make the regression tests work on x86
TODONT: final style check
TODONT: man page
TODO: benchmarks
TODO: fix build on !linux x86oids (done, except for validate)
To discuss: how do we feel about the type-generic macros?
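For the type-generic macro question, a minimal sketch of C11 _Generic dispatch over counter width. The names (ec32, ec64, ec32_add, ec64_add, ec_add) are hypothetical rather than ck_ec's actual API, and event-count waiting is omitted. The macro picks the width-specific function from the static type of *(EC) at compile time, so there is no runtime dispatch, and passing an unsupported type fails to compile. A real header would also guard the 64-bit variant behind a 64-bit-platform check, which this sketch leaves out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 32- and 64-bit counters (event-count waiting omitted). */
struct ec32 {
	_Atomic uint32_t counter;
};

struct ec64 {
	_Atomic uint64_t counter;
};

/* Width-specific entry points; each returns the value before the add. */
static inline uint32_t
ec32_add(struct ec32 *ec, uint32_t delta)
{
	return atomic_fetch_add(&ec->counter, delta);
}

static inline uint64_t
ec64_add(struct ec64 *ec, uint64_t delta)
{
	return atomic_fetch_add(&ec->counter, delta);
}

/* Type-generic front end: _Generic inspects the type of *(EC) without
 * evaluating it and expands to a call of the matching function. Any other
 * type is a compile-time error because there is no default association. */
#define ec_add(EC, DELTA)				\
	_Generic(*(EC),					\
	    struct ec32: ec32_add,			\
	    struct ec64: ec64_add)((EC), (DELTA))
```

The trade-off: callers get one spelling for both widths and type mismatches are caught at compile time, but the macros require a C11 compiler, and diagnostics from a failed _Generic selection tend to be harder to read than an ordinary function-call error.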
I'm gonna leave the man pages for later ;)