-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add host-bulk insert_or_apply
using shared_memory
#551
Add host-bulk insert_or_apply
using shared_memory
#551
Conversation
/ok to test |
/ok to test |
1 similar comment
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round. Nice work!
Does the unit test cover each of the new code paths, i.e., shmem vs gmem kernel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some final nits. @srinivasyadav18 Can you please share the insert or apply benchmark results before and after this PR in the PR discussion?
Excellent work! Thank you!
Benchmarks : Cmp time = global memory implementation [before]
|
/ok to test |
/ok to test |
I think there are issues with rebasing. I need to resolve it and push the changes again. |
86c92b6
to
b9a6c27
Compare
/ok to test |
As CI tests for GCC 12 fail, I have added a workaround (in e804b4c) to pass the error, by using a pre-constructed value to be used as size of shared_map. |
/ok to test |
/ok to test |
/ok to test |
/ok to test |
use shared_memory kernel only if `cg_size == 1`. use `shmem_block_size` when calculating `shmem_grid_size`.
/ok to test |
/ok to test |
This PR cleans up some of the issues occured during merge of #551. 1. propagate the **key_eq** and **probing_scheme** from **global** `ref` to constructor of `shared_memory_ref` in **insert_or_apply_shmem** kernel. 2. Disable **init** overload of `insert_or_apply` using **sfinae**, because `cuda::stream_ref` is default constructed, this can invoke the **init** overload even though the user calls **no-init** overload. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This PR add's host-bulk
insert_or_apply
using shared_memory which could improve performance in low cardinality and very high mulitiplicty case.