
Add init element overload to insert_or_apply #555

Merged
merged 24 commits into from
Aug 7, 2024

Conversation

srinivasyadav18
Contributor

This PR adds a new overload of insert_or_apply that takes an init parameter representing the identity element of the binary operator op used for the apply operation. This enables an optimization: wait_for_payload can be skipped when the map's sentinel value equals init.
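
To illustrate the idea in the description, here is a minimal host-side sketch. It is not cuco's implementation: the `slot` struct, the sentinel constants, and `insert_or_add` are hypothetical names, and `op` is fixed to addition so that its identity (0) equals the assumed value sentinel. Because the sentinel is already a valid starting value for `op`, the thread that claims the key and any thread that finds the key present take the same path and apply `op` directly, so nobody has to wait for a payload to appear.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical single-slot model for illustration; not cuco's slot layout or API.
constexpr std::int64_t empty_key   = -1;  // assumed key sentinel
constexpr std::int64_t empty_value = 0;   // assumed value sentinel, also the identity of +

struct slot {
  std::atomic<std::int64_t> key{empty_key};
  std::atomic<std::int64_t> value{empty_value};
};

// Models insert_or_apply with op = plus and init == empty_value.
// Returns false if the slot is already owned by a different key
// (a real map would continue probing; that part is omitted here).
inline bool insert_or_add(slot& s, std::int64_t k, std::int64_t v)
{
  std::int64_t expected = empty_key;
  bool const claimed    = s.key.compare_exchange_strong(expected, k);
  if (claimed || expected == k) {
    // Same path for the claiming thread and for later arrivals:
    // apply op to the payload without waiting for an initial store.
    s.value.fetch_add(v);
    return true;
  }
  return false;
}

// Small sequential demonstration: two updates to key 42, one rejected key.
inline std::int64_t demo()
{
  slot s;
  insert_or_add(s, 42, 5);
  insert_or_add(s, 42, 7);
  insert_or_add(s, 7, 100);  // different key: rejected, payload untouched
  return s.value.load();
}
```

Without such an identity, the claiming thread would have to store its value first and other threads would need to wait until that payload is visible, which is the wait the new overload avoids.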


copy-pr-bot bot commented Jul 22, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@srinivasyadav18 srinivasyadav18 changed the title Add init element overload to insert_or_apply Add init element overload to insert_or_apply Jul 22, 2024
@PointKernel PointKernel added type: feature request New feature request topic: static_map Issue related to the static_map Needs Review Awaiting reviews before merging labels Jul 22, 2024
@sleeepyjack
Collaborator

/ok to test

include/cuco/detail/static_map/kernels.cuh (resolved)
include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
@@ -594,6 +713,8 @@ class operator_impl<
auto probing_iter = probing_scheme(key, storage_ref.window_extent());
auto const empty_value = ref_.impl_.empty_slot_sentinel().second;

auto constexpr use_insert = PayloadWrite and (sizeof(value_type) > 8);
Collaborator

The sizeof(value_type) > 8 just checks if we use packed_cas or not, right? This value might change based on the architecture (on sm_90 we have 16B CAS). It's fine for now but we have to find a better solution to the problem, e.g., based on a trait system or making it queryable from the slot_inserter (see comment).
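
One possible shape for the trait-based approach mentioned above is sketched here. Everything in it is an assumption made for illustration: `uses_packed_cas`, `use_insert_path`, and the `MaxCasBytes` parameter are hypothetical names, not part of cuco, and the widest available CAS is modeled as a plain template parameter rather than queried from the target architecture.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when a whole slot fits into a single CAS.
// MaxCasBytes is an assumed per-target constant (8 here; 16 where a
// 128-bit CAS is available).
template <typename Slot, std::size_t MaxCasBytes = 8>
struct uses_packed_cas : std::integral_constant<bool, (sizeof(Slot) <= MaxCasBytes)> {};

// Hypothetical replacement for the hard-coded `sizeof(value_type) > 8` check.
template <typename Slot, bool PayloadWrite, std::size_t MaxCasBytes = 8>
constexpr bool use_insert_path()
{
  return PayloadWrite && !uses_packed_cas<Slot, MaxCasBytes>::value;
}

using slot32 = std::pair<std::int32_t, std::int32_t>;  // 8 bytes
using slot64 = std::pair<std::int64_t, std::int64_t>;  // 16 bytes

static_assert(uses_packed_cas<slot32>::value, "8B slot fits an 8B CAS");
static_assert(!uses_packed_cas<slot64>::value, "16B slot does not fit an 8B CAS");
static_assert(uses_packed_cas<slot64, 16>::value, "16B slot fits a 16B CAS");
```

Keeping the size limit in one place means the call sites would not need to change when a wider CAS becomes available.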

Member

@PointKernel PointKernel left a comment

@srinivasyadav18 can you please add missing docs for the new functions added?

include/cuco/detail/static_map/kernels.cuh (resolved)
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
include/cuco/detail/static_map/reduction_functors.cuh (outdated, resolved)
@@ -76,33 +76,55 @@ CUCO_KERNEL __launch_bounds__(BlockSize) void insert_or_assign(InputIt first,
*
* @note Callable object to perform binary operation should be able to invoke as
* Op(cuda::atomic_ref<T,Scope>, T>)
* @note If `HasInit` is `true` and if `init` value equals to the `sentinel value`, we directly
Collaborator

Suggested change
- * @note If `HasInit` is `true` and if `init` value equals to the `sentinel value`, we directly
+ * @note If `HasInit` is `true` and `init == empty_value_sentinel`, we directly

@@ -76,33 +76,55 @@ CUCO_KERNEL __launch_bounds__(BlockSize) void insert_or_assign(InputIt first,
*
* @note Callable object to perform binary operation should be able to invoke as
* Op(cuda::atomic_ref<T,Scope>, T>)
* @note If `HasInit` is `true` and if `init` value equals to the `sentinel value`, we directly
* `apply` the `op` instead of atomic store and then waiting for the payload to get materialized.
* This has potential speedups when sizeof(value_type) > 8.
Collaborator

Is the only relevant part here the size of the value or is it the size of the pair, i.e., when using packed CAS vs. CAS + dependent write?

Contributor Author

I think it's the size of the map's value_type, because that determines the type on which we perform the CAS operation.

Collaborator

Right. The exact size threshold might change in the future, as we're about to get 128b CAS on Hopper+ exposed through libcu++. That means we can use the packed CAS approach with value types of up to 16B.

include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
@srinivasyadav18
Contributor Author

Benchmarks on H100

We see performance benefits of roughly 5-40% on I64 types when init is set to empty_value_sentinel.
Ref = "init == empty_value_sentinel"
Cmp = "does not use init"

['./init.json', './no_init.json']
# static_map_insert_or_apply_uniform_multiplicity

## [0] NVIDIA H100 PCIe

|  Key  |  Value  |  Distribution  |  Cardinality  |  NumInputs  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|-------|---------|----------------|---------------|-------------|------------|-------------|------------|-------------|------------|---------|----------|
|  I32  |   I32   |    UNIFORM     |       1       |      1      |  36.710 us |      88.57% |  36.340 us |       5.64% |  -0.370 us |  -1.01% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     16      |  37.724 us |       7.64% |  38.072 us |       9.79% |   0.348 us |   0.92% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |     16      |  36.621 us |       3.59% |  36.478 us |       8.26% |  -0.143 us |  -0.39% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     64      |  37.892 us |       8.04% |  39.567 us |      11.94% |   1.675 us |   4.42% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |     64      |  37.073 us |       8.79% |  39.639 us |      11.76% |   2.565 us |   6.92% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |     64      |  44.607 us |       3.69% |  44.279 us |       2.09% |  -0.328 us |  -0.73% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     128     |  38.846 us |       9.40% |  40.940 us |      11.62% |   2.094 us |   5.39% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |     128     |  39.020 us |      10.26% |  40.926 us |       9.48% |   1.906 us |   4.88% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |     128     |  37.870 us |       8.23% |  39.916 us |      10.78% |   2.046 us |   5.40% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |     128     |  44.537 us |       3.76% |  44.070 us |       6.49% |  -0.468 us |  -1.05% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     256     |  38.643 us |      11.72% |  39.841 us |      10.39% |   1.198 us |   3.10% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |     256     |  39.964 us |      10.10% |  40.852 us |       9.63% |   0.888 us |   2.22% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |     256     |  37.606 us |       8.01% |  38.647 us |       9.75% |   1.041 us |   2.77% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |     256     |  43.289 us |       9.49% |  43.435 us |       5.52% |   0.146 us |   0.34% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |     256     |  44.383 us |       3.01% |  44.112 us |       2.41% |  -0.271 us |  -0.61% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     512     |  43.940 us |       5.05% |  43.915 us |       3.80% |  -0.025 us |  -0.06% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |     512     |  44.528 us |       3.14% |  44.163 us |       2.62% |  -0.365 us |  -0.82% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |     512     |  44.505 us |       8.83% |  44.018 us |       3.68% |  -0.487 us |  -1.09% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |     512     |  44.615 us |       2.47% |  44.258 us |       2.35% |  -0.357 us |  -0.80% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |     512     |  44.394 us |       3.64% |  44.166 us |       3.31% |  -0.228 us |  -0.51% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |     512     |  44.619 us |       2.10% |  44.230 us |       2.17% |  -0.389 us |  -0.87% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |    1000     |  45.317 us |       3.50% |  44.853 us |       2.43% |  -0.464 us |  -1.02% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |    1000     |  44.885 us |       4.89% |  44.719 us |       2.58% |  -0.166 us |  -0.37% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |    1000     |  45.376 us |       7.20% |  44.842 us |       3.55% |  -0.534 us |  -1.18% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |    1000     |  44.910 us |       4.63% |  44.774 us |       3.04% |  -0.136 us |  -0.30% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |    1000     |  45.209 us |       3.37% |  44.815 us |       2.56% |  -0.395 us |  -0.87% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |    1000     |  45.258 us |       2.18% |  44.813 us |       2.86% |  -0.446 us |  -0.98% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     1000      |    1000     |  45.239 us |       3.91% |  44.752 us |       4.53% |  -0.487 us |  -1.08% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |    10000    |  53.361 us |       3.17% |  52.216 us |       1.49% |  -1.146 us |  -2.15% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      16       |    10000    |  45.023 us |       2.27% |  44.107 us |       2.37% |  -0.916 us |  -2.03% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |    10000    |  45.018 us |       4.18% |  44.224 us |       2.54% |  -0.794 us |  -1.76% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |    10000    |  44.999 us |       2.81% |  44.201 us |       3.51% |  -0.797 us |  -1.77% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |    10000    |  45.041 us |       2.21% |  44.345 us |       2.24% |  -0.696 us |  -1.55% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |    10000    |  44.851 us |       2.12% |  44.305 us |       3.65% |  -0.546 us |  -1.22% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     1000      |    10000    |  45.042 us |       2.54% |  44.151 us |       2.11% |  -0.891 us |  -1.98% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     10000     |    10000    |  44.880 us |       2.94% |  44.130 us |       2.72% |  -0.750 us |  -1.67% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |   100000    | 276.805 us |       1.69% | 274.402 us |       1.28% |  -2.403 us |  -0.87% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |   100000    | 191.470 us |       1.35% | 192.043 us |       2.01% |   0.573 us |   0.30% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |   100000    | 182.886 us |       1.35% | 183.371 us |       1.44% |   0.484 us |   0.26% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |   100000    | 183.343 us |       1.45% | 183.198 us |       1.51% |  -0.144 us |  -0.08% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |   100000    | 183.155 us |       1.34% | 183.598 us |       1.49% |   0.442 us |   0.24% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |   100000    | 183.128 us |       1.38% | 183.636 us |       4.08% |   0.508 us |   0.28% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     1000      |   100000    | 183.258 us |       1.71% | 183.651 us |       1.60% |   0.393 us |   0.21% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     10000     |   100000    | 183.280 us |       1.43% | 183.382 us |       1.51% |   0.103 us |   0.06% |   PASS   |
|  I32  |   I32   |    UNIFORM     |    100000     |   100000    | 183.108 us |       1.33% | 183.510 us |       1.46% |   0.402 us |   0.22% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |   1000000   |   1.155 ms |       1.22% |   1.156 ms |       1.32% |   0.966 us |   0.08% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |   1000000   | 470.833 us |       0.99% | 471.863 us |       0.92% |   1.030 us |   0.22% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |   1000000   | 451.697 us |       0.99% | 452.087 us |       0.92% |   0.389 us |   0.09% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |   1000000   | 451.866 us |       0.94% | 451.893 us |       0.92% |   0.027 us |   0.01% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |   1000000   | 452.373 us |       0.94% | 452.317 us |       1.38% |  -0.056 us |  -0.01% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |   1000000   | 451.839 us |       1.21% | 452.245 us |       0.85% |   0.406 us |   0.09% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     1000      |   1000000   | 451.933 us |       0.95% | 452.511 us |       1.05% |   0.578 us |   0.13% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     10000     |   1000000   | 452.320 us |       1.02% | 452.794 us |       1.61% |   0.475 us |   0.10% |   PASS   |
|  I32  |   I32   |    UNIFORM     |    100000     |   1000000   | 452.580 us |       1.12% | 453.230 us |       0.88% |   0.650 us |   0.14% |   PASS   |
|  I32  |   I32   |    UNIFORM     |    1000000    |   1000000   | 454.180 us |       0.92% | 453.906 us |       1.29% |  -0.274 us |  -0.06% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |  10000000   |   8.733 ms |       0.38% |   8.680 ms |       0.28% | -53.219 us |  -0.61% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      16       |  10000000   |   1.696 ms |      10.17% |   1.675 ms |      10.30% | -21.053 us |  -1.24% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |  10000000   |   1.138 ms |       6.41% |   1.179 ms |       0.85% |  40.241 us |   3.53% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      128      |  10000000   |   1.078 ms |       0.94% |   1.062 ms |       1.29% | -15.770 us |  -1.46% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      256      |  10000000   | 953.347 us |       0.60% | 930.578 us |       0.51% | -22.769 us |  -2.39% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      512      |  10000000   | 918.620 us |       0.82% | 895.189 us |       0.91% | -23.431 us |  -2.55% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     1000      |  10000000   | 932.159 us |       2.02% | 883.467 us |       0.97% | -48.692 us |  -5.22% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     10000     |  10000000   | 929.249 us |       0.95% | 905.898 us |       1.05% | -23.351 us |  -2.51% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    100000     |  10000000   | 968.087 us |       0.60% | 944.206 us |       0.61% | -23.881 us |  -2.47% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    1000000    |  10000000   |   1.529 ms |       0.44% |   1.510 ms |       0.47% | -19.677 us |  -1.29% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |   10000000    |  10000000   |   1.875 ms |       0.29% |   1.855 ms |       0.30% | -19.858 us |  -1.06% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |       1       |  100000000  |  83.021 ms |       0.18% |  82.989 ms |       0.22% | -32.184 us |  -0.04% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      16       |  100000000  |  16.747 ms |      13.25% |  16.729 ms |      13.32% | -17.970 us |  -0.11% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      64       |  100000000  |   9.142 ms |       5.88% |   9.130 ms |       5.87% | -11.743 us |  -0.13% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |  100000000  |   7.528 ms |       1.82% |   7.531 ms |       1.87% |   3.055 us |   0.04% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |  100000000  |   7.292 ms |      11.03% |   7.291 ms |      10.97% |  -1.658 us |  -0.02% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |  100000000  |   6.487 ms |       1.32% |   6.483 ms |       1.26% |  -4.825 us |  -0.07% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     1000      |  100000000  |   6.635 ms |       0.17% |   6.630 ms |       0.13% |  -5.048 us |  -0.08% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     10000     |  100000000  |   7.691 ms |       0.12% |   7.694 ms |       0.07% |   3.497 us |   0.05% |   PASS   |
|  I32  |   I32   |    UNIFORM     |    100000     |  100000000  |   7.879 ms |       0.14% |   7.881 ms |       0.08% |   1.772 us |   0.02% |   PASS   |
|  I32  |   I32   |    UNIFORM     |    1000000    |  100000000  |  14.761 ms |       0.62% |  14.817 ms |       0.52% |  56.383 us |   0.38% |   PASS   |
|  I32  |   I32   |    UNIFORM     |   10000000    |  100000000  |  18.917 ms |       0.48% |  18.927 ms |       0.53% |  10.378 us |   0.05% |   PASS   |
|  I32  |   I32   |    UNIFORM     |   100000000   |  100000000  |  19.476 ms |       0.54% |  19.487 ms |       0.33% |  10.516 us |   0.05% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |      1      |  39.152 us |       9.45% |  42.263 us |       8.69% |   3.112 us |   7.95% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |     16      |  40.782 us |      10.08% |  40.757 us |       9.99% |  -0.026 us |  -0.06% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      16       |     16      |  39.299 us |       9.48% |  40.437 us |       9.95% |   1.138 us |   2.90% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |     64      |  38.587 us |       9.12% |  38.560 us |       9.28% |  -0.027 us |  -0.07% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      16       |     64      |  40.075 us |      10.74% |  40.658 us |       9.82% |   0.583 us |   1.45% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      64       |     64      |  44.485 us |       2.06% |  44.190 us |       2.50% |  -0.295 us |  -0.66% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |     128     |  44.738 us |       3.53% |  44.268 us |       9.19% |  -0.470 us |  -1.05% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      16       |     128     |  44.685 us |       3.57% |  44.251 us |       2.63% |  -0.434 us |  -0.97% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      64       |     128     |  44.479 us |       3.06% |  44.115 us |       2.53% |  -0.364 us |  -0.82% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |     128     |  44.383 us |       2.76% |  44.044 us |       2.08% |  -0.339 us |  -0.76% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |     256     |  44.535 us |       2.58% |  44.127 us |       2.63% |  -0.408 us |  -0.92% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      16       |     256     |  44.675 us |       3.24% |  44.062 us |       2.60% |  -0.613 us |  -1.37% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      64       |     256     |  44.342 us |       2.11% |  43.988 us |       2.14% |  -0.354 us |  -0.80% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |     256     |  44.446 us |       2.13% |  43.927 us |       2.15% |  -0.518 us |  -1.17% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      256      |     256     |  44.202 us |       2.43% |  44.146 us |       2.14% |  -0.056 us |  -0.13% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |     512     |  44.618 us |       3.23% |  44.591 us |       2.19% |  -0.028 us |  -0.06% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      16       |     512     |  44.516 us |       3.35% |  44.551 us |       1.91% |   0.035 us |   0.08% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      64       |     512     |  44.586 us |       2.33% |  44.644 us |       2.50% |   0.058 us |   0.13% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |     512     |  44.342 us |       3.76% |  44.687 us |       2.21% |   0.346 us |   0.78% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      256      |     512     |  44.614 us |       3.93% |  44.559 us |       2.46% |  -0.055 us |  -0.12% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      512      |     512     |  44.336 us |       2.74% |  44.386 us |       2.23% |   0.050 us |   0.11% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |    1000     |  44.726 us |       3.72% |  44.878 us |       4.08% |   0.151 us |   0.34% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      16       |    1000     |  44.050 us |       5.96% |  44.045 us |       5.24% |  -0.005 us |  -0.01% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      64       |    1000     |  44.895 us |       3.04% |  44.666 us |       3.39% |  -0.229 us |  -0.51% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |    1000     |  43.952 us |       6.80% |  44.446 us |       2.33% |   0.494 us |   1.12% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      256      |    1000     |  44.879 us |       2.81% |  44.572 us |       2.22% |  -0.307 us |  -0.68% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      512      |    1000     |  44.919 us |       3.47% |  44.568 us |       2.31% |  -0.351 us |  -0.78% |   PASS   |
|  I64  |   I64   |    UNIFORM     |     1000      |    1000     |  44.624 us |       2.21% |  44.426 us |       1.92% |  -0.198 us |  -0.44% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |    10000    |  52.674 us |       1.72% |  52.393 us |       1.96% |  -0.281 us |  -0.53% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      16       |    10000    |  44.796 us |       3.11% |  44.350 us |       2.18% |  -0.446 us |  -0.99% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      64       |    10000    |  44.885 us |       3.14% |  44.528 us |       2.11% |  -0.357 us |  -0.80% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |    10000    |  44.838 us |       2.34% |  44.565 us |       3.02% |  -0.274 us |  -0.61% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      256      |    10000    |  44.808 us |       2.06% |  44.449 us |       2.16% |  -0.359 us |  -0.80% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      512      |    10000    |  44.727 us |       2.23% |  44.446 us |       2.26% |  -0.281 us |  -0.63% |   PASS   |
|  I64  |   I64   |    UNIFORM     |     1000      |    10000    |  44.744 us |       2.16% |  44.420 us |       3.96% |  -0.324 us |  -0.72% |   PASS   |
|  I64  |   I64   |    UNIFORM     |     10000     |    10000    |  52.686 us |       2.38% |  52.899 us |       3.38% |   0.213 us |   0.40% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |   100000    | 265.476 us |       1.63% | 272.265 us |       2.39% |   6.789 us |   2.56% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      16       |   100000    | 191.573 us |       1.34% | 200.060 us |       1.49% |   8.486 us |   4.43% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      64       |   100000    | 190.124 us |       1.79% | 192.812 us |       1.45% |   2.689 us |   1.41% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |   100000    | 183.660 us |       1.45% | 191.625 us |       1.53% |   7.965 us |   4.34% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |   100000    | 183.716 us |       1.64% | 184.318 us |       2.02% |   0.603 us |   0.33% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      512      |   100000    | 183.759 us |       1.35% | 184.458 us |       1.60% |   0.699 us |   0.38% |   PASS   |
|  I64  |   I64   |    UNIFORM     |     1000      |   100000    | 183.923 us |       1.75% | 184.377 us |       2.90% |   0.453 us |   0.25% |   PASS   |
|  I64  |   I64   |    UNIFORM     |     10000     |   100000    | 183.696 us |       1.52% | 184.132 us |       1.44% |   0.436 us |   0.24% |   PASS   |
|  I64  |   I64   |    UNIFORM     |    100000     |   100000    | 192.014 us |       1.59% | 191.918 us |       1.31% |  -0.096 us |  -0.05% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |   1000000   |   1.212 ms |       2.03% |   1.314 ms |       2.21% | 102.428 us |   8.45% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      16       |   1000000   | 509.184 us |       0.85% | 599.358 us |       2.21% |  90.174 us |  17.71% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      64       |   1000000   | 483.268 us |       1.11% | 502.615 us |       1.15% |  19.346 us |   4.00% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |   1000000   | 482.859 us |       0.82% | 504.573 us |       1.17% |  21.714 us |   4.50% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |   1000000   | 482.616 us |       0.85% | 502.739 us |       1.25% |  20.124 us |   4.17% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |   1000000   | 483.642 us |       1.53% | 502.331 us |       1.19% |  18.689 us |   3.86% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |   1000000   | 483.238 us |       0.91% | 501.865 us |       1.06% |  18.627 us |   3.85% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |   1000000   | 483.508 us |       0.90% | 504.326 us |       1.09% |  20.818 us |   4.31% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    100000     |   1000000   | 481.536 us |       0.89% | 500.053 us |       1.09% |  18.517 us |   3.85% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    1000000    |   1000000   | 483.663 us |       0.88% | 506.469 us |       1.11% |  22.806 us |   4.72% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |       1       |  10000000   |   9.121 ms |       0.32% |   9.697 ms |       0.26% | 575.694 us |   6.31% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      16       |  10000000   |   2.334 ms |      11.43% |   3.091 ms |       9.83% | 756.897 us |  32.42% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      64       |  10000000   |   1.467 ms |       1.45% |   1.852 ms |       4.76% | 384.448 us |  26.20% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |  10000000   |   1.368 ms |       1.56% |   1.720 ms |       0.86% | 351.679 us |  25.70% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |  10000000   |   1.311 ms |       3.83% |   1.634 ms |       4.29% | 322.344 us |  24.58% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |  10000000   |   1.217 ms |       0.91% |   1.419 ms |       4.84% | 202.116 us |  16.61% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |  10000000   |   1.238 ms |       0.71% |   1.393 ms |       1.07% | 154.698 us |  12.49% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |  10000000   |   1.229 ms |       1.11% |   1.307 ms |       0.72% |  77.894 us |   6.34% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    100000     |  10000000   |   1.251 ms |       0.39% |   1.318 ms |       0.40% |  66.976 us |   5.35% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    1000000    |  10000000   |   2.047 ms |       0.87% |   2.151 ms |       0.80% | 103.565 us |   5.06% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |   10000000    |  10000000   |   2.606 ms |       1.98% |   2.589 ms |       1.22% | -16.657 us |  -0.64% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |  100000000  |  88.256 ms |       0.58% |  93.575 ms |       0.13% |   5.318 ms |   6.03% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      16       |  100000000  |  20.585 ms |       7.13% |  29.284 ms |      14.40% |   8.699 ms |  42.26% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      64       |  100000000  |  12.707 ms |       2.20% |  17.506 ms |       9.21% |   4.799 ms |  37.77% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |  100000000  |  10.882 ms |       1.37% |  13.706 ms |       3.24% |   2.824 ms |  25.96% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |  100000000  |  10.405 ms |       2.04% |  12.180 ms |       4.19% |   1.776 ms |  17.07% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |  100000000  |   9.789 ms |       0.13% |  11.904 ms |       0.06% |   2.116 ms |  21.61% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |  100000000  |  10.156 ms |       0.08% |  12.322 ms |       0.11% |   2.166 ms |  21.33% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |  100000000  |  11.122 ms |       0.14% |  13.462 ms |       0.19% |   2.340 ms |  21.04% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    100000     |  100000000  |  11.301 ms |       0.07% |  13.665 ms |       0.19% |   2.364 ms |  20.92% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    1000000    |  100000000  |  18.812 ms |       0.49% |  20.026 ms |       0.60% |   1.214 ms |   6.45% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |   10000000    |  100000000  |  22.836 ms |       0.30% |  24.920 ms |       1.30% |   2.084 ms |   9.12% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |   100000000   |  100000000  |  25.983 ms |       0.44% |  25.961 ms |       0.28% | -21.910 us |  -0.08% |   PASS   |

# Summary

- Total Matches: 156
  - Pass    (diff <= min_noise): 111
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  45
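
For reading the table, the following sketch reproduces how the Diff, %Diff, and Status columns appear to relate to the timing columns. The rule is inferred from the rows and the summary legend above, not taken from the comparison script itself, and the `comparison` struct and `compare` function are hypothetical names. Under this reading, FAIL only means the difference exceeds the measurement noise, and a positive %Diff means the Cmp run (no init) took longer than the Ref run (init == empty_value_sentinel).

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper mirroring the table columns; not the actual comparison tool.
struct comparison {
  double diff;      // cmp_time - ref_time, same unit as the inputs
  double pct_diff;  // diff relative to ref_time, in percent
  bool pass;        // true when |pct_diff| <= min(ref_noise, cmp_noise)
};

// Times must share one unit; noise values are given in percent.
inline comparison compare(double ref_time, double ref_noise, double cmp_time, double cmp_noise)
{
  double const diff      = cmp_time - ref_time;
  double const pct_diff  = 100.0 * diff / ref_time;
  double const min_noise = std::min(ref_noise, cmp_noise);
  return {diff, pct_diff, std::abs(pct_diff) <= min_noise};
}
```

For example, the I64 row with cardinality 16 and 100000000 inputs (20.585 ms vs. 29.284 ms) yields a relative difference of about 42%, far above either noise value, so it is flagged as a significant difference in favor of the init overload.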

@PointKernel
Member

/ok to test

include/cuco/detail/static_map/static_map.inl (outdated, resolved)
include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
include/cuco/detail/static_map/static_map_ref.inl (outdated, resolved)
include/cuco/utility/reduction_functors.cuh (resolved)
@sleeepyjack
Collaborator

/ok to test

Member

@PointKernel PointKernel left a comment

LGTM

@PointKernel
Member

/ok to test

@PointKernel
Member

/ok to test

@PointKernel PointKernel merged commit d625fca into NVIDIA:dev Aug 7, 2024
19 checks passed