Add host-bulk insert_or_apply using shared_memory #551

Merged: PointKernel merged 18 commits into NVIDIA:dev on Aug 7, 2024

srinivasyadav18
Contributor

This PR adds a host-bulk `insert_or_apply` that uses shared memory, which could improve performance in low-cardinality, very-high-multiplicity cases.
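The following is a host-side C++ model of the low-cardinality, high-multiplicity argument. It is not the cuco CUDA kernel. It rests on these assumptions:

- Each chunk of `block_size` consecutive inputs stands in for one thread block.
- A capacity-limited `std::unordered_map` stands in for that block's shared-memory map.
- The operation is a count (+1 per occurrence).

The function names and the overflow fallback are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Global-memory strategy (hypothetical model): every input updates the global map.
// Returns the number of global-map updates performed.
std::size_t gmem_insert_or_apply(std::vector<int> const& keys,
                                 std::unordered_map<int, long long>& global) {
  for (int k : keys) { global[k] += 1; }
  return keys.size();
}

// Shared-memory strategy (hypothetical model): each block first aggregates into a
// small local map of at most shmem_capacity keys, then flushes one update per
// distinct key to the global map. Keys that do not fit locally go straight to global.
std::size_t shmem_insert_or_apply(std::vector<int> const& keys,
                                  std::size_t block_size,
                                  std::size_t shmem_capacity,
                                  std::unordered_map<int, long long>& global) {
  std::size_t global_updates = 0;
  for (std::size_t begin = 0; begin < keys.size(); begin += block_size) {
    std::size_t end = std::min(begin + block_size, keys.size());
    std::unordered_map<int, long long> local;  // stands in for shared memory
    for (std::size_t i = begin; i < end; ++i) {
      int k = keys[i];
      auto it = local.find(k);
      if (it != local.end()) {
        it->second += 1;                 // aggregate locally
      } else if (local.size() < shmem_capacity) {
        local.emplace(k, 1);             // new key fits in the local map
      } else {
        global[k] += 1;                  // local map full: fall back to global
        ++global_updates;
      }
    }
    for (auto const& [k, v] : local) {   // flush block results once per key
      global[k] += v;
      ++global_updates;
    }
  }
  return global_updates;
}
```

Consider 1,024 inputs over 4 distinct keys, processed in 256-input blocks. The global-memory version performs 1,024 global updates. The shared-memory version performs 16 (4 blocks x 4 keys), and both produce the same counts. When there are more distinct keys than the local map can hold, more updates fall back to the global map and the savings shrink.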

copy-pr-bot bot commented Jul 17, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@sleeepyjack
Collaborator

/ok to test

@PointKernel added the labels topic: performance, type: improvement, topic: static_map, and Needs Review on Jul 17, 2024
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
include/cuco/detail/static_map/kernels.cuh (resolved)
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
tests/static_map/insert_or_apply_test.cu (resolved)
@sleeepyjack
Collaborator

/ok to test

Collaborator

@sleeepyjack sleeepyjack left a comment

Another round. Nice work!

Does the unit test cover each of the new code paths, i.e., shmem vs gmem kernel?
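One way a test could cover both kernels is to run the same count-style check once through each path. The helpers below are a hypothetical sketch, not the cuco test. They build keys with a chosen cardinality in which every key appears equally often. This is one deterministic reading of "uniform multiplicity," borrowed from the benchmark name. A second helper checks the expected per-key count. The sketch does not show which kernel a given map configuration takes; per a later commit in this thread, the shared-memory kernel is used only when `cg_size == 1`.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical generator: num_inputs keys drawn from `cardinality` distinct values,
// each appearing num_inputs / cardinality times. Assumes num_inputs % cardinality == 0.
std::vector<int> make_uniform_multiplicity_keys(std::size_t cardinality,
                                                std::size_t num_inputs) {
  std::vector<int> keys(num_inputs);
  for (std::size_t i = 0; i < num_inputs; ++i) {
    keys[i] = static_cast<int>(i % cardinality);
  }
  return keys;
}

// Hypothetical checker: for a count-style insert_or_apply (init 0, +1 per occurrence),
// every key in [0, cardinality) must hold exactly `multiplicity`.
bool counts_match(std::map<int, long long> const& result,
                  std::size_t cardinality, long long multiplicity) {
  if (result.size() != cardinality) return false;
  for (std::size_t k = 0; k < cardinality; ++k) {
    auto it = result.find(static_cast<int>(k));
    if (it == result.end() || it->second != multiplicity) return false;
  }
  return true;
}
```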

include/cuco/detail/static_map/kernels.cuh (resolved)
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
include/cuco/detail/static_map/static_map.inl (outdated, resolved)
include/cuco/detail/static_map/static_map.inl (outdated, resolved)
include/cuco/detail/static_map/static_map.inl (outdated, resolved)
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
include/cuco/detail/static_map/static_map.inl (outdated, resolved)
include/cuco/detail/static_map/static_map.inl (outdated, resolved)
Member

@PointKernel PointKernel left a comment

Some final nits. @srinivasyadav18, can you please share the `insert_or_apply` benchmark results from before and after this PR in the PR discussion?

Excellent work! Thank you!

include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
include/cuco/detail/static_map/static_map.inl (outdated, resolved)
include/cuco/detail/static_map/kernels.cuh (outdated, resolved)
@srinivasyadav18
Contributor Author

Benchmarks:

Cmp Time = global memory implementation (before)
Ref Time = shared memory implementation (this PR, after)
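The Diff and %Diff columns in the table below can be reproduced from the two time columns, expressed in the same unit:

- Diff = Cmp Time minus Ref Time.
- %Diff = Diff relative to Ref Time.

This rule is inferred from the rows themselves. For example, 36.785 us - 37.307 us = -0.522 us, which is -1.40% of 37.307 us. The speedup ratio computed below is an extra derived quantity, not a column in the table.

```cpp
#include <cmath>

// Reproduces the comparison columns (inferred from the table rows):
//   Diff  = Cmp Time - Ref Time
//   %Diff = Diff / Ref Time * 100
// With Ref = shared-memory (after) and Cmp = global-memory (before),
// a positive Diff means the shared-memory kernel is faster.
struct time_diff {
  double diff;      // same unit as the inputs
  double pct_diff;  // percent of ref_time
  double speedup;   // cmp_time / ref_time (not a table column)
};

time_diff compare_times(double ref_time, double cmp_time) {
  double d = cmp_time - ref_time;
  return {d, d / ref_time * 100.0, cmp_time / ref_time};
}
```

Take the I32 row with cardinality 1 and 100,000,000 inputs (Ref 4.563 ms, Cmp 78.140 ms). Its ratio is about 17.1, so the shared-memory kernel is roughly 17 times faster on that row. Recomputing %Diff from the rounded displayed times gives about 1612.5%, slightly off the table's 1612.62%.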

['./shmem_h100.json', './global_h100.json']
# static_map_insert_or_apply_uniform_multiplicity

## [0] NVIDIA H100 80GB HBM3

|  Key  |  Value  |  Distribution  |  Cardinality  |  NumInputs  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |    %Diff |  Status  |
|-------|---------|----------------|---------------|-------------|------------|-------------|------------|-------------|--------------|----------|----------|
|  I32  |   I32   |    UNIFORM     |       1       |      1      |  37.307 us |       2.21% |  36.785 us |       3.37% |    -0.522 us |   -1.40% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     128     |  37.070 us |       3.87% |  36.509 us |       1.84% |    -0.561 us |   -1.51% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |     128     |  37.009 us |       2.08% |  36.674 us |       3.10% |    -0.335 us |   -0.91% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     256     |  36.768 us |       2.52% |  36.496 us |       2.83% |    -0.272 us |   -0.74% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |     256     |  36.806 us |       2.99% |  36.533 us |       5.51% |    -0.273 us |   -0.74% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |     256     |  36.721 us |       2.39% |  36.544 us |       2.20% |    -0.177 us |   -0.48% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |     512     |  36.696 us |       3.22% |  36.249 us |       2.04% |    -0.447 us |   -1.22% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |     512     |  36.663 us |       2.43% |  36.321 us |       1.74% |    -0.342 us |   -0.93% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |     512     |  36.743 us |       6.61% |  36.546 us |       5.13% |    -0.197 us |   -0.54% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |     512     |  36.805 us |       2.31% |  36.466 us |       3.55% |    -0.339 us |   -0.92% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |    1000     |  36.985 us |       2.81% |  36.676 us |       2.76% |    -0.309 us |   -0.84% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |    1000     |  37.197 us |       2.32% |  36.760 us |       1.77% |    -0.437 us |   -1.17% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |    1000     |  37.080 us |       2.59% |  36.822 us |       1.85% |    -0.257 us |   -0.69% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |    1000     |  37.184 us |       2.69% |  36.692 us |       2.27% |    -0.492 us |   -1.32% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     1000      |    1000     |  37.198 us |       3.92% |  38.413 us |       9.73% |     1.215 us |    3.27% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |    10000    |  52.425 us |       4.91% |  51.852 us |       5.11% |    -0.572 us |   -1.09% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |    10000    |  36.888 us |       1.83% |  36.601 us |       1.87% |    -0.287 us |   -0.78% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      256      |    10000    |  37.065 us |       2.17% |  36.566 us |       1.80% |    -0.499 us |   -1.35% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      512      |    10000    |  36.885 us |       1.90% |  36.536 us |       1.75% |    -0.349 us |   -0.95% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     1000      |    10000    |  37.044 us |       2.77% |  36.616 us |       2.77% |    -0.428 us |   -1.16% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     10000     |    10000    |  45.115 us |       1.57% |  44.828 us |       1.42% |    -0.286 us |   -0.63% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |   100000    | 203.928 us |       4.67% | 207.112 us |       2.12% |     3.184 us |    1.56% |   PASS   |
|  I32  |   I32   |    UNIFORM     |      128      |   100000    | 109.486 us |       8.56% | 112.419 us |       0.84% |     2.933 us |    2.68% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      256      |   100000    | 109.871 us |       7.70% | 112.517 us |       0.71% |     2.646 us |    2.41% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      512      |   100000    | 109.773 us |       8.81% | 112.311 us |       0.89% |     2.539 us |    2.31% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     1000      |   100000    | 108.793 us |       9.69% | 112.325 us |       3.46% |     3.532 us |    3.25% |   PASS   |
|  I32  |   I32   |    UNIFORM     |     10000     |   100000    | 109.010 us |       9.18% | 111.920 us |       1.52% |     2.909 us |    2.67% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    100000     |   100000    | 110.232 us |       7.93% | 112.106 us |       2.71% |     1.873 us |    1.70% |   PASS   |
|  I32  |   I32   |    UNIFORM     |       1       |   1000000   | 244.797 us |       8.62% |   1.034 ms |      32.04% |   789.250 us |  322.41% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      128      |   1000000   | 244.435 us |       9.34% | 393.275 us |     299.68% |   148.840 us |   60.89% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      256      |   1000000   | 244.210 us |       7.45% | 318.566 us |     134.25% |    74.356 us |   30.45% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      512      |   1000000   | 244.703 us |       8.54% | 329.651 us |      88.30% |    84.948 us |   34.71% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     1000      |   1000000   | 244.287 us |       8.92% | 359.602 us |     139.77% |   115.315 us |   47.20% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     10000     |   1000000   | 244.203 us |       8.32% | 318.102 us |      97.06% |    73.899 us |   30.26% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    100000     |   1000000   | 243.934 us |       6.92% | 286.277 us |      58.85% |    42.343 us |   17.36% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    1000000    |   1000000   | 261.056 us |       5.88% | 362.666 us |      89.73% |   101.610 us |   38.92% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |       1       |  10000000   | 678.790 us |       2.28% |   8.454 ms |       6.74% |     7.775 ms | 1145.42% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      128      |  10000000   | 673.851 us |       2.01% |   1.576 ms |     348.86% |   902.336 us |  133.91% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      256      |  10000000   | 680.172 us |       1.93% | 969.807 us |       2.83% |   289.635 us |   42.58% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      512      |  10000000   | 690.132 us |       1.85% | 922.436 us |       3.03% |   232.304 us |   33.66% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     1000      |  10000000   | 716.705 us |       2.04% | 912.846 us |       1.31% |   196.140 us |   27.37% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     10000     |  10000000   |   1.013 ms |       1.19% | 916.636 us |       0.99% |   -95.951 us |   -9.48% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    100000     |  10000000   |   1.041 ms |       1.41% | 936.241 us |       0.65% |  -104.789 us |  -10.07% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    1000000    |  10000000   |   1.232 ms |       0.77% |   1.153 ms |       0.82% |   -78.668 us |   -6.39% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |   10000000    |  10000000   |   1.525 ms |       1.34% |   1.361 ms |       0.58% |  -164.357 us |  -10.77% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |       1       |  100000000  |   4.563 ms |       0.38% |  78.140 ms |       0.24% |    73.578 ms | 1612.62% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      128      |  100000000  |   4.563 ms |       0.46% |  83.503 ms |     254.79% |    78.939 ms | 1729.84% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      256      |  100000000  |   4.563 ms |       0.40% |   6.912 ms |      49.11% |     2.349 ms |   51.49% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |      512      |  100000000  |   4.565 ms |       0.52% |   5.884 ms |       0.57% |     1.319 ms |   28.90% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     1000      |  100000000  |   4.563 ms |       0.49% |   5.898 ms |       0.43% |     1.335 ms |   29.26% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |     10000     |  100000000  |   7.966 ms |       1.12% |  10.623 ms |     299.99% |     2.657 ms |   33.35% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    100000     |  100000000  |   8.208 ms |       1.46% |   6.876 ms |       0.23% | -1331.875 us |  -16.23% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |    1000000    |  100000000  |  11.469 ms |      23.15% |  10.047 ms |       0.98% | -1422.299 us |  -12.40% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |   10000000    |  100000000  |  14.049 ms |       0.25% |  12.469 ms |       0.86% | -1579.866 us |  -11.25% |   FAIL   |
|  I32  |   I32   |    UNIFORM     |   100000000   |  100000000  |  15.535 ms |       0.53% |  12.898 ms |       3.16% | -2636.527 us |  -16.97% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |       1       |      1      |  37.464 us |       2.03% |  36.719 us |       7.31% |    -0.745 us |   -1.99% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |     128     |  36.984 us |       1.95% |  36.550 us |       1.75% |    -0.434 us |   -1.17% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |     128     |  45.210 us |       1.50% |  44.572 us |       1.40% |    -0.638 us |   -1.41% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |       1       |     256     |  36.912 us |       1.97% |  36.503 us |       1.93% |    -0.409 us |   -1.11% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |     256     |  37.069 us |       2.08% |  36.561 us |       2.71% |    -0.508 us |   -1.37% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      256      |     256     |  37.391 us |       4.26% |  37.511 us |       6.58% |     0.120 us |    0.32% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |     512     |  37.179 us |       1.88% |  36.468 us |       1.89% |    -0.712 us |   -1.91% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |     512     |  36.982 us |       1.95% |  36.524 us |       1.92% |    -0.458 us |   -1.24% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      256      |     512     |  37.095 us |       1.99% |  36.460 us |       1.81% |    -0.635 us |   -1.71% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      512      |     512     |  37.822 us |       6.17% |  37.988 us |       7.81% |     0.166 us |    0.44% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |    1000     |  37.550 us |       1.98% |  36.526 us |       1.84% |    -1.025 us |   -2.73% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |    1000     |  37.208 us |       2.13% |  36.655 us |       1.87% |    -0.552 us |   -1.48% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      256      |    1000     |  37.434 us |       3.45% |  36.648 us |       1.78% |    -0.785 us |   -2.10% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |    1000     |  37.491 us |       6.23% |  36.868 us |       3.58% |    -0.623 us |   -1.66% |   PASS   |
|  I64  |   I64   |    UNIFORM     |     1000      |    1000     |  43.626 us |       7.58% |  43.458 us |       6.59% |    -0.169 us |   -0.39% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |    10000    |  53.436 us |       2.33% |  52.636 us |       2.47% |    -0.800 us |   -1.50% |   PASS   |
|  I64  |   I64   |    UNIFORM     |      128      |    10000    |  37.415 us |       1.88% |  36.464 us |      22.43% |    -0.951 us |   -2.54% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |    10000    |  37.369 us |       1.84% |  36.597 us |       1.78% |    -0.771 us |   -2.06% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |    10000    |  37.482 us |       1.88% |  36.518 us |       2.20% |    -0.964 us |   -2.57% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |    10000    |  37.171 us |       2.03% |  41.138 us |      20.32% |     3.967 us |   10.67% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |    10000    |  45.737 us |       1.46% |  45.163 us |       3.03% |    -0.573 us |   -1.25% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |   100000    | 193.064 us |       4.08% | 549.889 us |     720.60% |   356.825 us |  184.82% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |   100000    | 109.698 us |       5.05% | 112.364 us |       1.20% |     2.666 us |    2.43% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |   100000    | 109.690 us |       5.47% | 212.610 us |     562.95% |   102.920 us |   93.83% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |   100000    | 109.808 us |       5.65% | 207.059 us |    1495.14% |    97.251 us |   88.56% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |   100000    | 109.780 us |       5.69% | 112.236 us |       0.94% |     2.457 us |    2.24% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |   100000    | 109.738 us |       4.20% | 120.851 us |     435.33% |    11.113 us |   10.13% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    100000     |   100000    | 119.594 us |      92.07% | 120.331 us |       2.61% |     0.737 us |    0.62% |   PASS   |
|  I64  |   I64   |    UNIFORM     |       1       |   1000000   | 264.261 us |       8.38% |   1.085 ms |       3.64% |   820.626 us |  310.54% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |   1000000   | 262.781 us |       5.94% | 305.585 us |      81.15% |    42.804 us |   16.29% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |   1000000   | 263.915 us |       6.21% | 379.687 us |     111.75% |   115.772 us |   43.87% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |   1000000   | 263.520 us |       4.70% | 349.172 us |     133.56% |    85.652 us |   32.50% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |   1000000   | 263.825 us |       6.61% | 442.121 us |     118.65% |   178.296 us |   67.58% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |   1000000   | 265.812 us |       6.00% | 362.939 us |      87.24% |    97.127 us |   36.54% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    100000     |   1000000   | 264.857 us |       4.10% | 302.192 us |      76.95% |    37.335 us |   14.10% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    1000000    |   1000000   | 302.500 us |       2.84% | 431.469 us |      87.92% |   128.968 us |   42.63% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |       1       |  10000000   |   1.240 ms |       1.22% |  11.841 ms |      99.01% |    10.601 ms |  854.57% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |  10000000   |   1.237 ms |       0.67% |   3.153 ms |     272.79% |     1.916 ms |  154.85% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |  10000000   |   1.249 ms |       0.31% |   1.728 ms |       0.49% |   478.947 us |   38.33% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |  10000000   |   1.268 ms |       0.71% |   1.661 ms |       1.02% |   392.987 us |   30.99% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |  10000000   |   1.310 ms |       0.35% |   1.596 ms |       0.61% |   286.049 us |   21.84% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |  10000000   |   1.576 ms |       0.40% |   1.533 ms |       0.49% |   -43.815 us |   -2.78% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    100000     |  10000000   |   1.576 ms |       0.38% |   1.544 ms |       0.49% |   -32.773 us |   -2.08% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    1000000    |  10000000   |   2.135 ms |       0.72% |   1.921 ms |       0.85% |  -213.980 us |  -10.02% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |   10000000    |  10000000   |   2.402 ms |       0.99% |   2.178 ms |       1.04% |  -223.449 us |   -9.30% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |       1       |  100000000  |   5.645 ms |       4.22% |  86.638 ms |       0.62% |    80.993 ms | 1434.71% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      128      |  100000000  |   6.365 ms |      32.25% |  12.675 ms |       1.16% |     6.310 ms |   99.14% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      256      |  100000000  |   6.613 ms |      40.31% |  11.624 ms |       0.62% |     5.010 ms |   75.76% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |      512      |  100000000  |   7.042 ms |      42.60% |  10.042 ms |       2.08% |     3.000 ms |   42.60% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     1000      |  100000000  |   6.739 ms |      14.99% |  10.212 ms |       0.30% |     3.474 ms |   51.55% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |     10000     |  100000000  |  11.769 ms |       2.87% |  11.108 ms |       0.36% |  -660.833 us |   -5.62% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    100000     |  100000000  |  11.778 ms |       0.14% |  11.308 ms |       0.64% |  -470.690 us |   -4.00% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |    1000000    |  100000000  |  16.359 ms |       0.55% |  13.897 ms |       1.54% | -2461.981 us |  -15.05% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |   10000000    |  100000000  |  19.474 ms |       0.42% |  17.215 ms |       0.27% | -2259.446 us |  -11.60% |   FAIL   |
|  I64  |   I64   |    UNIFORM     |   100000000   |  100000000  |  20.222 ms |       0.30% |  17.643 ms |       5.83% | -2578.661 us |  -12.75% |   FAIL   |

# Summary

- Total Matches: 110
  - Pass    (diff <= min_noise): 38
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  72
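The status rule in the summary can be written out explicitly. Taking `min_noise` as the smaller of Ref Noise and Cmp Noise is an assumption, but it agrees with the rows. For example, the I64 row with cardinality 128 and 128 inputs is marked FAIL because |%Diff| = 1.41% exceeds min(1.50%, 1.40%). The infinite-noise "Unknown" category is not modeled here.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Assumed reading of the summary: PASS if |%Diff| <= min_noise, FAIL otherwise,
// where min_noise is the smaller of the two relative noise values (all in percent).
// FAIL therefore means "the change exceeds measurement noise", not a failing test.
std::string classify(double pct_diff, double ref_noise_pct, double cmp_noise_pct) {
  double min_noise = std::min(ref_noise_pct, cmp_noise_pct);
  return std::fabs(pct_diff) <= min_noise ? "PASS" : "FAIL";
}
```

This rule looks only at the magnitude of %Diff. As a result, FAIL marks both large improvements (positive %Diff, where the shared-memory kernel is faster) and regressions (negative %Diff).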

@PointKernel
Member

/ok to test

@sleeepyjack
Collaborator

/ok to test

@srinivasyadav18
Contributor Author

I think there are issues with the rebase. I need to resolve them and push the changes again.

@PointKernel
Member

/ok to test

@srinivasyadav18
Contributor Author

Since the CI tests fail with GCC 12, I added a workaround (in e804b4c) that gets past the error by using a pre-constructed value as the size of the shared map.

@sleeepyjack
Collaborator

/ok to test

@PointKernel
Member

/ok to test

@PointKernel
Member

/ok to test

@PointKernel
Member

/ok to test

srinivasyadav18 and others added 2 commits August 2, 2024 17:08
use shared_memory kernel only if `cg_size == 1`.
use `shmem_block_size` when calculating `shmem_grid_size`.
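The two commit messages above describe host-side dispatch logic, sketched below under stated assumptions. The names `cg_size`, `shmem_block_size`, and `shmem_grid_size` come from the commits. Everything else is hypothetical and not cuco's actual launch code: `launch_config`, `choose_launch`, `default_block_size`, and the thread-per-input grid formulas.

```cpp
#include <cstddef>

// Hypothetical launch parameters for one bulk insert_or_apply call.
struct launch_config {
  bool use_shmem;          // true: shared-memory kernel, false: global-memory kernel
  std::size_t block_size;  // threads per block
  std::size_t grid_size;   // number of blocks
};

constexpr std::size_t ceil_div(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// Dispatch rule from the commit: shared-memory kernel only when cg_size == 1.
// Assumed sizing: one thread per input for the shared-memory kernel, and
// cg_size threads per input for the global-memory kernel.
launch_config choose_launch(std::size_t num_inputs, int cg_size,
                            std::size_t default_block_size,
                            std::size_t shmem_block_size) {
  if (cg_size == 1) {
    // shmem_grid_size uses shmem_block_size, the block size this kernel is
    // actually launched with, rather than default_block_size.
    std::size_t shmem_grid_size = ceil_div(num_inputs, shmem_block_size);
    return {true, shmem_block_size, shmem_grid_size};
  }
  std::size_t threads = num_inputs * static_cast<std::size_t>(cg_size);
  return {false, default_block_size, ceil_div(threads, default_block_size)};
}
```

As an illustration, take 10,000 inputs, `cg_size == 1`, and an illustrative `shmem_block_size` of 1024. The sketch launches 10 blocks. Sizing that grid with an illustrative default block size of 128 instead would give 79 blocks of 1024 threads, far more than the input needs.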
@PointKernel
Member

/ok to test

@PointKernel
Member

/ok to test

@PointKernel PointKernel merged commit bd4e27b into NVIDIA:dev Aug 7, 2024
19 checks passed
PointKernel pushed a commit that referenced this pull request Aug 16, 2024
This PR cleans up some of the issues that occurred during the merge of #551.

1. Propagate the **key_eq** and **probing_scheme** from the **global** `ref`
to the constructor of `shared_memory_ref` in the **insert_or_apply_shmem**
kernel.
2. Disable the **init** overload of `insert_or_apply` using **SFINAE**.
Because the `cuda::stream_ref` argument is default-constructed, a call meant for
the **no-init** overload could otherwise invoke the **init** overload.
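The overload-resolution pitfall in item 2 can be reproduced in plain C++. The sketch below is hypothetical in several ways:

- `stream_ref` stands in for `cuda::stream_ref`, and a raw `void*` stands in for a `cudaStream_t`.
- The members return 1 (no-init) or 2 (init) only so the chosen overload is observable.
- Constraining on `std::is_convertible_v<Init, mapped_type>` is one possible SFINAE condition, not necessarily the one used in cuco.

```cpp
#include <type_traits>

// Hypothetical stand-in for cuda::stream_ref: default-constructible and
// implicitly constructible from a raw stream handle (modeled as void*).
struct stream_ref {
  stream_ref() = default;
  stream_ref(void* handle) : handle_{handle} {}
  void* handle_ = nullptr;
};

struct plus_op {
  int operator()(int a, int b) const { return a + b; }
};

// Without a constraint: returns 1 for the no-init overload, 2 for the init overload.
struct unconstrained_map {
  template <class It, class Op>
  int insert_or_apply(It, It, Op, stream_ref = {}) { return 1; }

  template <class It, class Init, class Op>
  int insert_or_apply(It, It, Init, Op, stream_ref = {}) { return 2; }
};

// With SFINAE: the init overload is viable only when Init converts to mapped_type.
struct constrained_map {
  using mapped_type = int;

  template <class It, class Op>
  int insert_or_apply(It, It, Op, stream_ref = {}) { return 1; }

  template <class It, class Init, class Op,
            class = std::enable_if_t<std::is_convertible_v<Init, mapped_type>>>
  int insert_or_apply(It, It, Init, Op, stream_ref = {}) { return 2; }
};
```

Consider the call `insert_or_apply(first, last, op, raw_stream)` against the unconstrained overloads. The compiler deduces `Init` as the operator type and `Op` as `void*`, and the stream parameter falls back to its default. That exact match beats the user-defined conversion from `void*` to `stream_ref`, so the init overload wins. The constraint removes the init overload from this call, so the no-init overload is selected.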

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>