[BUG]: retrieve_all fails with over 1B items #576
Comments
The problem is that the
If(void* d_temp_storage,
   size_t& temp_storage_bytes,
   InputIteratorT d_in,
   OutputIteratorT d_out,
   NumSelectedIteratorT d_num_selected_out,
   int num_items,
   SelectOp select_op,
   cudaStream_t stream,
   bool debug_synchronous)
The corresponding CCCL issue is tracked via NVIDIA/cccl#1422
I think we need to provide a workaround in the form of a custom kernel here, since the fix in cccl won't be available to us anytime soon.
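For illustration only, here is a minimal host-side sketch of one possible shape for such a workaround: splitting a 64-bit item count into chunks that each fit an `int`-sized selection call. This is not the custom kernel proposed in this thread and not the CCCL fix. `select_if_int` and `select_if_chunked` are hypothetical names, and `std::copy_if` stands in for the device-side selection so the example runs on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a selection routine whose item count is an `int`,
// like the `int num_items` parameter in the signature quoted above.
// Copies the items satisfying `pred` to `out` and returns how many were copied.
template <typename T, typename Pred>
int select_if_int(const T* in, T* out, int num_items, Pred pred) {
  T* end = std::copy_if(in, in + num_items, out, pred);
  return static_cast<int>(end - out);
}

// Hypothetical wrapper: processes a 64-bit `num_items` in chunks of at most
// `max_chunk` items, so every inner call receives a count that fits in `int`.
// Selected items are appended contiguously to `out`.
template <typename T, typename Pred>
std::int64_t select_if_chunked(const T* in, T* out, std::int64_t num_items,
                               int max_chunk, Pred pred) {
  assert(max_chunk > 0);
  std::int64_t num_selected = 0;
  for (std::int64_t offset = 0; offset < num_items; offset += max_chunk) {
    const std::int64_t remaining = num_items - offset;
    const int chunk =
        static_cast<int>(std::min<std::int64_t>(max_chunk, remaining));
    num_selected += select_if_int(in + offset, out + num_selected, chunk, pred);
  }
  return num_selected;
}
```

On a GPU, each chunk would presumably need its own selection call and an accumulated output offset, which is likely where a performance cost would come from.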
@sleeepyjack, I think that’s the best option available at this time.
This would be an even easier temporary solution, although it comes with a performance hit. However, I don't think a custom implementation (quickly hacked together) will be faster than what cub does in this case.
Are you referring to passing a custom equal op to
This came up in conversation with @davidwendt. I wanted to track a few findings that I had in conversation with @PointKernel in a public issue.
The capacity of the set/map is the problem here, not the number of inputs. Normally cuDF's hashmaps use 50% occupancy, so we run into this problem at just over 1B items (half of
If we run cuCo benchmarks like this:
with 90% occupancy and 1.9B inputs, it passes because the capacity is below
The capacity of the set/map is the input that we provide to CUB that hits the
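The arithmetic behind this observation can be sketched as follows, assuming the limit in question is the largest value of the 32-bit `int num_items` parameter shown in the signature above (2,147,483,647). `required_capacity` and `fits_in_int` are hypothetical helpers, and the capacity is taken as simply items divided by occupancy; the capacity a real cuCo container allocates may be rounded differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of slots needed to hold `num_items` keys at the
// given occupancy (load factor), rounded up. Assumption: no extra rounding to
// primes or window sizes.
inline std::int64_t required_capacity(std::int64_t num_items, double occupancy) {
  assert(occupancy > 0.0 && occupancy <= 1.0);
  return static_cast<std::int64_t>(
      std::ceil(static_cast<double>(num_items) / occupancy));
}

// True if `capacity` can be passed through an `int` item-count parameter.
inline bool fits_in_int(std::int64_t capacity) {
  return capacity <= std::numeric_limits<int>::max();
}
```

Under these assumptions, 1.2B items at 50% occupancy need 2.4B slots, which does not fit in an `int`, while 1.9B items at 90% occupancy need about 2.11B slots, which does.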
Is this a duplicate?
Type of Bug
Silent Failure
Describe the bug
We observed a hang in cuDF for hash-based groupby aggregations with over 1B items. I traced it to a hang in the static_map retrieve_all algorithm. The same hang can be observed in static_set benchmarks:
This passes at Occupancy 0.9, so it must be a problem with the total size.
How to Reproduce
Construct a static_map or static_set with a size greater than 1B elements (1.2B will hang). Call retrieve_all.
Expected behavior
Results are returned without hanging.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response