
[BUG]: retrieve_all fails with over 1B items #576

Closed
bdice opened this issue Aug 12, 2024 · 6 comments · Fixed by #580
Assignees
Labels
helps: rapids Helps or needed by RAPIDS type: bug Something isn't working

Comments

@bdice (Contributor) commented Aug 12, 2024

Is this a duplicate?

Type of Bug

Silent Failure

Describe the bug

We observed a hang in cuDF for hash-based groupby aggregations with over 1B items. I traced it to a hang in the static_map retrieve_all algorithm. The same hang can be observed in static_set benchmarks:

build/latest/benchmarks/STATIC_SET_BENCH -b 10 -d 0 -a Occupancy=0.5 -a NumInputs=1200000000 -a Key=I32 --run-once

This passes at Occupancy 0.9, so it must be a problem with the total size.

How to Reproduce

Construct a static_map or static_set with a size greater than 1B elements (1.2B will hang). Call retrieve_all.

Expected behavior

Results are returned without hanging.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@bdice bdice added the type: bug Something isn't working label Aug 12, 2024
@PointKernel PointKernel self-assigned this Aug 12, 2024
@PointKernel PointKernel added the helps: rapids Helps or needed by RAPIDS label Aug 12, 2024
@PointKernel (Member)

The problem is that the num_items parameter in the current cub::DeviceSelect::If API (used by retrieve_all) is of type int, which prevents it from handling inputs larger than INT_MAX:

  If(void* d_temp_storage,
     size_t& temp_storage_bytes,
     InputIteratorT d_in,
     OutputIteratorT d_out,
     NumSelectedIteratorT d_num_selected_out,
     int num_items,
     SelectOp select_op,
     cudaStream_t stream,
     bool debug_synchronous)

The corresponding CCCL issue is tracked as NVIDIA/cccl#1422.

@sleeepyjack (Collaborator)

I think we need to provide a workaround in the form of a custom kernel here, since the fix in CCCL won't be available to us anytime soon.

@bdice (Contributor, Author) commented Aug 14, 2024

@sleeepyjack, I think that’s the best option available at this time.

@sleeepyjack (Collaborator)

NVIDIA/cccl#1422 (comment)

This would be an even easier temporary solution although it comes with a performance hit. However, I don't think a custom implementation (quickly hacked together) will be faster than what cub does in this case.

@PointKernel (Member)

> NVIDIA/cccl#1422 (comment)
>
> This would be an even easier temporary solution although it comes with a performance hit. However, I don't think a custom implementation (quickly hacked together) will be faster than what cub does in this case.

Are you referring to passing a custom equal op to cub::DeviceSelect::UniqueByKey so we can use UniqueByKey to implement retrieve_all?

@bdice (Contributor, Author) commented Aug 15, 2024

This came up in conversation with @davidwendt. I wanted to track a few findings that I had in conversation with @PointKernel in a public issue.

The capacity of the set/map is the problem here, not the number of inputs. Normally cuDF's hashmaps use 50% occupancy, so we run into this problem at just over 1B items (half of INT_MAX is about 1.07B). In the general case we need to look at num_items / occupancy_fraction.

If we run cuCo benchmarks like this:

build/latest/benchmarks/STATIC_SET_BENCH -b 10 -d 0 -a Occupancy=0.9 -a NumInputs=1900000000 -a Key=I32 --run-once

with 90% occupancy and 1.9B inputs, it passes because the capacity is below INT_MAX due to high occupancy.

The capacity of the set/map is the input that we provide to CUB that hits the INT_MAX limit.
