torch.cdist works faster than distance_matrix. #30
It also needs less GPU memory.
The memory overhead comes from torch.pow(x-y,p).sum() in distance_matrix.
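To illustrate where the memory goes, here is a minimal sketch. It assumes distance_matrix broadcasts both inputs to an (n, m, d) tensor before reducing; the helper name `distance_matrix_broadcast` and the tensor sizes are hypothetical and not taken from the repository. Note that the broadcast form with p=2 yields squared distances, while torch.cdist returns the root, so the sketch compares them after a square root.

```python
import torch

def distance_matrix_broadcast(x, y, p=2):
    # Assumed shape of the original helper: expand both inputs to (n, m, d).
    n, d = x.shape
    m = y.shape[0]
    x_exp = x.unsqueeze(1).expand(n, m, d)
    y_exp = y.unsqueeze(0).expand(n, m, d)
    # The subtraction materializes a full (n, m, d) tensor before the sum.
    return torch.pow(x_exp - y_exp, p).sum(2)

torch.manual_seed(0)
x = torch.randn(5, 8)  # hypothetical: 5 query vectors of dimension 8
y = torch.randn(7, 8)  # hypothetical: 7 reference vectors of dimension 8

squared = distance_matrix_broadcast(x, y, p=2)  # squared Euclidean distances
dist = torch.cdist(x, y, p=2)                   # Euclidean distances, shape (n, m)

# Both agree once the square root is applied to the broadcast result.
same_values = torch.allclose(squared.sqrt(), dist, atol=1e-4)
print(same_values, tuple(dist.shape))
```

Because the square root is monotonic, nearest-neighbour rankings are the same for both forms, but any threshold applied to the raw values would need to account for the difference.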
When I run this code, 8 GB is already reserved and it tries to allocate another 8 GB.
I have an RTX 3080 GPU with 10 GB of memory.
When I input a large image (3, 896, 896), distance_matrix needs 14.36 GB, but torch.cdist works well.
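A rough back-of-the-envelope estimate shows why a broadcast intermediate can exceed a 10 GB card. The sizes below are hypothetical, not the values from this PR, and the sketch assumes float32 tensors and an (n, m, d) intermediate as the dominant allocation.

```python
def broadcast_intermediate_bytes(n, m, d, bytes_per_elem=4):
    # Size of a fully materialized (n, m, d) difference tensor.
    return n * m * d * bytes_per_elem

def output_bytes(n, m, bytes_per_elem=4):
    # Size of the (n, m) distance matrix that is actually needed.
    return n * m * bytes_per_elem

# Hypothetical sizes: n query vectors, m reference vectors, d features.
n, m, d = 1000, 2000, 512

intermediate = broadcast_intermediate_bytes(n, m, d)
result = output_bytes(n, m)

print(f"intermediate: {intermediate / 1e9:.3f} GB")
print(f"result:       {result / 1e9:.3f} GB")
print(f"ratio:        {intermediate // result}x")
```

Under these assumptions the intermediate is larger than the result by a factor of d, so it grows with the feature dimension while the result does not.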
This means we can use larger sampling rates with torch.cdist.
I can run with sampling rate = 0.01 using torch.cdist, but I get a CUDA out-of-memory error with distance_matrix.
Actual GPU memory usage with MVTec AD:
#19 (comment)