This software is intended to run experiments that overlap communication and computation in order to collect performance data. It runs a toy model in which GPU kernels are overlapped with GPU communication over MPI. It provides a synchronous mode, where the host timeline is blocked until the communication has finished, and an asynchronous mode, where communication is split into Start and Wait methods that wrap the kernel launch calls.
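The difference between the two modes can be illustrated with a small host-side simulation. This is a minimal sketch, not code from comm_overlap_bench: `communicate`, `compute`, and `AsyncExchange` are hypothetical names, and `time.sleep` stands in for both the MPI exchange and the GPU kernel, so the sketch only models the host timeline.

```python
import threading
import time

# Hypothetical durations, chosen only for illustration.
COMM_SECONDS = 0.1
COMPUTE_SECONDS = 0.1


def communicate():
    """Stand-in for an MPI exchange (simulated with sleep)."""
    time.sleep(COMM_SECONDS)


def compute():
    """Stand-in for a GPU kernel launch and execution (simulated with sleep)."""
    time.sleep(COMPUTE_SECONDS)


class AsyncExchange:
    """Hypothetical Start/Wait wrapper around one communication step."""

    def __init__(self):
        self._thread = None

    def start(self):
        # Begin the communication in the background and return immediately.
        self._thread = threading.Thread(target=communicate)
        self._thread.start()

    def wait(self):
        # Block the host until the in-flight communication has finished.
        self._thread.join()


def run_sync(steps):
    """Synchronous mode: the host blocks on communication before computing."""
    t0 = time.perf_counter()
    for _ in range(steps):
        communicate()
        compute()
    return time.perf_counter() - t0


def run_async(steps):
    """Asynchronous mode: Start and Wait wrap the (simulated) kernel launch."""
    exchange = AsyncExchange()
    t0 = time.perf_counter()
    for _ in range(steps):
        exchange.start()  # communication in flight ...
        compute()         # ... while the "kernel" runs
        exchange.wait()
    return time.perf_counter() - t0


if __name__ == "__main__":
    print(f"sync:  {run_sync(3):.2f} s")
    print(f"async: {run_async(3):.2f} s")
```

In the synchronous loop each step costs roughly the communication time plus the compute time; in the asynchronous loop the two overlap, so each step costs roughly the larger of the two.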
The following picture shows an example of the code flow for a simple communication pattern:
The comm_overlap_bench package must be compiled against the STELLA and GCL libraries. It can be used to test the overlap of computations generated by the STELLA library with communications generated by the GCL library. It can also test overlap using standalone CUDA kernels and non-blocking MPI communication calls.