This repository is a proof-of-concept demo for different concrete backend implementations for scikit-image. It is not intended for use by downstream projects. In practice, such backends would ideally be maintained by their upstream projects (cuCIM, etc.) so that they can be tested, ensuring that they stay in sync.
This repository is compatible with a specific uarray development branch of scikit-image, which implements uarray multimethods for the majority of the scikit-image API.
Most of the multimethods were autogenerated using a script provided in the tools directory here. That branch also differs from scikit-image main in that some currently deprecated arguments have already been removed. Specifically, multichannel was removed in favor of channel_axis and selem was removed in favor of footprint.
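When porting existing calls to that branch, the two renames above can be handled mechanically. The following is only a sketch: `modernize_kwargs` is a hypothetical helper, not part of scikit-image or this repository. It assumes the usual correspondence from the scikit-image deprecation, where multichannel=True maps to channel_axis=-1 and multichannel=False maps to channel_axis=None.

```python
def modernize_kwargs(kwargs):
    """Return a copy of kwargs with the removed argument names translated.

    Hypothetical helper for illustration only.
    """
    new = dict(kwargs)
    if "multichannel" in new:
        multichannel = new.pop("multichannel")
        # multichannel=True meant "channels are on the last axis".
        new.setdefault("channel_axis", -1 if multichannel else None)
    if "selem" in new:
        # selem and footprint carry the same array; only the name changed.
        new.setdefault("footprint", new.pop("selem"))
    return new


old_call = {"sigma": 2, "multichannel": True}
new_call = modernize_kwargs(old_call)
print(new_call)
```

The original dictionary is left untouched, so the helper can be applied to shared keyword-argument dictionaries without side effects.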
An editable development install can be performed by checking out the repository and using:
pip install -e . -v
Runtime requirements:
- NumPy >= 1.17
- CuPy >= 9
- cuCIM >= 21.08
- uarray >= 0.8.2
- dask[array] >= 2.0
- scikit-image from branch https://github.com/grlee77/scikit-image/tree/uarray
A small subset of functions such as skimage.filters.gaussian, skimage.filters.median and skimage.morphology.binary_erosion are implemented for multiple different backends. Specifically, the following backends are provided:
- uskimage_demo.dask_backend - This backend accepts Dask arrays as input and returns Dask futures.
- uskimage_demo.dask_numpy_backend - This backend shares the multimethod implementations of dask_backend, but has two differences in behavior: (1) it automatically converts numpy.array inputs to Dask arrays with a suitable chunk size, and (2) it calls compute() on the function outputs so that NumPy arrays are returned instead of Dask futures.
- uskimage_demo.diplib_backend - This backend calls corresponding functions from the DIPlib library. DIPlib provides efficient, multithreaded C++ implementations with Python wrappers.
- uskimage_demo.cucim_backend - This backend accepts CuPy GPU array inputs and calls the corresponding GPU-based implementations from cuCIM.
- uskimage_demo.cucim_cpu_backend - This backend is a wrapper around cucim_cpu that handles host/device transfers such that NumPy array inputs will be transferred to the GPU and the function outputs will be transferred back to NumPy arrays on the host.
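The "convert inputs, call, convert outputs" pattern used by the wrapping backends (dask_numpy_backend and cucim_cpu_backend) can be sketched as a decorator. This is not the code used in this repository; `with_transfers` and its parameters are hypothetical names. To keep the example runnable without a GPU, a list stands in for a host array and a tuple stands in for a device array. In a real GPU setting one would presumably pass `cupy.asarray` and `cupy.asnumpy` as the two conversion functions.

```python
import functools


def with_transfers(to_device, to_host, is_host_array):
    """Wrap a device-only function so it accepts and returns host arrays.

    Hypothetical helper; all three arguments are plain callables.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Move every host array among the arguments to the device.
            args = tuple(to_device(a) if is_host_array(a) else a for a in args)
            kwargs = {
                k: to_device(v) if is_host_array(v) else v
                for k, v in kwargs.items()
            }
            # Run on the device, then bring the result back to the host.
            return to_host(func(*args, **kwargs))
        return wrapper
    return decorator


def device_negate(values, scale=1):
    """Toy 'device' function: only accepts tuples (the stand-in device type)."""
    if not isinstance(values, tuple):
        raise TypeError("device_negate expects a device array (tuple)")
    return tuple(-scale * v for v in values)


host_negate = with_transfers(
    to_device=tuple,
    to_host=list,
    is_host_array=lambda a: isinstance(a, list),
)(device_negate)

result = host_negate([1, 2, 3], scale=2)
print(result)
```

The cost of this convenience is the two transfers per call, which is consistent with the wrapped GPU backend being somewhat slower than the backend that keeps data on the device.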
SimpleITK is another popular library supporting nD images that could likely provide a backend for some functions.
OpenCV or scikit-ipp could also be used. While fast, these are more restricted in the dtypes they support and, I think, typically handle 2D images only.
Running the included demo/backends_demo.py shows that the provided backends can deliver substantial acceleration over scikit-image's own implementations on a single-node workstation with a 10-core/20-thread CPU (Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz). For the cuCIM-based backends, computations ran on an NVIDIA GTX 1080 Ti GPU.
In most cases there is a hierarchy of performance among the backends as follows.
cucim_backend > cucim_cpu_backend > diplib_backend > dask_backend > scikit-image
It should be noted, though, that the above is for use of a single node whereas the Dask case could potentially be extended to also distribute work across multiple nodes.
For this demo we intentionally did not implement a Dask or DIPlib multimethod for either difference_of_gaussians or structural_similarity. Note, however, that difference_of_gaussians still shows acceleration with both of these backends because this function calls gaussian twice internally and gaussian does have an implementation in the backend! The case is similar for structural_similarity, which also uses gaussian internally. However, in this case the boundary extension mode used is not present in DIPlib, so the call to gaussian falls back to the standard scikit-image implementation and no acceleration is observed with that backend.
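The fallback behavior described above can be illustrated with a toy dispatcher. This is a conceptual sketch, not uarray's actual machinery: all function names, the `SUPPORTED_MODES` set, and the placeholder arithmetic are hypothetical. The only detail borrowed from uarray is the convention that a backend returns NotImplemented to signal that it cannot handle a call.

```python
log = []  # records which implementation handled each gaussian call

SUPPORTED_MODES = {"nearest", "mirror"}  # hypothetical backend capabilities


def default_gaussian(image, sigma, mode="nearest"):
    log.append(("default", mode))
    return [v / (1 + sigma) for v in image]  # placeholder, not a real filter


def backend_gaussian(image, sigma, mode="nearest"):
    if mode not in SUPPORTED_MODES:
        return NotImplemented  # signal: this backend cannot handle the call
    log.append(("backend", mode))
    return [v / (1 + sigma) for v in image]  # placeholder, not a real filter


def gaussian(image, sigma, mode="nearest"):
    """Dispatching entry point: try the backend, then fall back."""
    result = backend_gaussian(image, sigma, mode=mode)
    if result is NotImplemented:
        result = default_gaussian(image, sigma, mode=mode)
    return result


def difference_of_gaussians(image, low_sigma, high_sigma, mode="nearest"):
    """Has no backend implementation itself, but calls gaussian twice."""
    low = gaussian(image, low_sigma, mode=mode)
    high = gaussian(image, high_sigma, mode=mode)
    return [a - b for a, b in zip(low, high)]


difference_of_gaussians([1.0, 2.0], 1, 2, mode="nearest")
print(log)  # both inner calls were handled by the backend

log.clear()
difference_of_gaussians([1.0, 2.0], 1, 2, mode="some_unsupported_mode")
print(log)  # both inner calls fell back to the default implementation
```

Because dispatch happens at the level of gaussian, any composite function built on it inherits the acceleration whenever the backend supports the requested arguments, and silently loses it otherwise.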
Also, for binary erosion with somewhat large footprints as used here, we have pending improvements enabling footprint decomposition that will give faster results for both the scikit-image and cucim backends relative to what is presented here.
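To see why footprint decomposition helps, consider the simplest case: a full rectangular footprint can be replaced by a row-shaped pass followed by a column-shaped pass. The sketch below uses pure Python with nested lists and hypothetical function names. It assumes odd footprint sizes and ignores out-of-bounds pixels at the border. The pending improvements mentioned above may use a different decomposition, particularly for ball-shaped footprints.

```python
import random


def erode_rect(image, height, width):
    """Binary erosion with a full (height x width) rectangular footprint.

    Pixels outside the image are ignored; height and width must be odd.
    """
    rows, cols = len(image), len(image[0])
    rh, rw = height // 2, width // 2
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # A pixel survives only if every in-bounds neighbor is True.
            out[r][c] = all(
                image[rr][cc]
                for rr in range(max(0, r - rh), min(rows, r + rh + 1))
                for cc in range(max(0, c - rw), min(cols, c + rw + 1))
            )
    return out


def erode_decomposed(image, height, width):
    """Same result using a (1 x width) pass followed by a (height x 1) pass."""
    row_pass = erode_rect(image, 1, width)
    return erode_rect(row_pass, height, 1)


random.seed(0)
image = [[random.random() > 0.2 for _ in range(32)] for _ in range(32)]
same = erode_rect(image, 11, 11) == erode_decomposed(image, 11, 11)
print(same)
```

For the (11, 11) footprint used in the 2D benchmark, the direct version inspects up to 121 neighbors per pixel while the decomposed version inspects at most 11 + 11 = 22.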
Running with 2D data:
For gaussian, median and structural_similarity functions:
image.shape=(4096, 2048)
image.dtype=float32
For binary erosion:
binary blobs shape=(2048, 2048)
ball footprint shape = (11, 11)
rect footprint shape = (11, 11)
** Timings with default skimage backend**
gaussian : CPU: 358.497 ms +/- 2.816 (min: 354.936 / max: 362.594) ms
difference_of_gaussians : CPU: 627.612 ms +/-21.027 (min: 609.264 / max: 663.219) ms
median (ball) : CPU: 7685.304 ms
median (rect) : CPU:11060.581 ms
erosion (ball) : CPU: 147.811 ms +/- 2.450 (min: 142.464 / max: 150.965) ms
erosion (rect) : CPU: 233.441 ms +/-32.597 (min: 214.944 / max: 317.526) ms
structural sim. : CPU: 1480.584 ms +/-12.704 (min: 1467.881 / max: 1493.288) ms
** Timings with Dask backend enabled (CPU, scheduler='threads')**
gaussian : CPU: 198.221 ms +/-17.518 (min: 160.228 / max: 232.355) ms
difference_of_gaussians : CPU: 238.046 ms +/-14.382 (min: 213.749 / max: 265.874) ms
median (ball) : CPU: 888.443 ms +/-27.000 (min: 864.415 / max: 926.157) ms
median (rect) : CPU: 1169.550 ms +/-10.241 (min: 1159.309 / max: 1179.790) ms
erosion (ball) : CPU: 141.625 ms +/- 8.409 (min: 125.899 / max: 153.718) ms
erosion (rect) : CPU: 163.532 ms +/-21.080 (min: 142.507 / max: 231.082) ms
structural sim. : CPU: 1076.996 ms +/-64.082 (min: 1012.913 / max: 1141.078) ms
** Timings with diplib backend enabled (threads=20)**
gaussian : CPU: 58.320 ms +/-21.611 (min: 40.461 / max: 100.524) ms
difference_of_gaussians : CPU: 167.277 ms +/-20.296 (min: 130.111 / max: 198.398) ms
median (ball) : CPU: 666.159 ms +/-20.454 (min: 645.942 / max: 698.992) ms
median (rect) : CPU: 938.623 ms +/-32.421 (min: 896.971 / max: 976.050) ms
erosion (ball) : CPU: 22.463 ms +/- 6.103 (min: 6.640 / max: 39.190) ms
erosion (rect) : CPU: 50.127 ms +/-20.970 (min: 6.477 / max: 74.148) ms
structural sim. : CPU: 1461.264 ms +/-20.882 (min: 1440.382 / max: 1482.146) ms
** Timings with cucim_cpu backend **
gaussian : CPU: 25.767 ms +/- 0.138 (min: 25.459 / max: 26.201) ms
difference_of_gaussians : CPU: 29.383 ms +/- 0.103 (min: 29.149 / max: 29.596) ms
median (ball) : CPU: 280.674 ms +/- 0.208 (min: 280.395 / max: 280.987) ms
median (rect) : CPU: 550.639 ms +/- 0.307 (min: 550.211 / max: 550.992) ms
erosion (ball) : CPU: 6.330 ms +/- 0.021 (min: 6.286 / max: 6.404) ms
erosion (rect) : CPU: 2.486 ms +/- 0.016 (min: 2.445 / max: 2.542) ms
structural sim. : CPU: 42.156 ms +/- 0.091 (min: 42.009 / max: 42.390) ms
** Timings with cucim backend (no host/device transfer) **
gaussian : GPU-0: 10.232 ms +/- 0.025 (min: 10.199 / max: 10.329) ms
difference_of_gaussians : GPU-0: 13.890 ms +/- 0.019 (min: 13.853 / max: 13.953) ms
median (ball) : GPU-0: 265.037 ms +/- 0.210 (min: 264.738 / max: 265.526) ms
median (rect) : GPU-0: 534.763 ms +/- 0.259 (min: 534.431 / max: 535.066) ms
erosion (ball) : GPU-0: 4.986 ms +/- 0.017 (min: 4.966 / max: 5.206) ms
erosion (rect) : GPU-0: 1.153 ms +/- 0.010 (min: 1.136 / max: 1.232) ms
structural sim. : GPU-0: 32.278 ms +/- 0.022 (min: 32.220 / max: 32.359) ms
Running with 3D data:
For filters and measurements functions:
image.shape=(256, 256, 128)
image.dtype=float32
For binary morphology:
binary blobs shape=(256, 256, 256)
ball footprint shape = (7, 7, 7)
rect footprint shape = (7, 7, 7)
** Timings with default skimage backend**
gaussian : CPU: 470.269 ms +/- 3.526 (min: 465.000 / max: 474.272) ms
difference_of_gaussians : CPU: 762.822 ms +/- 2.910 (min: 758.849 / max: 765.737) ms
median (ball) : CPU:11378.148 ms
median (rect) : CPU:29591.795 ms
erosion (ball) : CPU: 753.432 ms +/- 9.170 (min: 740.725 / max: 762.029) ms
erosion (rect) : CPU: 1507.758 ms +/- 5.866 (min: 1501.892 / max: 1513.625) ms
structural sim. : CPU: 1752.775 ms +/- 9.866 (min: 1742.909 / max: 1762.641) ms
** Timings with Dask backend enabled (CPU, scheduler='threads')**
gaussian : CPU: 290.269 ms +/-22.404 (min: 259.381 / max: 323.366) ms
difference_of_gaussians : CPU: 330.863 ms +/-31.330 (min: 291.124 / max: 389.668) ms
median (ball) : CPU: 1458.169 ms +/-17.291 (min: 1440.877 / max: 1475.460) ms
median (rect) : CPU: 3468.485 ms
erosion (ball) : CPU: 725.926 ms +/-43.245 (min: 692.958 / max: 787.020) ms
erosion (rect) : CPU: 725.291 ms +/-51.361 (min: 659.213 / max: 784.449) ms
structural sim. : CPU: 1284.369 ms +/-16.837 (min: 1267.532 / max: 1301.207) ms
** Timings with diplib backend enabled (threads=20)**
gaussian : CPU: 102.941 ms +/-14.279 (min: 68.722 / max: 135.538) ms
difference_of_gaussians : CPU: 198.852 ms +/-20.036 (min: 173.322 / max: 227.423) ms
median (ball) : CPU: 931.723 ms +/-27.284 (min: 895.307 / max: 960.978) ms
median (rect) : CPU: 2003.327 ms
erosion (ball) : CPU: 82.589 ms +/-10.962 (min: 71.367 / max: 114.855) ms
erosion (rect) : CPU: 20.668 ms +/- 2.455 (min: 17.455 / max: 37.924) ms
structural sim. : CPU: 1688.337 ms +/- 5.981 (min: 1682.355 / max: 1694.318) ms
** Timings with cucim_cpu backend **
gaussian : CPU: 31.280 ms +/- 0.768 (min: 30.856 / max: 34.549) ms
difference_of_gaussians : CPU: 36.145 ms +/- 0.818 (min: 35.816 / max: 40.588) ms
median (ball) : CPU: 534.166 ms +/- 0.212 (min: 533.818 / max: 534.389) ms
median (rect) : CPU: 2755.902 ms
erosion (ball) : CPU: 67.111 ms +/- 0.049 (min: 67.013 / max: 67.253) ms
erosion (rect) : CPU: 19.619 ms +/- 0.071 (min: 19.481 / max: 20.001) ms
structural sim. : CPU: 51.924 ms +/- 0.100 (min: 51.713 / max: 52.213) ms
** Timings with cucim backend (no host/device transfer) **
gaussian : GPU-0: 15.000 ms +/- 0.025 (min: 14.975 / max: 15.210) ms
difference_of_gaussians : GPU-0: 20.051 ms +/- 0.080 (min: 19.974 / max: 20.469) ms
median (ball) : GPU-0: 528.453 ms +/- 0.152 (min: 528.318 / max: 528.710) ms
median (rect) : GPU-0: 2743.274 ms
erosion (ball) : GPU-0: 62.290 ms +/- 0.054 (min: 62.195 / max: 62.407) ms
erosion (rect) : GPU-0: 14.698 ms +/- 0.064 (min: 14.482 / max: 14.863) ms
structural sim. : GPU-0: 42.686 ms +/- 0.032 (min: 42.646 / max: 42.861) ms
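As a worked example of reading these tables, the mean 2D gaussian timings can be converted into speedup factors relative to the default scikit-image implementation. The numbers are copied from the 2D output above; the dictionary keys are informal labels chosen for this sketch.

```python
# Mean 2D gaussian timings in milliseconds, copied from the 2D tables above.
gaussian_2d_ms = {
    "skimage (default)": 358.497,
    "Dask": 198.221,
    "diplib": 58.320,
    "cucim_cpu": 25.767,
    "cucim": 10.232,
}

baseline = gaussian_2d_ms["skimage (default)"]

# Speedup = baseline time / backend time (larger is faster).
speedups = {name: baseline / ms for name, ms in gaussian_2d_ms.items()}

for name, factor in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name:>18}: {factor:5.1f}x")
```

For this function the ordering of the speedups matches the performance hierarchy stated earlier, with the GPU backend without host/device transfers roughly 35 times faster than the default implementation.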
At the time of writing, there is an ongoing discussion across scientific Python projects about how to converge on a common approach to dispatching. Others interested in this topic are encouraged to participate in the related conversations at the scientific-python forum:
- A proposed design for supporting multiple array types across SciPy, scikit-learn, scikit-image and beyond
- Support for array types other than NumPy
- Default dispatching behavior for supporting multiple array types across SciPy, scikit-learn, scikit-image
- Requirements and discussion of a type dispatcher for the ecosystem