Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canvas.raster fixes #862

Merged
merged 5 commits into from
Jan 21, 2020
Merged

Canvas.raster fixes #862

merged 5 commits into from
Jan 21, 2020

Conversation

jonmmease
Copy link
Collaborator

It looks like the tests in datashader/tests/test_raster.py have been getting skipped on CI because rasterio was not being installed in the test environment.

@jonmmease
Copy link
Collaborator Author

It seems that adding rasterio to the test environment is resulting in conda being unable to solve the environment. For example: https://travis-ci.org/holoviz/datashader/jobs/638195707

@jonmmease
Copy link
Collaborator Author

Adding rasterio to the tests requirements resulted in a conda solver failure. So I added it as a pip install command in .travis.yml for now.

Now, here are the test_raster.py test that are failing:

  • test_raster_integer_nan_value_padding
  • test_raster_float_nan_value_padding
  • test_raster_single_pixel_range_with_padding

@jbednar jbednar added this to the 0.10.0 milestone Jan 19, 2020
@jbednar
Copy link
Member

jbednar commented Jan 19, 2020

Looks like those tests have been failing for a while:

test_raster_integer_nan_value_padding:       failing since 0.8.0; did not fail in 0.7.0
test_raster_float_nan_value_padding:         failing since 0.8.0; did not fail in 0.7.0
test_raster_single_pixel_range_with_padding: failing since 0.7.0; did not fail in 0.6.6

Different tests were failing in 0.6.6 and in previous versions, so it's hard to find some pristine reference to compare against.

@jbednar
Copy link
Member

jbednar commented Jan 20, 2020

@philippjfr, I used git bisect to narrow it down further, and it looks like test_raster_integer_nan_value_padding started failing in #762 (commit 1b9f3009), and was working fine in the previous commit (2491eac3).

PR #762 has a note from you saying "Upsampling with 'linear' interpolation does not yet use correct padding so boundaries are wrong", which sounds suspiciously like what this is testing, though this test appears to be downsampling a 4x3 array down to 3x3, not upsampling. The PR also has a comment from me saying "it would be difficult for me to spot an off-by-one error...". :-).

I pulled out the test code as:

import datashader as ds, numpy as np, xarray as xr
cvs = ds.Canvas(plot_height=3, plot_width=3, x_range=(0, 2), y_range=(0, 2))
array = np.array([[9999, 1, 2, 3], 
                  [4, 9999, 6, 7], 
                  [8, 9, 9999, 11]])
xr_array = xr.DataArray(array, coords={'x': np.linspace(0, 1, 4),
                                       'y': np.linspace(0, 1, 3)}, dims=['y', 'x'])
print(xr_array)
agg = cvs.raster(xr_array, downsample_method='max', nan_value=9999)
print(agg)

What this test is verifying is that when calling cvs.raster(xr_array, downsample_method='max', nan_value=9999) on a 4x3 starting array like:

class Image(xr.DataArray):
<xarray.DataArray (y: 3, x: 4)>
array([[9999,    1,    2,    3],
       [   4, 9999,    6,    7],
       [   8,    9, 9999,   11]])
Coordinates:
  * x        (x) float64 0.0 0.3333 0.6667 1.0
  * y        (y) float64 0.0 0.5 1.0

the result is a 3x3 array padded with the indicated value:

<xarray.DataArray (y: 3, x: 3)>
array([[   4,    7, 9999],
       [   9,   11, 9999],
       [9999, 9999, 9999]])
Coordinates:
  * x        (x) float64 0.3333 1.0 1.667
  * y        (y) float64 0.3333 1.0 1.667
Attributes:
    res:      0.3333333333333333

After #762, the calculated coordinates, dims, and resolution are the same as before, but the actual data array is now too small:

[[ 4  7]
 [ 9 11]] 

Any idea how to fix the padding?

@philippjfr
Copy link
Member

PR #762 has a note from you saying "Upsampling with 'linear' interpolation does not yet use correct padding so boundaries are wrong", which sounds suspiciously like what this is testing, though this test appears to be downsampling a 4x3 array down to 3x3).

Afaik that note should only apply to dask resampling where offsets are actually used but I could have accidentally introduced some regression. Will have a look.

@jbednar
Copy link
Member

jbednar commented Jan 20, 2020

Thanks. It looks to me like the first two failing tests have the same underlying cause (as their code differs only by data type). The third one (test_raster_single_pixel_range_with_padding) looks different; looks like that test is expecting

[[nan nan nan nan] 
 [nan  0.  0.  0.]
 [nan  0.  0.  0.] 
 [nan  0.  0.  0.]]

but getting

[[nan nan nan nan]
 [nan nan  0.  0.]
 [nan nan  0.  0.]
 [nan nan  0.  0.]] 

@philippjfr
Copy link
Member

philippjfr commented Jan 20, 2020

Two actual bugs here:

  1. The code would drop bottom/right raster padding if no top/left padding was required.
  2. The 'mode' aggregation was always broken as far as I can tell, it was allocating a too small buffer to accumulate the unique values into (floor when it should have been ceil in a calculation)and did not reset the frequencies buffer between output pixels so you'd either end up with an out-of-bounds error (and therefore a segmentation fault) OR you'd end up with wrong results.

@jbednar jbednar changed the title Add rasterio as test dependency so that raster tests run on CI Canvas.raster fixes Jan 21, 2020
@jbednar
Copy link
Member

jbednar commented Jan 21, 2020

Thanks @philippjfr and @jonmmease ! To summarize the changes made in this PR:

  • Install rasterio to make sure any tests that depend on rasterio are run. Currently installs rasterio via pip to avoid conda conflicts, which should be addressed eventually (issue Rasterio can't be installed via conda for testing #865 opened).
  • Fix test_raster.py to ensure raster tests are run whether or not rasterio is installed, as rasterio is not a hard dependency in any case.
  • Fix Xarray data array padding calculations for Canvas.raster in core.py.
  • Fix Canvas.raster for mode aggregations in resampling.py; was allocating a too-small buffer and not zeroing it between output pixels, causing potential memory-overwrite errors.
  • Fix bogus test data in test_raster_single_pixel_range_with_padding in test_raster.py

It may be possible to optimize the raster mode calculations by allocating one values/frequencies buffer per thread and then emptying it out between output pixels, but this would make the code more complex and harder to reason about, and may not provide any performance benefit. Still, if someone want to optimize mode calculations, they could consider implementing and benchmarking that. Meanwhile, this approach should be safe and is probably reasonably fast.

@jbednar jbednar merged commit 2cd44f1 into master Jan 21, 2020
@jbednar jbednar deleted the raster_tests_ci branch January 21, 2020 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants