
x and y range attributes on returned aggregations #1198

Merged
merged 2 commits into from
Apr 7, 2023

ianthomas23
Member

Closes #1157.

This PR adds new attributes `x_range` and `y_range` to the aggregations returned by datashader. A simple example:

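```python
import datashader as ds
import pandas as pd

df = pd.DataFrame(dict(x=[1.1, 2.2, 3.3], y=[4.4, 5.5, 6.6]))
# No x_range or y_range is passed, so the ranges come from the data limits.
canvas = ds.Canvas(plot_height=5, plot_width=5)
# With no reduction given, points() counts the points falling in each pixel.
agg = canvas.points(source=df, x="x", y="y")
print(agg)
```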

Output produced:

```
<xarray.DataArray (y: 5, x: 5)>
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]], dtype=uint32)
Coordinates:
  * x        (x) float64 1.32 1.76 2.2 2.64 3.08
  * y        (y) float64 4.62 5.06 5.5 5.94 6.38
Attributes:
    x_range:  (1.1, 3.3)
    y_range:  (4.4, 6.6)
```

so the attributes can be accessed as `agg.x_range` and `agg.y_range`.
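The coordinates in this output are not the ends of the ranges. Assuming linear axes with cells of equal width (as discussed later in this thread), each coordinate is the centre of a pixel cell. The standard-library sketch below uses a hypothetical helper, `linear_bin_centres` (illustration only, not a datashader function), to reproduce the `x` and `y` coordinates printed above from `x_range`, `y_range` and the canvas size of 5.

```python
def linear_bin_centres(lo, hi, n):
    """Return the centres of n equal-width cells spanning [lo, hi].

    Hypothetical helper for illustration only; not part of datashader.
    """
    width = (hi - lo) / n
    # Each centre sits half a cell width past the left edge of its cell.
    return [lo + (i + 0.5) * width for i in range(n)]


# Ranges and canvas size taken from the example above.
x_coords = linear_bin_centres(1.1, 3.3, 5)
y_coords = linear_bin_centres(4.4, 6.6, 5)
print([round(c, 2) for c in x_coords])  # [1.32, 1.76, 2.2, 2.64, 3.08]
print([round(c, 2) for c in y_coords])  # [4.62, 5.06, 5.5, 5.94, 6.38]
```

The outermost coordinates therefore lie half a cell width (0.22 here) inside each end of the range.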

The ranges are set whether they are specified by the user in the `Canvas` constructor or determined from the data limits.

When an aggregation returns an `xarray.Dataset` rather than an `xarray.DataArray` (for example, when `ds.summary()` is used), the attributes are copied to the `Dataset`. They are therefore always available as attributes of the top-level object returned by the `Canvas` aggregation functions.

ianthomas23 added this to the v0.14.5 milestone Apr 5, 2023

codecov bot commented Apr 5, 2023

Codecov Report

Merging #1198 (c6c4c3d) into main (3d2f7df) will increase coverage by 0.02%.
The diff coverage is 100.00%.

```diff
@@            Coverage Diff             @@
##             main    #1198      +/-   ##
==========================================
+ Coverage   84.68%   84.70%   +0.02%     
==========================================
  Files          35       35              
  Lines        8345     8357      +12     
==========================================
+ Hits         7067     7079      +12     
  Misses       1278     1278              
```
| Impacted Files | Coverage Δ |
|---|---|
| datashader/data_libraries/pandas.py | 100.00% <ø> (ø) |
| datashader/compiler.py | 92.90% <100.00%> (+0.09%) ⬆️ |
| datashader/core.py | 88.38% <100.00%> (ø) |
| datashader/data_libraries/dask.py | 95.16% <100.00%> (+0.16%) ⬆️ |
| datashader/data_libraries/dask_xarray.py | 98.95% <100.00%> (+0.03%) ⬆️ |


@ianthomas23
Member Author

The test failures occur because our CI conda environments now use pandas 2.0.0, which is incompatible with recent xarray releases, so the xarray version that gets installed is 0.19.0 from July 2021. xarray will fix this in due course; in the meantime I will try pinning pandas < 2 to test this PR.

Member

jbednar left a comment


Accessing the ranges in the returned result is definitely useful, but I would have thought the ranges could already be read from the extent of the coordinates in the returned DataArray?

@jbednar
Member

jbednar commented Apr 6, 2023

Looks like the ranges are slightly larger than the coordinates would suggest. Is that just due to the size of each array cell? If so, having the ranges explicitly listed is indeed useful.

@ianthomas23
Member Author

Yes. Linear coordinates are equally spaced cell centres, so the first and last coordinates lie half a cell width inside the ends of the range, and the ranges are easy to calculate from the coordinates. With logarithmic axes, however, the maths is non-trivial, so it is useful for the ranges to always be available.
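A minimal plain-Python sketch of the two cases. `linear_range_from_centres` and `log_range_from_centres` are hypothetical helpers, not datashader API. The log version assumes, for illustration, that a logarithmic axis uses cells of equal width in log10 space and that each coordinate is its cell centre mapped back with `10 **`. Under that assumption the half-cell padding must be applied in log space, and reusing the linear formula on log coordinates gives a wrong (here negative) lower bound.

```python
import math


def linear_range_from_centres(centres):
    """Recover (lo, hi) from equally spaced linear cell centres (needs >= 2)."""
    half = (centres[1] - centres[0]) / 2
    # The end coordinates sit half a cell width inside the ends of the range.
    return (centres[0] - half, centres[-1] + half)


def log_range_from_centres(centres):
    """Recover (lo, hi) assuming equal-width cells in log10 space (needs >= 2).

    Assumption for illustration: each coordinate is 10 ** (cell centre in log10 space).
    """
    logs = [math.log10(c) for c in centres]
    half = (logs[1] - logs[0]) / 2
    # Pad by half a cell in log space, then map back to data space.
    return (10 ** (logs[0] - half), 10 ** (logs[-1] + half))


# Linear coordinates from the example at the top of this PR.
print(linear_range_from_centres([1.32, 1.76, 2.2, 2.64, 3.08]))  # approximately (1.1, 3.3)

# Three cells spanning 1 to 1000 on a log axis: centres at 10**0.5, 10**1.5, 10**2.5.
log_centres = [10 ** 0.5, 10 ** 1.5, 10 ** 2.5]
print(log_range_from_centres(log_centres))     # approximately (1.0, 1000.0)
print(linear_range_from_centres(log_centres))  # wrong for a log axis: negative lower bound
```

Storing `x_range` and `y_range` as attributes avoids callers having to know which axis type was used before reconstructing the bounds.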

ianthomas23 merged commit f7de271 into holoviz:main Apr 7, 2023
ianthomas23 deleted the 1157_return_range_bounds branch April 7, 2023 19:48
Development

Successfully merging this pull request may close these issues.

Expose computed bounds after aggregation
2 participants