Initial Dask trimesh support #696

Merged (3 commits) on Feb 7, 2019

Conversation

@jonmmease (Collaborator) commented on Jan 24, 2019

Overview

This PR provides initial support for parallel aggregation of trimesh glyphs using dask.

Note: This PR is based on #694 as it relies on some of the refactoring performed in that PR.

Usage

To take advantage of dask trimesh support, the datashader.utils.mesh utility function should be called with dask DataFrames for the vertices and simplices arguments. In this case, the resulting mesh DataFrame will be a dask DataFrame rather than a pandas DataFrame.

When this dask mesh is passed into the cvs.trimesh function, the trimesh aggregations are performed in parallel. For example:

import pandas as pd
import dask.dataframe as dd
import datashader as ds
import datashader.utils as du

# verts/tris are pandas DataFrames of vertices and simplices;
# copies is the number of times the simplices are duplicated
verts_ddf = dd.from_pandas(verts, npartitions=4)
tris_ddf = dd.from_pandas(pd.concat([tris]*copies, axis=0), npartitions=4)
mesh_ddf = du.mesh(verts_ddf, tris_ddf).persist()

cvs = ds.Canvas(plot_height=900, plot_width=900)
agg = cvs.trimesh(verts_ddf, tris_ddf, mesh_ddf)

Implementation Notes

The job of the du.mesh function is to return a DataFrame containing the coordinates of every vertex of every triangle in the mesh, in the proper winding order. A triangle is represented in this data structure by three rows, one for each vertex. If a single vertex is used by more than one triangle, then the coordinates of that vertex will show up in multiple rows of this DataFrame.
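
As a minimal illustrative sketch of that structure (the column names here are arbitrary; the output columns follow whatever columns the vertices DataFrame has):

import pandas as pd
import datashader.utils as du

# Two triangles sharing the edge between vertices 1 and 2;
# z is a per-vertex value column
verts = pd.DataFrame({'x': [0., 1., 0., 1.],
                      'y': [0., 0., 1., 1.],
                      'z': [1., 2., 3., 4.]})
tris = pd.DataFrame({'v0': [0, 1], 'v1': [1, 3], 'v2': [2, 2]})

mesh = du.mesh(verts, tris)
# 6 rows: 3 per triangle. The shared vertices 1 and 2 each appear
# twice, once for each triangle that references them.
print(len(mesh))  # 6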

One important characteristic of the updated du.mesh function, when called with dask DataFrames, is that it ensures that no triangle straddles a partition boundary in the output. This amounts to making sure that the number of rows in each partition is a multiple of 3. The function attempts to build the output dask DataFrame with the greater of the number of partitions in vertices and simplices, but the constraint against breaking up triangles takes precedence over the partition count; see the sketch below.
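
A hypothetical sketch of that constraint (this helper is illustrative, not the PR's actual code): distribute whole triangles across partitions, so every row offset is a multiple of 3.

import numpy as np

def partition_boundaries(n_rows, npartitions):
    # Distribute whole triangles across partitions, then convert
    # triangle offsets back to row offsets (always multiples of 3)
    n_tris = n_rows // 3
    tri_bounds = np.linspace(0, n_tris, npartitions + 1).astype(int)
    return tri_bounds * 3

print(partition_boundaries(3_000_000, 4))
# [      0  750000 1500000 2250000 3000000]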

The speedup here applies only to the call to cvs.trimesh; the call to du.mesh still requires pulling the vertices and simplices DataFrames into memory. Parallelizing this step of the calculation will take a bit more thought, and may require some spatial ordering of the input DataFrames.

Benchmarking

I ran some benchmark tests comparing the pandas aggregation with this new dask aggregation. These were run on a 2015 MacBook Pro with a quad-core processor. All of the dask tests used 4 partitions.

To scale the number of triangles, I used the Chesapeake Bay mesh (https://github.com/pyviz/datashader/blob/master/examples/topics/bay_trimesh.ipynb) and duplicated the simplices between 1 and 100 times. This scales from ~1 million to ~100 million triangles.
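
A hypothetical sketch of the benchmark loop (the timing harness shown here is illustrative; only the scaling-by-duplication approach comes from the description above):

import time
import pandas as pd
import dask.dataframe as dd
import datashader as ds
import datashader.utils as du

cvs = ds.Canvas(plot_height=900, plot_width=900)
for copies in (1, 10, 100):
    tris_scaled = pd.concat([tris] * copies, axis=0, ignore_index=True)

    # pandas aggregation
    mesh = du.mesh(verts, tris_scaled)
    t0 = time.time()
    cvs.trimesh(verts, tris_scaled, mesh)
    pandas_time = time.time() - t0

    # dask aggregation with 4 partitions
    verts_ddf = dd.from_pandas(verts, npartitions=4)
    tris_ddf = dd.from_pandas(tris_scaled, npartitions=4)
    mesh_ddf = du.mesh(verts_ddf, tris_ddf).persist()
    t0 = time.time()
    cvs.trimesh(verts_ddf, tris_ddf, mesh_ddf)
    dask_time = time.time() - t0

    print(copies, pandas_time / dask_time)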

[Figure: pandas vs. dask runtime (dask_pandas_time)]

[Figure: dask speedup factor, pandas runtime / dask runtime (dask_speedup_factor)]

So the dask implementation is about 1.2 times faster on 1 million triangles, rising to about 4.3 times faster on 100 million triangles.

…k dataframe.

Computing the mesh still requires bringing the entire vertices/simplices DataFrames into memory, but the resulting mesh is now a dask DataFrame with partitions chosen intentionally so that no triangle straddles a partition boundary.
@@ -123,7 +123,8 @@ def __init__(self, x, y, z=None, weight_type=True, interp=True):

     @property
     def inputs(self):
-        return tuple([self.x, self.y] + list(self.z))
+        return (tuple([self.x, self.y] + list(self.z)) +
+                (self.weight_type, self.interpolate))
@jonmmease (Collaborator, Author) commented on this diff:
This change was needed because the inputs tuple is used by the parent class to implement hashing and equality, which are in turn used for memoization. Without this, I was seeing cases where repeated use of canvas.trimesh with different values for interpolate was not resulting in updated aggregation behavior.
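
As a hypothetical illustration of the failure mode (this Glyph class is a sketch, not datashader's actual parent class):

class Glyph:
    # Hashing and equality are derived from the inputs tuple, so
    # anything omitted from inputs is invisible to memoization
    def __hash__(self):
        return hash((type(self), self.inputs))

    def __eq__(self, other):
        return type(other) is type(self) and self.inputs == other.inputs

# If interpolate is not part of inputs, two trimesh glyphs that differ
# only in interpolate hash and compare as equal, so a memoized
# aggregation built for one is silently reused for the other.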

-        vals = vals.reshape(np.prod(vals.shape[:2]), vals.shape[2])
-        res = pd.DataFrame(vals, columns=vertices.columns)
+        # TODO: For dask: avoid .compute() calls
+        res = _pd_mesh(vertices.compute(), simplices.compute())
@jonmmease (Collaborator, Author) commented on this diff:
We were calling compute on both vertices and simplices anyway, so I opted to just call the pandas version. In addition to making this more concise, the pandas version has winding auto-detection enabled, which was not previously enabled here.
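
For reference, winding detection can be illustrated with the sign of a cross product; this standalone sketch is not datashader's implementation:

# The sign of the cross product of two edge vectors gives the
# triangle's orientation (positive = counter-clockwise)
def is_ccw(p0, p1, p2):
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0) > 0

print(is_ccw((0, 0), (1, 0), (0, 1)))  # True (counter-clockwise)
print(is_ccw((0, 0), (0, 1), (1, 0)))  # False (clockwise)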

@jonmmease (Collaborator, Author) commented:

@jbednar Ready for review.

@jbednar (Member) left a review:
Looks like good progress without making it more complex; thanks!

@jonmmease (Collaborator, Author) commented:
@jbednar tests passing!
