Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement for TMS tileset generation #886

Merged
merged 2 commits into from
Apr 6, 2020
Merged

Performance improvement for TMS tileset generation #886

merged 2 commits into from
Apr 6, 2020

Conversation

hokieg3n1us
Copy link
Contributor

@hokieg3n1us hokieg3n1us commented Mar 9, 2020

I've been utilizing the TMS tileset rendering feature for a couple weeks, after seeing Tom White's article, using his branch that was recently merged. I'd been attempting to render the OSM 1 billion point dataset to a TMS tileset for demonstration purposes, but repeatedly failing on my machine (Xeon E3-1505M v6, 64 GB Ram). I investigated and identified some performance improvements that I've included in this pull request:

  • Remove duplicate calls to gen_super_tiles
  • Cache results of rasterize_func instead of running twice, during both calculate_zoom_level_stats and render_super_tile
  • Moved calculate_zoom_level_stats to a serial implementation instead of dask.bag.map to reduce memory overhead that caused large renders to fail

Performance Comparison of Tileset Rendering (50 Million OSM GPS Points)

Using this branch, I was able to successfully render the OSM 1 billion point dataset. The elapsed times for each level are below.

Level Super Tile Count Calculating Stats (s) Rendering (s) Total Time (s)
0 1 46.70 16.39 63.08
1 1 42.98 9.47 52.45
2 1 43.79 8.82 52.61
3 1 41.64 9.68 51.32
4 1 41.99 11.97 53.97
5 4 65.52 25.96 91.48
6 16 155.82 81.61 237.43
7 64 515.14 314.10 829.25
8 256 2130.61 737.57 2868.18
9 1024 8646.66 1456.87 10103.54

@brendancol
Copy link
Collaborator

@hokieg3n1us Great PR and thanks for the benchmarks. Looks like we just need to get CI passing then we can merge.

As a side note, I've forked tiles.py into xarray-spatial along with a bunch of the analysis tools from the spatial and geo modules. The plan is to continue development of analysis related tools there and have datashader focus on the rasterization pipeline. @jbednar can comment more on this.

@jbednar I think we should merge (once CI passes) and then I will patch https://github.com/makepath/xarray-spatial/blob/master/xrspatial/tiles.py

@hokieg3n1us
Copy link
Contributor Author

@brendancol @jbednar

I dug into the CI failures, starting by examining the version differences being pulled in by the environment.yml between the last successful build and the build associated with this PR, since I couldn't see any code change to Datashader that'd link to the failures we're seeing.

I believe I've narrowed the cause to changes to Matplotlib present in this PR.

I'm not sure what the proper solution is, but I can confirm that by manually setting my version of Matplotlib to 3.1.3 instead of 3.2.0, the tests passed successfully.

@philippjfr
Copy link
Member

I think we should merge and deal with the CI failures separately. I'll push up a PR for those tomorrow.

@jbednar jbednar merged commit 7ee695c into holoviz:master Apr 6, 2020
@jbednar
Copy link
Member

jbednar commented Apr 6, 2020

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants