On 07/19/2020, GeoSpark was accepted to the Apache Software Foundation under the new name Apache Sedona (incubating). The code in this repository will be imported into the ASF Git repository. Previous contributors, please read this GitHub issue and submit your CLA at your earliest convenience.
Stable | Latest | Source code |
---|---|---|
GeoSpark@Twitter || GeoSpark Discussion Board ||
GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
Name | API | Spark compatibility | Introduction |
---|---|---|---|
Core | RDD | Spark 2.X/1.X | SpatialRDDs and Query Operators. |
SQL | SQL/DataFrame | SparkSQL 2.1+ | SQL interfaces for GeoSpark core. |
Viz | RDD, SQL/DataFrame | RDD - Spark 2.X/1.X, SQL - Spark 2.1+ | Visualization for Spatial RDD and DataFrame. |
Zeppelin | Apache Zeppelin | Spark 2.1+, Zeppelin 0.8.1+ | GeoSpark plugin for Apache Zeppelin. |
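
To give a feel for what a spatial query operator does, here is a minimal conceptual sketch in plain Python. It does not use GeoSpark or Spark: the function names (`cell_of`, `partition_points`, `range_query`), the uniform grid, and the sample points are all assumptions made for illustration. The idea shown is that points are grouped into spatial partitions, and a range query only scans the partitions that overlap the query window.

```python
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Map a point to the grid cell (hypothetical partition key) containing it."""
    return (int(x // cell_size), int(y // cell_size))


def partition_points(points, cell_size):
    """Group points by grid cell, mimicking a spatially partitioned dataset."""
    parts = defaultdict(list)
    for x, y in points:
        parts[cell_of(x, y, cell_size)].append((x, y))
    return parts


def range_query(parts, cell_size, min_x, min_y, max_x, max_y):
    """Return points inside the query window, scanning only overlapping cells."""
    lo_cx, lo_cy = cell_of(min_x, min_y, cell_size)
    hi_cx, hi_cy = cell_of(max_x, max_y, cell_size)
    hits = []
    for cx in range(lo_cx, hi_cx + 1):
        for cy in range(lo_cy, hi_cy + 1):
            # .get avoids creating empty partitions for untouched cells
            for x, y in parts.get((cx, cy), []):
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    hits.append((x, y))
    return sorted(hits)


# Made-up sample data for illustration only
points = [(1.0, 1.0), (2.5, 3.5), (7.0, 8.0), (4.0, 4.0), (9.5, 0.5)]
parts = partition_points(points, cell_size=5.0)
result = range_query(parts, 5.0, 0.0, 0.0, 4.0, 4.0)
print(result)
```

In a real deployment the partitions would live on different machines and the partitioning scheme would adapt to the data; see the GeoSpark website for the actual RDD and SQL APIs.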
Please visit the GeoSpark website for detailed documentation.
- GeoSpark 1.3.1 is released. This version provides a complete Python wrapper for the GeoSpark RDD and SQL APIs. It also contains a number of bug fixes and new functions from 12 contributors. See Python tutorial: RDD, Python tutorial: SQL, and Release note.
- (Mo)hamed Sarwat (Twitter: @MoSarwat)
- Jia Yu
The GeoSpark ecosystem has around 10K downloads per month.