Support CSV and other formats with data source preprocess option #635

Merged Mar 2, 2018 (2 commits)


@bcamper (Member) commented Feb 21, 2018

To support formats that contain geographic data but are not in one of the commonly supported formats (GeoJSON, MVT, etc.), this PR adds a data source `preprocess` option. This parameter defines a JS function that modifies the raw network response fetched by a data source before it is processed further. It is similar to the existing `transform` function, except that `transform` is called after the initial response parsing (e.g. after decoding TopoJSON and MVT into GeoJSON-style data).

The `preprocess` function should return data in the format expected by the data source type: for example, if `type: GeoJSON`, then `preprocess` should return a GeoJSON object.
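
To make the ordering concrete, here is a minimal sketch of where the two hooks sit relative to parsing. `runSourcePipeline`, its `parse` argument, and `parseGeoJSON` are hypothetical names for illustration, not Tangram's internal API; the sketch assumes only what is described above: `preprocess` sees the raw response, and `transform` sees the parsed data.

```javascript
// Hypothetical sketch, NOT Tangram's internal code: it only shows the order in which
// the two user-defined hooks run relative to the data source's own parsing step.
function runSourcePipeline(rawResponse, source, parse) {
  // 1. preprocess (optional) receives the raw network response, e.g. CSV text,
  //    and returns data in the format the source type expects
  let data = source.preprocess ? source.preprocess(rawResponse) : rawResponse;

  // 2. the source type's own parsing step (a hypothetical stand-in passed in as `parse`)
  data = parse(data);

  // 3. transform (optional) receives the already-parsed, GeoJSON-style data
  return source.transform ? source.transform(data) : data;
}

// Hypothetical parse step for a GeoJSON source: parse strings, pass objects through
const parseGeoJSON = (d) => (typeof d === 'string' ? JSON.parse(d) : d);

// Usage: the raw response is a bare "lng,lat" string rather than GeoJSON
const source = {
  preprocess: (raw) => {
    const [lng, lat] = raw.split(',').map(Number);
    return {
      type: 'FeatureCollection',
      features: [{ type: 'Feature', geometry: { type: 'Point', coordinates: [lng, lat] }, properties: {} }]
    };
  },
  // transform runs last, on the parsed features
  transform: (geojson) => {
    geojson.features.forEach((f) => { f.properties.seen_by_transform = true; });
    return geojson;
  }
};

const result = runSourcePipeline('-74.15,44.64', source, parseGeoJSON);
console.log(JSON.stringify(result.features[0].geometry.coordinates)); // [-74.15,44.64]
```

If both hooks are defined, both run, in this order.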

For example, given a Who's On First CSV export (which contains geographic data in the form of census interior points) formatted like this:

    name_eng_x_variant,uscensus_aland,uscensus_awater,uscensus_cd115fp,uscensus_cdsessn,uscensus_funcstat,uscensus_geoid,uscensus_intptlat,uscensus_intptlon,uscensus_lsad,uscensus_lsy,uscensus_mtfcc,uscensus_namelsad,uscensus_sldlst,uscensus_sldust,uscensus_statefp,wof_abbreviation,wof_association,wof_belongsto,wof_breaches,wof_categories,wof_concordances,wof_concordances_sources,wof_constituency,wof_country,wof_created,wof_geomhash,wof_hierarchy,wof_id,wof_lastmodified,wof_name,wof_parent_id,wof_path,wof_placetype,wof_placetype_id,wof_placetype_names,wof_repo,wof_subdivision,wof_superseded_by,wof_supersedes,wof_tags
    ,8016461395.0,408281421.0,null,null,N,36115,+44.6404010,-074.1531364,L3,2016,G5220,Assembly District 115,115,null,36,null,state-house,"[102191575,85633793,85688543]",[],[],null,[],region,us,1481487733,6cdeb8e408b110a903cfa09f169127fc,"[{""continent_id"":102191575,""country_id"":85633793,""region_id"":85688543}]",1108771377,1481487733,New York Assembly District 115,85688543,110/877/137/7/1108771377.geojson,constituency,1108746739,[],whosonfirst-data-constituency-us-ny,null,[],[],[]
    ,2096846209.0,18904910.0,null,null,N,36124,+42.1485630,-076.4279918,L3,2016,G5220,Assembly District 124,124,null,36,null,state-house,"[102191575,85633793,85688543]",[],[],null,[],region,us,1481487736,62c624e9313c3aced2ad496c1a616d35,"[{""continent_id"":102191575,""country_id"":85633793,""region_id"":85688543}]",1108771397,1481487736,New York Assembly District 124,85688543,110/877/139/7/1108771397.geojson,constituency,1108746739,[],whosonfirst-data-constituency-us-ny,null,[],[],[]
    ,1227673529.0,781123966.0,null,null,N,36055,+43.1085124,-077.4895172,LU,2016,G5210,State Senate District 55,null,055,36,null,state-senate,"[102191575,85633793,85688543]",[],[],null,[],region,us,1481487766,1e43fde4d5e7da384bd94e3b3007b975,"[{""continent_id"":102191575,""country_id"":85633793,""region_id"":85688543}]",1108771579,1481487766,New York State Senate District 55,85688543,110/877/157/9/1108771579.geojson,constituency,1108746739,[],whosonfirst-data-constituency-us-ny,null,[],[],[]

We can use the preprocess function to remap those columns into a GeoJSON collection of points:

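    csv:
        type: GeoJSON
        url: ny-us-constituencies.csv
        preprocess: |
            function (data) {
                // drop the header row and any blank lines (e.g. a trailing newline)
                var rows = data.split('\n').slice(1).filter(function (row) {
                    return row.trim() !== '';
                });
                return {
                    type: 'FeatureCollection',
                    features: rows.map(function (row) {
                        // naive split; safe for this sample because columns 0-12 contain no quoted commas
                        var cols = row.split(',');
                        return {
                            type: 'Feature',
                            geometry: {
                                type: 'Point',
                                // census intptlon & intptlat fields, parsed from strings into numbers
                                coordinates: [parseFloat(cols[8]), parseFloat(cols[7])]
                            },
                            properties: {
                                name: cols[12] // district name
                            }
                        };
                    })
                };
            }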
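
The example above splits each row on every comma, which works for columns 0 to 12 of this export but would mis-split later columns: fields such as `wof_belongsto` and `wof_hierarchy` are quoted and contain commas (and doubled `""` quotes). If you need those later columns, for example `wof_name`, a slightly more careful sketch could parse quoted fields and look columns up by header name. `splitCSVLine` and `csvToGeoJSON` are hypothetical helpers written for this illustration; they assume no quoted field spans multiple lines. For messier real-world CSV, loading a dedicated parsing library through the externally loaded scripts that `preprocess` can access is likely the better option.

```javascript
// Hypothetical helper: split one CSV line into fields, honoring double-quoted
// fields (which may contain commas) and "" escapes inside them.
// Assumption: no quoted field spans multiple lines.
function splitCSVLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') { inQuotes = false; }                     // closing quote
      else { field += ch; }
    } else if (ch === '"') {
      inQuotes = true;                                               // opening quote
    } else if (ch === ',') {
      fields.push(field);                                            // field boundary
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field); // last field on the line
  return fields;
}

// Hypothetical preprocess body: CSV text in, GeoJSON FeatureCollection out,
// looking columns up by header name instead of by position.
function csvToGeoJSON(data) {
  const lines = data.split(/\r?\n/).filter((l) => l.trim() !== '');
  const header = splitCSVLine(lines[0]);
  const features = lines.slice(1).map((line) => {
    const values = splitCSVLine(line);
    const row = {};
    header.forEach((name, i) => { row[name] = values[i]; });
    return {
      type: 'Feature',
      geometry: {
        type: 'Point',
        coordinates: [parseFloat(row.uscensus_intptlon), parseFloat(row.uscensus_intptlat)]
      },
      properties: { name: row.wof_name }
    };
  });
  return { type: 'FeatureCollection', features };
}

// Usage with a trimmed-down version of the export above (only four columns kept)
const sample = [
  'uscensus_intptlat,uscensus_intptlon,wof_belongsto,wof_name',
  '+44.6404010,-074.1531364,"[102191575,85633793,85688543]",New York Assembly District 115'
].join('\n');

console.log(csvToGeoJSON(sample).features[0].properties.name); // New York Assembly District 115
```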

Additional notes:

  • preprocess has access to the same externally loaded scripts as the transform function.
  • preprocess can return either an immediate response or a Promise that resolves with a response, which allows it to use async operations. The initial tests use CSV files, but in theory the same functionality could be applied to parsing shapefiles and other formats as well.
  • It is possible to use both preprocess and transform in a single data source (if both are defined, both will execute), but in practice there are probably few cases where this would be useful.
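
Because `preprocess` may return either a plain value or a Promise, a caller can treat both cases uniformly. The sketch below is an assumption about how such a caller could look, not Tangram's implementation; `applyPreprocess` and `pointCollection` are hypothetical names, and `setTimeout` merely stands in for a library with an async API.

```javascript
// Hypothetical caller, not Tangram's implementation. Resolving with a Promise
// adopts its state, resolving with a plain value wraps it, and a synchronous
// throw inside preprocess becomes a rejection.
function applyPreprocess(preprocess, raw) {
  return new Promise((resolve) => resolve(preprocess(raw)));
}

// Shared helper: "lng,lat" text -> GeoJSON FeatureCollection with a single point
function pointCollection(text) {
  const [lng, lat] = text.split(',').map(Number);
  return {
    type: 'FeatureCollection',
    features: [{ type: 'Feature', geometry: { type: 'Point', coordinates: [lng, lat] }, properties: {} }]
  };
}

// A synchronous preprocess returns GeoJSON immediately...
const syncPreprocess = (raw) => pointCollection(raw);

// ...while an asynchronous one returns a Promise; setTimeout stands in for a
// library call that completes later
const asyncPreprocess = (raw) =>
  new Promise((resolve) => setTimeout(() => resolve(pointCollection(raw)), 10));

Promise.all([
  applyPreprocess(syncPreprocess, '-76.43,42.15'),
  applyPreprocess(asyncPreprocess, '-77.49,43.11')
]).then(([a, b]) => {
  console.log(a.features[0].geometry.coordinates, b.features[0].geometry.coordinates);
});
```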

(the `transform` function applies to the geometry after it is parsed)
`preprocess` can be used to display non-GeoJSON network responses, such as CSVs, or arbitrary APIs that include geo data (such as a lat/lng) but not in GeoJSON format, by reformatting them into GeoJSON. `preprocess` receives a raw string (which it must parse with `JSON.parse()` if the response is a JSON string), and can return either a GeoJSON object or a string that is parseable as such.
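
For example, a `preprocess` function for a JSON API that is not GeoJSON would parse the raw string and reshape it. The response shape below (a `places` array whose entries have `name`, `lat` and `lng` fields) is made up for illustration and does not describe any real API.

```javascript
// Hypothetical preprocess for a non-GeoJSON JSON API. Assumed response shape:
// { "places": [ { "name": "...", "lat": 40.7, "lng": -74.0 }, ... ] }
function preprocess(data) {
  const json = JSON.parse(data); // preprocess receives the raw response text
  return {
    type: 'FeatureCollection',
    features: json.places.map((place) => ({
      type: 'Feature',
      // GeoJSON coordinate order is [longitude, latitude]
      geometry: { type: 'Point', coordinates: [place.lng, place.lat] },
      properties: { name: place.name }
    }))
  };
}

const raw = '{"places":[{"name":"Example","lat":40.7,"lng":-74.0}]}';
console.log(JSON.stringify(preprocess(raw).features[0].geometry)); // {"type":"Point","coordinates":[-74,40.7]}
```
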
…romise (useful for incorporating libraries that have async operations)
@bcamper added this to the v0.15.0 milestone Feb 21, 2018

@bcamper (Member Author) commented Feb 21, 2018

@matteblair this will almost certainly remain a JS-only feature (like scripts and transform) for the medium-term future, but you may have thoughts on it nonetheless.

Also cc @meetar @burritojustice @thisisaaronland

@bcamper merged commit 5c9c5c7 into master Mar 2, 2018
@bcamper deleted the source-preprocess branch March 2, 2018 02:52

@matteblair (Member) commented:

This would be a fair bit of work to implement in Tangram ES, though not impossible. We could also pursue a data source model that allows users to implement on-the-fly transformations like this in the native language for their platform. Let's see how it shakes out in JS first!

@bcamper (Member Author) commented Mar 5, 2018 via email
