
A small repo to generate a map of South Africa's streets colour coded by the name's origin.

Emily-RoseSteyn/south-africa-street-history-mapping


South Africa Street History Mapping

We were curious about visualising how street names in South Africa correlate with their language or place of origin. South Africa is a country whose history is marked by significant power struggles and complex race relations. This repo provides the code for creating maps of street networks colour-coded by place of origin and by language.

This readme is divided into:

  • Results
  • Academic Outputs
  • Running the code
  • Helpful Notebooks
  • Contact

Results

Area          Dictionary Lookup (image)  Language Detector (image)
Johannesburg  joburg                     joburg-lang
Soweto        soweto                     soweto-lang
Sandton       sandton                    sandton-lang
Cape Town     cape-town                  cape-town-lang

Academic Outputs

Running the code

This section describes how to run the code. Feel free to open an issue if you have any questions!

Prerequisites

  • Git Bash (if you are on Windows)
  • Docker
  • Poetry
  • ~5GB Disk Space (Docker images + data)

Setup

Poetry is used to manage packages and virtual environments.

 poetry shell
 poetry install

Data Download Pipeline

1. Retrieve streets for relevant countries

Core Code: download_country_streets.py

We first need to download all street names for South Africa and for selected countries that have played a role in South Africa's history (see countries). Data is downloaded using the Overpass API, an API for retrieving data from OpenStreetMap.

To retrieve street data, check that you're happy with which countries are being retrieved and run:

python ./src/street_list_download/main.py

If you're on a Slurm-enabled cluster, you can instead run:

sbatch ./scripts/1_retrieve-streets.sbatch

The outputs of this script are saved to streets in CSV format.
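
As a rough illustration of the kind of request involved, the sketch below builds an Overpass QL query for all named `highway` ways in a country and posts it to a public Overpass endpoint. It is not the code in download_country_streets.py: the endpoint URL, the ISO-code area filter, the timeout, and the function names `build_street_query` and `fetch_street_names` are all assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass endpoint (an assumption; the repo may use a different one).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_street_query(iso_code: str, timeout_s: int = 900) -> str:
    """Build an Overpass QL query for every named highway way in a country."""
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        # Select the country's area by its ISO 3166-1 code (e.g. ZA).
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        # Keep only ways that are tagged as a highway and have a name.
        'way["highway"]["name"](area.country);\n'
        # Only the tags are needed, not the geometry.
        "out tags;"
    )


def fetch_street_names(iso_code: str) -> list[str]:
    """POST the query and return sorted unique street names (needs network access)."""
    payload = urllib.parse.urlencode({"data": build_street_query(iso_code)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=payload) as response:
        elements = json.load(response)["elements"]
    return sorted({el["tags"]["name"] for el in elements if "name" in el.get("tags", {})})


# Building the query is offline; fetch_street_names("ZA") would hit the API.
print(build_street_query("ZA"))
```

Country-wide queries can be slow and large, which is presumably why a Slurm batch script is provided for this step.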

2. Process street data

Core Code: preprocess_country_streets.py

We now process the street names for the various countries so that we end up with a dictionary of terms for the country. Each street name is:

  • Exploded by space (e.g. so that Nottingham Road becomes [Nottingham, Road])
  • Converted to lowercase

This results in a dataframe of terms. Empty, NaN, purely numeric, and duplicate terms are dropped. Words shorter than a certain length are also dropped.
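
The steps above can be sketched in plain Python as follows. This is only an approximation of what preprocess_country_streets.py does with a dataframe: the function name `street_names_to_terms` and the minimum length of 3 are assumptions, since the actual threshold is not stated here.

```python
def street_names_to_terms(names, min_length=3):
    """Turn street names into a de-duplicated list of lowercase terms.

    min_length is a hypothetical threshold; the real value may differ.
    """
    terms = []
    seen = set()
    for name in names:
        # Skip missing values (None, NaN floats, and so on).
        if not isinstance(name, str):
            continue
        # Lowercase, then explode by whitespace; split() never yields empty terms.
        for word in name.lower().split():
            if word.isdigit():          # drop purely numeric terms
                continue
            if len(word) < min_length:  # drop very short words
                continue
            if word in seen:            # drop duplicates
                continue
            seen.add(word)
            terms.append(word)
    return terms


names = ["Nottingham Road", "Route 62", None, "Nottingham Close", "De Beer Street"]
print(street_names_to_terms(names))
# ['nottingham', 'road', 'route', 'close', 'beer', 'street']
```

Note that generic words such as "road" are still present at this stage; they are only excluded later, at mapping time.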

To process the street data, run:

python ./src/street_list_preprocessing/main.py

If you're on a Slurm-enabled cluster, you can instead run:

sbatch ./scripts/2_process-streets.sbatch

The outputs of this script are saved to streets in CSV format with the prefix "processed". Additionally, all terms and their corresponding origin country are saved to the street_terms table of a SQLite database at output/street_history.sqlite.

3. Build Dictionary

Core Code: build_dictionary_for_term.py

Now that we have all the terms for each selected country, we can build a lookup dictionary for each term for a "home" country. In our case, South Africa is the home country.

Each term in South Africa's terms data from the previous step is looked up in the street_terms table. If the term matches one or more countries (including the home country), it is saved in a dictionary table and assigned a likelihood for each origin, based on how frequently the term appears in the different countries.

The term, origin, and likelihood are saved to a sqlite database in output/street_history.sqlite in a table with the format <country>_terms_dictionary.
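
To make the likelihood idea concrete, here is a minimal sketch using an in-memory SQLite database. It rests on assumptions that are not confirmed by the repo: a hypothetical `street_terms(term, country, frequency)` schema, and a likelihood defined as a country's share of the term's total frequency across all countries. The actual calculation in build_dictionary_for_term.py may differ.

```python
import sqlite3


def build_terms_dictionary(conn, home_country):
    """Create <home_country>_terms_dictionary from a street_terms table.

    Assumes a hypothetical schema street_terms(term, country, frequency).
    home_country is interpolated into the table name, so it must be trusted.
    """
    table = f"{home_country}_terms_dictionary"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (term TEXT, origin TEXT, likelihood REAL)")
    conn.execute(
        f"""
        INSERT INTO {table} (term, origin, likelihood)
        SELECT s.term, s.country, CAST(s.frequency AS REAL) / totals.total
        FROM street_terms AS s
        JOIN (SELECT term, SUM(frequency) AS total
              FROM street_terms GROUP BY term) AS totals
          ON totals.term = s.term
        -- only keep terms that occur in the home country
        WHERE s.term IN (SELECT term FROM street_terms WHERE country = ?)
        """,
        (home_country,),
    )
    conn.commit()
    return table


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE street_terms (term TEXT, country TEXT, frequency INTEGER)")
conn.executemany(
    "INSERT INTO street_terms VALUES (?, ?, ?)",
    [
        ("nottingham", "south_africa", 1),
        ("nottingham", "united_kingdom", 3),
        ("vilakazi", "south_africa", 2),
        ("berlin", "germany", 5),  # not a home-country term, so it is skipped
    ],
)
table = build_terms_dictionary(conn, "south_africa")
rows = conn.execute(
    f"SELECT term, origin, likelihood FROM {table} ORDER BY term, origin"
).fetchall()
print(rows)
```

With this toy data, "nottingham" is attributed mostly to the United Kingdom (0.75) and partly to South Africa (0.25), while "vilakazi" is attributed entirely to South Africa.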

To build a dictionary of terms for a specific country, run:

python ./src/dictionary_builder/main.py $COUNTRY

Here $COUNTRY is south_africa for this repo, but it can be set to any other country that has been downloaded.

If you're on a Slurm-enabled cluster, you can instead run:

sbatch ./scripts/3_build-dictionary-south-africa.sbatch

4. Map

Finally, we can map street names for a particular area in the "home" country. To do this, OSMnx is used to retrieve a street network graph for an area. The street names in the network are preprocessed to produce terms for each name. The terms are looked up in the dictionary, "stop" words such as road and avenue are excluded, and the origin with the highest likelihood among the remaining terms is assigned to the street. The street is then drawn in the colour that matches its allocated origin.
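
The origin lookup described above might look roughly like the sketch below. The stop-word list, the nested-dict shape of `dictionary`, the "unknown" fallback, and the function name `assign_origin` are illustrative assumptions rather than the repo's actual implementation.

```python
# Illustrative stop words; the real list used by the repo may be longer.
STOP_WORDS = {"road", "street", "avenue", "drive", "lane", "close"}


def assign_origin(street_name, dictionary, default="unknown"):
    """Pick the origin with the highest likelihood across a street's terms.

    dictionary maps term -> {origin: likelihood} (a hypothetical in-memory
    form of the <country>_terms_dictionary table).
    """
    best_origin, best_likelihood = default, 0.0
    for term in street_name.lower().split():
        if term in STOP_WORDS:
            continue  # generic words carry no origin information
        for origin, likelihood in dictionary.get(term, {}).items():
            if likelihood > best_likelihood:
                best_origin, best_likelihood = origin, likelihood
    return best_origin


dictionary = {
    "nottingham": {"south_africa": 0.25, "united_kingdom": 0.75},
    "road": {"united_kingdom": 0.9},  # ignored because "road" is a stop word
}
print(assign_origin("Nottingham Road", dictionary))  # united_kingdom
print(assign_origin("Unmapped Lane", dictionary))    # unknown
```

The returned origin can then be used as the key into whatever colour palette the map uses.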

An option is also included to map the streets by language instead. This needs some further work but produces interesting results. This second mapping uses lingua to detect the language of the terms provided.

To map all street names in a region, run the end-to-end mapping, for example:

python ./src/mapping/map-e2e.py "Johannesburg, South Africa" --distance 30000 --fig_size 64

Helpful Notebooks

There are a bunch of Jupyter notebooks in the notebooks folder which may be useful for you to play around with.

Contact

Feel free to reach out to me either via this repo or [email protected].
