- Run the following commands in the project's root directory to set up the database and the model:
  - To run the ETL pipeline, which cleans the data and stores it in a database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponseDB.db
  - To run the ML pipeline, which trains the classifier and saves the trained model:
    python models/train_classifier.py data/DisasterResponseDB.db models/RandomForestClassifier.pkl
- Run the following command in the app's directory to start the web app:
  python run.py
- Go to localhost:3001 in your browser.
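The ETL step that the first command runs can be sketched with the standard library alone. This is a minimal sketch, not the contents of process_data.py. It assumes both CSV files share an `id` column, that the messages file has a `message` column, and that the categories file stores all labels of a message in one string such as `related-1;request-0`. The function names `load_and_merge`, `clean` and `save` and the table name `messages` are hypothetical.

```python
import csv
import io
import sqlite3


def load_and_merge(messages_csv, categories_csv):
    """Join message rows with category rows on the shared 'id' column (assumed layout)."""
    categories = {row["id"]: row["categories"] for row in csv.DictReader(categories_csv)}
    merged = []
    for row in csv.DictReader(messages_csv):
        if row["id"] in categories:
            merged.append({**row, "categories": categories[row["id"]]})
    return merged


def clean(rows):
    """Expand 'name-0;name-1;...' into one 0/1 column per category and drop duplicate rows.

    For brevity only 'id' and 'message' are kept from the messages file.
    """
    cleaned, seen = [], set()
    for row in rows:
        labels = {}
        for pair in row["categories"].split(";"):
            name, _, value = pair.rpartition("-")  # split on the last hyphen only
            labels[name] = int(value)
        record = {"id": int(row["id"]), "message": row["message"], **labels}
        key = tuple(sorted(record.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


def save(records, con, table="messages"):
    """Write cleaned records to a SQLite table, replacing any previous contents."""
    if not records:
        raise ValueError("nothing to save")
    columns = list(records[0])
    quoted = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    con.execute(f'DROP TABLE IF EXISTS "{table}"')
    con.execute(f'CREATE TABLE "{table}" ({quoted})')
    con.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in records],
    )
    con.commit()
```

To run it against real files, open each CSV with `open(path, newline="")`, pass the two file objects to `load_and_merge`, and call `save` with a connection from `sqlite3.connect("data/DisasterResponseDB.db")`.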
This project was implemented as part of a data science course to put ETL and ML pipelines into practice. Implementing it allowed me to practice data engineering skills, learn the basics of NLP, and apply them to build an ML model through pipelines.
- Data directory: contains the data from Figure8 and the script process_data.py, which holds the ETL pipeline that cleans the data.
- Models directory: contains the script train_classifier.py, which holds the NLP-based ML pipeline that trains and evaluates the model.
- App directory: contains the script run.py, which runs the Flask web app that visualizes the data and deploys the classifier so it can be used.
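The NLP part of an ML pipeline like the one in train_classifier.py turns raw message text into numeric features. The sketch below assumes a bag-of-words representation with TF-IDF weighting, a common choice that the README does not confirm. The IDF uses the smoothed formula log((1 + n) / (1 + df)) + 1, the default in scikit-learn's TfidfTransformer. The stop-word list and the names `tokenize`, `fit_idf` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "we"}


def tokenize(text):
    """Lowercase the text, keep runs of letters/digits, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def fit_idf(documents):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1 per token."""
    n = len(documents)
    df = Counter(token for doc in documents for token in set(tokenize(doc)))
    return {token: math.log((1 + n) / (1 + count)) + 1 for token, count in df.items()}


def tfidf(text, idf):
    """Term counts weighted by IDF, then L2-normalised; tokens unseen in training are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}
```

Each resulting vector is a sparse dict of token to weight, which is the kind of input a classifier consumes; scikit-learn's CountVectorizer and TfidfTransformer implement the same idea on sparse matrices.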
The purpose of the project is to classify disaster messages. A user enters a message into the web app, which highlights the categories the message belongs to. To classify new messages, the app runs an ML model deployed inside it. The model uses NLP and was trained on data provided by Udacity and cleaned by the ETL pipeline.
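The step from a trained model to highlighted categories can be illustrated with one binary classifier per category: a message is highlighted under every category whose classifier predicts "yes". This is a minimal sketch under stated assumptions, not the project's model. It swaps in multinomial naive Bayes with add-one smoothing, not the random forest that the saved model's filename suggests, purely so it runs on the standard library. The names `train` and `classify` and the toy data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(messages, labels, categories):
    """Fit one binary multinomial naive Bayes model per category.

    labels[i] is the set of categories that messages[i] belongs to.
    """
    vocab = {t for m in messages for t in tokenize(m)}
    model = {}
    for cat in categories:
        token_counts = {True: Counter(), False: Counter()}
        doc_counts = Counter()
        for msg, msg_labels in zip(messages, labels):
            cls = cat in msg_labels  # True if the message belongs to this category
            doc_counts[cls] += 1
            token_counts[cls].update(tokenize(msg))
        model[cat] = {}
        for cls in (True, False):
            total = sum(token_counts[cls].values())
            # Add-one smoothing for both the class prior and the token likelihoods.
            prior = math.log((doc_counts[cls] + 1) / (len(messages) + 2))
            loglik = {t: math.log((token_counts[cls][t] + 1) / (total + len(vocab))) for t in vocab}
            model[cat][cls] = (prior, loglik)
    return model


def classify(model, text):
    """Return the categories whose 'yes' score beats the 'no' score for this message."""
    tokens = tokenize(text)
    highlighted = []
    for cat, params in model.items():
        scores = {
            cls: prior + sum(loglik[t] for t in tokens if t in loglik)
            for cls, (prior, loglik) in params.items()
        }
        if scores[True] > scores[False]:
            highlighted.append(cat)
    return highlighted
```

In a Flask app, a view function would call something like `classify` on the submitted text and pass the returned list to the template that renders the highlighted categories.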
Udacity provided the course material (both the data and the skeleton of the web app).