This is a machine learning project that focuses on predicting hotel ratings based on various features and reviews. The goal is to build and evaluate different models to accurately predict the rating given by reviewers.
- Project Overview
- Data Preprocessing
- Model Training and Evaluation
  - Logistic Regression
  - Decision Tree Classifier
  - Random Forest Classifier
  - K-Nearest Neighbors Classifier
  - Regression Models
The project predicts hotel ratings from a dataset of hotel features and guest reviews. The dataset is preprocessed to handle missing values, encode categorical variables, and scale features. Both classification and regression models are included: the classifiers treat the rating as a discrete class, while the regressors treat it as a continuous score.
Data preprocessing consists of handling outliers, encoding categorical variables, extracting relevant information from tags, and scaling numerical features. Outliers in certain columns are winsorized, meaning extreme values are capped at a chosen percentile, to limit their influence on the models. Categorical variables are encoded, and tags are extracted and organized into categories. Numerical features are rescaled with Min-Max scaling so that all of them share a common range.
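The steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names, the toy data, the 5th/95th percentile limits, and the use of integer category codes are all hypothetical and not taken from the project. Winsorizing is done here by clipping to quantiles with pandas, and tag extraction is omitted.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def winsorize_columns(df, columns, lower=0.05, upper=0.95):
    """Clip each listed column to its [lower, upper] quantile range."""
    out = df.copy()
    for col in columns:
        lo, hi = out[col].quantile([lower, upper])
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


# Toy data standing in for the hotel dataset (column names are hypothetical).
df = pd.DataFrame({
    "review_word_count": [5, 12, 9, 14, 11, 400, 8, 10, 13, 7],
    "reviewer_nationality": ["UK", "US", "UK", "FR", "US", "UK", "FR", "US", "UK", "FR"],
})

# Cap the extreme value (400) instead of dropping the row.
df = winsorize_columns(df, ["review_word_count"])

# Encode the categorical column as integer codes (one of several possible encodings).
df["reviewer_nationality"] = df["reviewer_nationality"].astype("category").cat.codes

# Rescale every column to the [0, 1] range.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
print(scaled.describe().loc[["min", "max"]])
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the validation and test splits, so that no information leaks from held-out data.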
A logistic regression model is trained to predict hotel ratings. The features most strongly correlated with the target variable are selected as inputs. The model's performance is evaluated using validation and test accuracy scores. The trained logistic regression model is saved for future use.
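A minimal sketch of this workflow is shown below, assuming synthetic data in place of the hotel dataset. The number of selected features (`TOP_K = 5`), the 60/20/20 split, the use of absolute Pearson correlation, and the file name are assumptions made for illustration; the project's actual choices are not stated here.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed hotel data.
X_arr, y_arr = make_classification(
    n_samples=600, n_features=12, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(12)])
y = pd.Series(y_arr, name="rating_class")

# 60/20/20 train/validation/test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Rank features by absolute correlation with the target, using training data only.
TOP_K = 5
correlations = X_train.corrwith(y_train).abs().sort_values(ascending=False)
top_features = correlations.head(TOP_K).index.tolist()

model = LogisticRegression(max_iter=1000)
model.fit(X_train[top_features], y_train)

val_accuracy = accuracy_score(y_val, model.predict(X_val[top_features]))
test_accuracy = accuracy_score(y_test, model.predict(X_test[top_features]))
print(f"validation accuracy: {val_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")

# Persist the fitted model, then reload it to confirm the round trip works.
model_path = os.path.join(tempfile.gettempdir(), "logistic_regression.joblib")
joblib.dump(model, model_path)
reloaded = joblib.load(model_path)
```

Computing the correlations on the training split alone keeps the validation and test scores honest, since the held-out rows play no part in choosing the features.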
A decision tree classifier is trained to predict hotel ratings. The same top features selected for logistic regression are used as input features. The model's performance is evaluated using validation and test accuracy scores. The trained decision tree classifier is saved for future use.
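As a rough illustration, the snippet below trains a decision tree on a fixed subset of columns, mirroring the idea of reusing a previously selected feature list. The synthetic data, the column indices in `selected_columns`, and the default tree hyperparameters are assumptions, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, n_informative=5, random_state=0)

# Hypothetical column indices standing in for the features chosen earlier.
selected_columns = [0, 3, 5, 7, 9]
X_selected = X[:, selected_columns]

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_selected, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Default hyperparameters; random_state fixes tie-breaking between equally good splits.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, tree.predict(X_val))
test_accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"validation accuracy: {val_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```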
A random forest classifier is trained to predict hotel ratings. The model uses 150 estimators, a maximum depth of 25, and a minimum of 75 samples per leaf. The model's performance is evaluated using validation and test accuracy scores. The trained random forest classifier is saved for future use.
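With scikit-learn, the hyperparameters listed above map onto `n_estimators`, `max_depth`, and `min_samples_leaf`. The sketch below uses synthetic data and an assumed 60/20/20 split; only the three hyperparameter values come from the description.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data, large enough for leaves that each hold at least 75 samples.
X, y = make_classification(n_samples=3000, n_features=12, n_informative=6, random_state=0)

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

forest = RandomForestClassifier(
    n_estimators=150,      # number of trees
    max_depth=25,          # upper bound on the depth of each tree
    min_samples_leaf=75,   # every leaf must contain at least 75 training samples
    random_state=0,
)
forest.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, forest.predict(X_val))
test_accuracy = accuracy_score(y_test, forest.predict(X_test))
print(f"validation accuracy: {val_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

A large `min_samples_leaf` acts as a regularizer: with 75 samples required per leaf, the trees usually stop splitting well before reaching the depth limit of 25.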
A k-nearest neighbors classifier is trained to predict hotel ratings. The model uses 21 neighbors for classification. The model's performance is evaluated using validation and test accuracy scores. The trained k-nearest neighbors classifier is saved for future use.
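A possible shape for this step is sketched below, again on synthetic data. Only `n_neighbors=21` comes from the description; wrapping the classifier in a pipeline with `MinMaxScaler` is an assumption, included because distance-based models are sensitive to feature ranges.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# The scaler is fitted on the training split only, then applied to held-out data.
knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=21))
knn.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, knn.predict(X_val))
test_accuracy = accuracy_score(y_test, knn.predict(X_test))
print(f"validation accuracy: {val_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

An odd neighbor count such as 21 avoids tied votes when there are two classes; with more rating classes ties remain possible, and scikit-learn resolves them by the order of the class labels.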
A linear regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.
A k-nearest neighbors regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.
A random forest regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.
A gradient boosting regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.
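The four regression models can be compared with a single loop, as in the sketch below. The synthetic data, the 75/25 train/validation split, and the default hyperparameters are assumptions; the project's actual settings for these regressors are not given above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the hotel data with a continuous target.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear_regression": LinearRegression(),
    "knn_regression": KNeighborsRegressor(),
    "random_forest_regression": RandomForestRegressor(random_state=0),
    "gradient_boosting_regression": GradientBoostingRegressor(random_state=0),
}

# Fit each model and record the MSE on the training and validation sets.
mse_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse_scores[name] = {
        "train": mean_squared_error(y_train, model.predict(X_train)),
        "validation": mean_squared_error(y_val, model.predict(X_val)),
    }
    print(f"{name}: train MSE={mse_scores[name]['train']:.2f}, "
          f"validation MSE={mse_scores[name]['validation']:.2f}")
```

Reporting both numbers side by side makes overfitting visible: a training MSE far below the validation MSE suggests the model has memorized the training data rather than learned a pattern that generalizes.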