lunary403/Hotel-Rating-Prediction-ML

 
 

Hotel Rating Prediction

This machine learning project predicts hotel ratings from hotel features and review data. It builds and evaluates several models to predict the rating a reviewer gives.

Project Overview

The dataset is preprocessed to handle missing values, encode categorical variables, and scale features. Both classification and regression models are trained, so the rating is predicted either as a class label or as a continuous value.

Data Preprocessing

Preprocessing handles outliers, encodes categorical variables, extracts information from tags, and scales numerical features. Outliers in certain columns are winsorized (extreme values are clipped to a percentile bound) to limit their influence on the models. Tags are extracted and grouped into categories. Numerical features are scaled with Min-Max scaling so that all features share a common range.
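The winsorizing and Min-Max scaling steps described above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names, the percentile cutoffs, and the `review_counts` column are assumptions made for the example.

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values to the bounds found at the given percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    # Simple index-based percentile lookup; the cutoffs are illustrative assumptions.
    low = ordered[int(lower_pct * (n - 1))]
    high = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical numeric column with one extreme outlier (400).
review_counts = [1, 2, 2, 3, 3, 4, 4, 5, 5, 400]
clipped = winsorize(review_counts, lower_pct=0.1, upper_pct=0.9)
scaled = min_max_scale(clipped)
```

Winsorizing before scaling matters here: without the clipping step, the single outlier would compress every other value into a narrow band near zero.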

Model Training and Evaluation

Logistic Regression

A logistic regression model is trained to predict hotel ratings. The features most strongly correlated with the target variable are selected for training. The model's performance is evaluated using validation and test accuracy scores. The trained logistic regression model is saved for future use.
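The correlation-based feature selection and the accuracy metric mentioned above can be sketched as follows. This is a rough illustration under stated assumptions: the README does not say which correlation measure is used, so the sketch assumes Pearson correlation ranked by absolute value, and the column names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (norm_x * norm_y)


def top_k_features(columns, target, k):
    """Return the k column names with the largest absolute correlation to target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical feature columns and ratings, for illustration only.
columns = {
    "positive_word_count": [2, 4, 6, 8, 10],
    "negative_word_count": [9, 7, 6, 3, 1],
    "days_since_review": [5, 1, 4, 2, 3],
}
ratings = [1, 2, 3, 4, 5]
selected = top_k_features(columns, ratings, k=2)
```

Ranking by absolute value keeps strongly negative predictors, such as the hypothetical negative word count, which a signed ranking would discard.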

Decision Tree Classifier

A decision tree classifier is trained to predict hotel ratings. The same top features selected for logistic regression are used as input features. The model's performance is evaluated using validation and test accuracy scores. The trained decision tree classifier is saved for future use.

Random Forest Classifier

A random forest classifier is trained to predict hotel ratings. The model uses 150 estimators, a maximum depth of 25, and a minimum of 75 samples per leaf. The model's performance is evaluated using validation and test accuracy scores. The trained random forest classifier is saved for future use.
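As a sketch of how these hyperparameters might be expressed, the example below assumes scikit-learn, which the README does not name explicitly. The synthetic dataset, the split sizes, and the `random_state` values are assumptions for illustration and stand in for the project's real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real project uses its hotel review dataset.
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=5, random_state=0
)

# Assumed 60/20/20 split into training, validation, and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Hyperparameters taken from the description above.
model = RandomForestClassifier(
    n_estimators=150, max_depth=25, min_samples_leaf=75, random_state=0
)
model.fit(X_train, y_train)

# score() returns mean accuracy for classifiers.
val_accuracy = model.score(X_val, y_val)
test_accuracy = model.score(X_test, y_test)
```

With a minimum of 75 samples per leaf, the trees usually stop splitting well before the depth limit of 25, so the leaf constraint is likely the stronger regularizer of the two.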

K-Nearest Neighbors Classifier

A k-nearest neighbors classifier is trained to predict hotel ratings. The model uses 21 neighbors for classification. The model's performance is evaluated using validation and test accuracy scores. The trained k-nearest neighbors classifier is saved for future use.
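The idea behind a 21-neighbor classifier can be sketched in plain Python: find the 21 training points closest to a query and take a majority vote. The Euclidean distance, the function name, and the toy data are assumptions for this example and are not taken from the project.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=21):
    """Predict a label by majority vote among the k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups of 20 points each (hypothetical labels).
low_points = [(i * 0.01, i * 0.01) for i in range(20)]
high_points = [(0.8 + i * 0.01, 0.8 + i * 0.01) for i in range(20)]
train_X = low_points + high_points
train_y = ["low_rating"] * 20 + ["high_rating"] * 20

prediction_a = knn_predict(train_X, train_y, query=(0.05, 0.05))
prediction_b = knn_predict(train_X, train_y, query=(0.9, 0.9))
```

An odd value of k such as 21 avoids tied votes in a two-class problem, and because the vote depends on distances, it benefits from the feature scaling applied during preprocessing.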

Regression Models

Linear Regression

A linear regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.
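To make the evaluation concrete, the sketch below fits a one-feature least-squares line and computes the MSE on a training set and a validation set. The data and function names are hypothetical, and the project's models use more features. The same `mse` function applies unchanged to the predictions of any regressor.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mse(y_true, y_pred):
    """Mean squared error: the average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical scaled feature and rating values.
train_x, train_y = [0.0, 0.25, 0.5, 0.75, 1.0], [2.0, 4.0, 6.0, 8.0, 10.0]
val_x, val_y = [0.1, 0.9], [3.0, 9.0]

slope, intercept = fit_line(train_x, train_y)
train_mse = mse(train_y, [slope * x + intercept for x in train_x])
val_mse = mse(val_y, [slope * x + intercept for x in val_x])
```

Comparing the two values is the point of computing both: a validation MSE far above the training MSE suggests the model is overfitting.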

K-Nearest Neighbors Regression

A k-nearest neighbors regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.

Random Forest Regression

A random forest regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.

Gradient Boosting Regression

A gradient boosting regression model is trained to predict hotel ratings. The mean squared error (MSE) is calculated for both the training and validation sets to evaluate the model's performance.
