Duplicate_Question_pair

DUPLICATE QUESTION PAIR

An NLP project to determine whether two given questions are semantically the same or not.

Dataset Link - https://www.kaggle.com/c/quora-question-pairs

This project contains three Jupyter notebooks. The first uses a simple Random Forest model and reaches an accuracy of 75%. The second tries to improve on this by adding 7 new features, such as the length of each question, the number of characters in each question, the number of common words, and the number of characters in a word; it again uses a Random Forest and reaches an accuracy of 80%. The last notebook applies NLP concepts and reaches an accuracy of 90%.
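To make the idea of hand-crafted features more concrete, here is a minimal sketch of how features of this kind could be computed for one question pair. The function name `basic_features`, the feature names, and the exact choice of seven features are assumptions for illustration; the notebook may tokenize differently and use a different set.

```python
def basic_features(q1: str, q2: str) -> dict:
    """Compute simple length and word-overlap features for one question pair.

    Illustrative only: the names and the exact feature set are assumptions,
    not necessarily what the notebook uses.
    """
    # Lowercase and split on whitespace; punctuation stays attached to words.
    words1 = set(q1.lower().split())
    words2 = set(q2.lower().split())

    common = len(words1 & words2)        # unique words shared by both questions
    total = len(words1) + len(words2)    # unique words in q1 plus unique words in q2

    return {
        "q1_len": len(q1),                    # characters in question 1
        "q2_len": len(q2),                    # characters in question 2
        "q1_num_words": len(q1.split()),      # words in question 1
        "q2_num_words": len(q2.split()),      # words in question 2
        "common_words": common,
        "total_words": total,
        # Share of common words; guard against two empty questions.
        "word_share": common / total if total else 0.0,
    }


if __name__ == "__main__":
    print(basic_features("How do I learn Python?", "How can I learn Python?"))
```

Features like these can be appended as extra columns next to the text-based features before training the Random Forest.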

A Streamlit application demonstrates the project. It has two input fields, one for Question 1 and one for Question 2. Enter both questions, then click Predict to find out whether they are duplicates or not.

To run the project, run the Python notebook first (the data set Train.csv is already provided). Then install the dependencies listed in the Requirement section to run the application and get the desired result.

The data set is large, about 400005 rows, so be patient while the Python notebook runs.

To run the application on your local host, run: streamlit run app.py . Further, you can deploy it to Heroku. This is simple: Heroku shows the required commands when you go there to deploy, so run those commands and your project will be deployed.