App Performance Predictor

Predicting an app's performance using data science.

About The Project

This is the mini project for SC1015 (Intro to Data Science and AI), which focuses on Android market analysis and app performance prediction. For the entire walkthrough of the project, please view the notebooks/readmes in this order:

  1. Data Collection
  2. Data Cleaning
  3. Exploratory Data Analysis and Visualization
  4. Model Building
  5. Machine Learning

Project Folder Structure

Folder structure of our project

.
├── datasets                          # csv files
├── google-play-scrapper              # web scraper for data preparation
├── presentation                      # presentation ppt
├── streamlit-app                     # website for model simulation
├── data_cleaning.ipynb               # notebook for data cleaning
├── EDA.ipynb                         # notebook for eda
├── machine_learning.ipynb            # notebook for machine learning
├── model_building.ipynb              # notebook for model building
└── README.md

Problem Definition

  • How do different features of an app affect its popularity?
  • Will an app exceed one million installs within a year of its release?
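
The second question can be framed as a binary classification target. The sketch below shows one way to derive such a label. It assumes install counts arrive as display strings such as `1,000,000+`; that format, the helper names, and the choice to count the `1,000,000+` bucket as positive are illustrative assumptions, not taken from the project notebooks.

```python
ONE_MILLION = 1_000_000


def parse_installs(raw: str) -> int:
    """Convert a display string such as '1,000,000+' to its integer lower bound."""
    cleaned = raw.replace(",", "").replace("+", "").strip()
    return int(cleaned)


def reaches_one_million(raw_installs: str) -> bool:
    """Label an app as positive if its install bucket starts at one million or more.

    Assumption: a bucket like '1,000,000+' means "at least one million",
    so it is treated as meeting the threshold.
    """
    return parse_installs(raw_installs) >= ONE_MILLION


# Hypothetical sample values, for illustration only.
samples = ["500,000+", "1,000,000+", "50,000,000+"]
labels = [reaches_one_million(s) for s in samples]
```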

Models Used

  1. Decision Trees
  2. Random Forest Classifier
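
For readers unfamiliar with these two models, the following is a minimal sketch of fitting and comparing them with scikit-learn. It uses synthetic data from `make_classification` as a stand-in for the app dataset, and the hyperparameters shown are arbitrary placeholders rather than the settings used in the project notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (a stand-in for the real features).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder hyperparameters, chosen only for illustration.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
```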

Key takeaways from our primary EDA

  1. More than half of the apps have 100K-100M installs.
  2. The average rating of an app is 4.16.
  3. Apps with more than 10M installs have a higher chance to be chosen as Editor's Choice.
  4. 99% of apps with more than 10M installs are free.
  5. 77% of apps with more than 10M installs are ad-supported and have in-app purchases.
  6. Apps with more than 10M installs are updated frequently, with updates at most a month apart.
  7. Size has a strong linear relationship with install count.
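
Statistics of this kind can be computed with a few lines of pandas. The sketch below uses a tiny made-up table, so its numbers are not the project's results, and the column names `installs`, `is_free`, and `rating` are hypothetical.

```python
import pandas as pd

# Toy data for illustration only; not drawn from the project's dataset.
df = pd.DataFrame(
    {
        "installs": [50_000, 20_000_000, 15_000_000, 500_000, 100_000_000, 12_000_000],
        "is_free": [True, True, True, False, True, False],
        "rating": [4.0, 4.5, 4.2, 3.9, 4.4, 4.1],
    }
)

# Restrict to apps above the 10M install mark.
popular = df[df["installs"] > 10_000_000]

# Share of free apps among the popular ones (mean of a boolean column).
free_share = popular["is_free"].mean()

# Average rating across all apps.
mean_rating = df["rating"].mean()
```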

Streamlit Web App

For model simulation and visualization purposes, we have created a web app for this project. The link to the website is here.

Conclusion

  1. Oversampling the data did not improve the models’ performance on the test data because of overfitting.
  2. We were able to achieve over 80% accuracy in predicting whether an app would exceed one million installs within a year.
  3. Strategies to increase the number of installs of an app may include:
    • Making the app free of charge
    • Increasing the rating count of the app
    • Increasing the review count of the app
  4. The number of installs of an app does not depend entirely on its quantitative features. It might also depend on qualitative factors such as the content of its reviews. Hence, one thing we could improve upon is to perform sentiment analysis on the reviews using NLP.
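
To make the oversampling point concrete, here is a minimal sketch of random oversampling using only the standard library. Which oversampling technique the notebooks used is not stated here, so this is an assumed, simplified variant: minority-class rows are duplicated in the training split until the classes are balanced, while the test split is left untouched. Because the added rows are exact copies, a model can memorise them, which is one way the overfitting described above can arise.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw copies (with replacement) to fill the gap to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical imbalanced training split: four negatives, one positive.
train_rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
train_labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
counts = Counter(balanced_labels)
```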

Contributors

  • @limivann - Web Scraper, Model Building / Machine Learning, Streamlit Web App
  • @lordAaron0121 - Data Cleaning, EDA and visualization
  • @serphyshio - Data Cleaning, EDA and visualization
