Projects/predicting_catalog_demand at master · snafis/Projects

README.md

Installation

Python 3.7.2
Libraries:
- Pandas
- matplotlib
- seaborn
- statsmodels

You will also need software installed to run an IPython (Jupyter) Notebook.

Description

The business problem was formulated as follows:

You recently started working for a company that manufactures and sells high-end home goods. Last year the company sent out its first print catalog, and is preparing to send out this year's catalog in the coming months. The company has 250 new customers from their mailing list that they want to send the catalog to. Your manager has been asked to determine how much profit the company can expect from sending a catalog to these customers. You, the business analyst, are assigned to help your manager run the numbers. While fairly knowledgeable about data analysis, your manager is not very familiar with predictive models. You’ve been asked to predict the expected profit from these 250 new customers. Management does not want to send the catalog out to these new customers unless the expected profit contribution exceeds $10,000.

Data

p1-customers.xlsx - information on about 2,300 customers

p1-mailinglist.xlsx - data on the 250 new customers whose sales you need to predict

File Descriptions

You can find the results of the analysis either in HTML form (Predicting_Catalog_Demand.html) or as the complete Jupyter Notebook (Predicting_Catalog_Demand.ipynb).

Alternatively, run one of the following commands in a terminal after navigating to the top-level project directory predicting_catalog_demand/ (which contains this README):

ipython notebook Predicting_Catalog_Demand.ipynb

or

jupyter notebook Predicting_Catalog_Demand.ipynb

This will open the iPython Notebook software and project file in your browser.

Results

To predict catalog demand, I performed the following steps:

Step 1: Preprocessing
- explored datasets
- split the data into target variable and features
- checked for a linear relationship between the target and each numerical feature using sns.regplot()
- one-hot encoded categorical features using pd.get_dummies()
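
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic table, not the notebook's actual code: the column names (Customer_Segment, Avg_Num_Products_Purchased, Avg_Sale_Amount) and all values are assumptions made for the example, and a plain correlation stands in for the visual check done with sns.regplot().

```python
import pandas as pd

# Tiny synthetic stand-in for p1-customers.xlsx.
# Column names and values are hypothetical, chosen only for illustration.
customers = pd.DataFrame({
    "Customer_Segment": [
        "Credit Card Only", "Store Mailing List",
        "Credit Card Only", "Loyalty Club Only",
    ],
    "Avg_Num_Products_Purchased": [1, 3, 2, 5],
    "Avg_Sale_Amount": [120.0, 310.5, 205.0, 480.0],
})

# Split the data into the target variable and the features.
y = customers["Avg_Sale_Amount"]
X = customers.drop(columns="Avg_Sale_Amount")

# Numeric check of the linear relationship between a feature and the target.
# For the visual version one could call, for example:
#   sns.regplot(x="Avg_Num_Products_Purchased", y="Avg_Sale_Amount", data=customers)
linear_strength = customers["Avg_Num_Products_Purchased"].corr(y)

# One-hot encode the categorical feature. drop_first=True removes one level
# so the dummies are not perfectly collinear with an intercept term;
# dtype=float keeps the columns numeric for the regression step.
X_encoded = pd.get_dummies(
    X, columns=["Customer_Segment"], drop_first=True, dtype=float
)

print(X_encoded.columns.tolist())
print(round(linear_strength, 3))
```

Dropping one dummy level is a design choice for this sketch; the notebook may handle the baseline category differently.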
Step 2: Building Linear Regression Model
- checked for correlations between features using corr() to avoid multicollinearity
- initialized and fit a linear regression model with sm.OLS() from the statsmodels library
- transformed the new customer data in the same way as the training set
- applied the fitted linear model to the new customer data, predicted the sales and calculated the expected profit

Acknowledgements

Having started learning Python, I decided to rewrite the project I first completed in Alteryx within the Predictive Analytics for Business Nanodegree at Udacity.
