DeepakMishraDA/AmazonDatabase-design-products-E-commerce

Amazon E-Commerce Database Creation

OVERVIEW


The growing preference for web-based shopping has created huge demand for e-commerce websites. In this project, we focused on building a database from data scraped from the Amazon e-commerce website. The database is designed to support e-commerce optimization: it enables the large analytical queries that feed into two key business metrics, conversion rate and revenue growth. The data is structured so that optimization strategies, such as how to increase average order value or which bundles of products sell well together, can be queried directly from the database.

These factors are especially important for the German e-commerce market. The average order value (AOV) for online orders in Germany is 138 euros, which is higher than the global average, but the average conversion rate in Germany is only 2.22%. The e-commerce market is expected to keep growing strongly in the coming years, so there may be further opportunities to optimize e-commerce in Germany. I hope this project helps with that.
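AOV and conversion rate combine multiplicatively: revenue ≈ sessions × conversion rate × AOV. The sketch below makes that relationship concrete. Only the 2.22% and 138 EUR figures come from the overview; the session count and the function names are hypothetical and chosen purely for illustration.

```python
# Hypothetical inputs for illustration; only the 2.22% conversion rate and
# 138 EUR AOV come from the overview above.


def average_order_value(revenue: float, orders: int) -> float:
    """AOV = total revenue / number of orders."""
    if orders <= 0:
        raise ValueError("orders must be positive")
    return revenue / orders


def conversion_rate(orders: int, sessions: int) -> float:
    """Conversion rate = orders / sessions, expressed as a percentage."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return 100.0 * orders / sessions


def expected_revenue(sessions: int, conv_rate_pct: float, aov: float) -> float:
    """Revenue = sessions x conversion rate x AOV."""
    return sessions * (conv_rate_pct / 100.0) * aov


if __name__ == "__main__":
    sessions = 10_000  # hypothetical traffic volume
    baseline = expected_revenue(sessions, 2.22, 138.0)
    # A 10% lift in either factor lifts revenue by the same 10%.
    better_aov = expected_revenue(sessions, 2.22, 138.0 * 1.1)
    better_conv = expected_revenue(sessions, 2.22 * 1.1, 138.0)
    print(f"baseline: {baseline:.2f} EUR")
    print(f"+10% AOV: {better_aov:.2f} EUR, +10% conversion: {better_conv:.2f} EUR")
```

Because the formula is a product, a given relative gain in AOV (for example, through product bundles) is worth exactly as much as the same relative gain in conversion rate.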

Table of Contents:

  1. Data
  2. Tools
  3. Scrape
  4. Data-process
  5. SQL
  6. Time-series analysis
  7. Tableau Dashboard

Data:

  • dslr.csv
  • headphones.csv
  • keyboard.csv
  • monitor.csv
  • mouse.csv
  • product_output.jsonl
  • product_summary.csv
  • search_output.jsonl

Tools:

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Requests
  • Selectorlib
  • Fake_useragent
  • Time
  • Plotly

Scrape

  1. Scraping the data
    • Imports
    • Search URLs are created using create_search_url.py
    • YAML file created from the pages' CSS selectors
    • Search.py deployed to fetch the product summary
    • JSON file dumped to form search_output.jsonl
    • Product URLs extracted using urllib
    • Product.py deployed to fetch product details
    • JSON file dumped to form product_output.jsonl
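The URL-handling steps of the scrape can be sketched with the standard library alone. This is a minimal, offline sketch, not the repository's create_search_url.py: the amazon.de domain, the `k`/`page` query parameters, and the `products`/`url` keys in the search records are all assumptions made for illustration. Fetching pages (Requests, fake_useragent) and CSS extraction (Selectorlib) are deliberately left out.

```python
import json
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit

# Assumption: the marketplace domain and the "k"/"page" query parameters are
# illustrative; the repository's create_search_url.py may build URLs differently.
BASE_SEARCH_URL = "https://www.amazon.de/s"  # hypothetical


def create_search_urls(keyword: str, pages: int) -> list[str]:
    """Build one search-results URL per page for a keyword."""
    urls = []
    for page in range(1, pages + 1):
        query = urlencode({"k": keyword, "page": page})
        urls.append(f"{BASE_SEARCH_URL}?{query}")
    return urls


def extract_product_urls(search_records: list[dict], base: str) -> list[str]:
    """Turn relative product links into absolute, de-duplicated URLs,
    dropping tracking query strings and fragments."""
    seen: set[str] = set()
    urls: list[str] = []
    for record in search_records:
        # "products" and "url" are hypothetical keys from the YAML selectors
        for product in record.get("products") or []:
            href = product.get("url")
            if not href:
                continue
            parts = urlsplit(urljoin(base, href))
            clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
            if clean not in seen:
                seen.add(clean)
                urls.append(clean)
    return urls


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    for url in create_search_urls("dslr", 2):
        print(url)
```

Stripping the query string before de-duplicating means the same product reached from two result positions is fetched only once by the product-detail step.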

Data-process

  1. search_output.jsonl validated
  2. Imports
  3. Duplicates removed
  4. Nulls checked
  5. Features extracted
  6. Data saved as CSV for upload
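The processing steps above can be sketched as follows. The project itself uses Pandas; this version sticks to the standard library so that it runs anywhere. The field names (`name`, `url`, `price`, `rating`) and the German price format are assumptions, since the real schema depends on the YAML selectors used during scraping.

```python
import csv
import io
import json
import re

# Hypothetical field names; the real schema of search_output.jsonl depends
# on the selectors defined in the repository's YAML file.
FIELDS = ["name", "url", "price", "rating"]


def load_jsonl(text: str) -> list[dict]:
    """Validate/parse JSON Lines: one object per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_price(raw):
    """Assumes German formatting, e.g. '1.299,00 €' -> 1299.0."""
    if not raw:
        return None
    digits = re.sub(r"[^\d,.]", "", raw)
    if not digits:
        return None
    # '.' groups thousands and ',' marks decimals in German notation
    return float(digits.replace(".", "").replace(",", "."))


def parse_rating(raw):
    """'4,5 von 5 Sternen' or '4.5 out of 5 stars' -> 4.5."""
    match = re.match(r"\s*(\d+(?:[.,]\d+)?)", raw or "")
    return float(match.group(1).replace(",", ".")) if match else None


def clean(records: list[dict]) -> list[dict]:
    """Drop duplicate products (same URL) and extract numeric features."""
    seen = set()
    rows = []
    for record in records:
        url = record.get("url")
        if not url or url in seen:  # records without a URL are skipped too
            continue
        seen.add(url)
        rows.append({
            "name": (record.get("name") or "").strip() or None,
            "url": url,
            "price": parse_price(record.get("price")),
            "rating": parse_rating(record.get("rating")),
        })
    return rows


def null_counts(rows: list[dict]) -> dict:
    """Count missing values per column."""
    return {field: sum(row[field] is None for row in rows) for field in FIELDS}


def to_csv(rows: list[dict]) -> str:
    """Serialize cleaned rows as CSV (None becomes an empty field)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Converting prices and ratings to plain numbers before export keeps the CSV loadable into numeric SQL columns without further casting.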

SQL

ETL Process

  1. Extraction: CSV file opened in VS Code (editor)
  2. Transformation: the following are specified
    • Field terminator
    • Line terminator
    • Field delimitation
    • Field encloser and ignored lines
  3. Loading:
    • The transformed file is placed in the path defined by the server's file-import variable (secure_file_priv in MySQL)
    • The my.ini file can also be edited to set the desired path
    • Data warehouse tables are designed with a denormalized structure. (In a normalized database, the same column data is not repeated; in other words, there is no redundant data. A denormalized design deliberately accepts some redundancy so that analytical queries need fewer joins.)
    • The LOAD DATA INFILE SQL command is used to upload the data.
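The transformation options listed above map directly onto clauses of MySQL's LOAD DATA INFILE statement. The sketch below assembles such a statement from those options. The file path and the table name `product_summary` are hypothetical placeholders, and `OPTIONALLY ENCLOSED BY` is chosen on the assumption that the CSV quotes only the fields that need it (the default behaviour of Python's csv module).

```python
# Assumption: MySQL's LOAD DATA INFILE syntax. The file path and table name
# used below are hypothetical placeholders, not the repository's schema.

_ESCAPES = {"\\": "\\\\", "'": "\\'", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def sql_literal(value: str) -> str:
    """Quote a string as a MySQL literal using backslash escapes."""
    return "'" + "".join(_ESCAPES.get(ch, ch) for ch in value) + "'"


def build_load_statement(
    path: str,
    table: str,
    field_terminator: str = ",",
    enclosed_by: str = '"',
    line_terminator: str = "\n",
    ignore_lines: int = 1,
) -> str:
    """Assemble a LOAD DATA INFILE statement from the transformation options."""
    quoted_table = "`" + table.replace("`", "``") + "`"
    return (
        f"LOAD DATA INFILE {sql_literal(path)}\n"
        f"INTO TABLE {quoted_table}\n"
        f"FIELDS TERMINATED BY {sql_literal(field_terminator)}\n"
        f"    OPTIONALLY ENCLOSED BY {sql_literal(enclosed_by)}\n"
        f"LINES TERMINATED BY {sql_literal(line_terminator)}\n"
        f"IGNORE {int(ignore_lines)} LINES;"
    )


if __name__ == "__main__":
    # Windows-style line endings are common for CSVs written on Windows.
    print(build_load_statement("uploads/product_summary.csv", "product_summary",
                               line_terminator="\r\n"))
```

`IGNORE 1 LINES` skips the CSV header row. When secure_file_priv is set to a directory, MySQL only reads files from that directory, which is why the file has to be placed there before the statement runs.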

Time-series analysis

Tableau Dashboard
