The growing preference for online shopping has created huge demand for e-commerce websites. In this project we focus on creating a database by scraping data from the Amazon e-commerce website. The database is designed to support e-commerce optimization: it enables the kind of large queries that bear on key business metrics such as conversion rate and revenue growth. The data is structured so that optimization questions, such as how to raise average order value or which products sell well together as a bundle, can be answered directly from the database. These factors are especially important for the German e-commerce market. The average order value (AOV) for online orders in Germany is 138 euros, which is higher than the global average, yet the average conversion rate in Germany is just 2.22%. The e-commerce market is expected to show robust growth in the coming years, so there may be more opportunities to optimize e-commerce in Germany. I hope this project helps with that.
- dslr.csv
- headphones.csv
- keyboard.csv
- monitor.csv
- mouse.csv
- product_output.jsonl
- product_summary.csv
- search_output.jsonl
- Scraping the data
- Imports
- Search URLs are generated using create_search_url.py
- YAML file created from the page's CSS elements
- search.py run to fetch the product summary
- JSON dumped to form search_output.jsonl
- Product URLs extracted using urllib
- product.py run to fetch product details
- JSON dumped to form product_output.jsonl
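The URL handling in the steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: the marketplace domain `BASE_URL`, the query layout, the `url` field name, and all three function names are assumptions made for the example.

```python
import json
from urllib.parse import urlencode, urljoin

# Assumption: the German marketplace is the target; adjust as needed.
BASE_URL = "https://www.amazon.de"


def build_search_url(keyword, page=1):
    """Build a search URL for one keyword (hypothetical query layout)."""
    query = urlencode({"k": keyword, "page": page})
    return f"{BASE_URL}/s?{query}"


def extract_product_urls(search_records):
    """Turn relative product links from search results into absolute URLs."""
    urls = []
    for record in search_records:
        href = record.get("url")  # assumed field name in the search output
        if href:
            urls.append(urljoin(BASE_URL, href))
    return urls


def to_jsonl(records):
    """Serialize records as JSON lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

The string returned by `to_jsonl` could then be written to a file such as search_output.jsonl. Keeping one object per line means a partially completed scrape still leaves a readable file.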
- search_output.jsonl validated
- Imports
- Duplicates removed
- Nulls checked
- Feature extraction done
- Data saved as CSV for upload
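As a rough sketch of the cleaning steps above, the following uses only the standard library. The field names (`url`, `price`), the German price format, and every function name here are assumptions for illustration; the project's own cleaning code may differ.

```python
import csv
import io
import json
import re


def load_jsonl(text):
    """Parse JSON-lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def drop_duplicates(records, key="url"):
    """Keep only the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value in seen:
            continue
        seen.add(value)
        unique.append(record)
    return unique


def count_nulls(records, fields):
    """Count missing or empty values per field."""
    return {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}


def parse_price(raw):
    """Feature extraction: read a float from a price string.

    Assumes German formatting, e.g. '1.299,00 €' (dot for thousands,
    comma for decimals).
    """
    if not raw:
        return None
    match = re.search(r"\d[\d.]*(?:,\d+)?", raw)
    if not match:
        return None
    return float(match.group().replace(".", "").replace(",", "."))


def to_csv(records, fields):
    """Render records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Note that `csv.DictWriter` ends each row with `\r\n` by default, which matters later when the line terminator is specified for the database load.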
- Extraction: CSV file opened in VS Code (editor)
- Transformation: the following are specified
  - Field terminator
  - Line terminator
  - Field delimiter
  - Enclosure and ignore
- Loading:
  - The transformed file is placed in the path defined by the environment variable
  - The my.ini file can also be edited to set the desired path
  - Data warehouse tables are designed in a de-normalized structure. (In a normalized database the same column data is not repeated; in other words, there is no redundant data.)
  - LOAD DATA INFILE SQL command run to upload the data.
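To show how the transformation settings map onto the load command, here is a small sketch that only composes a MySQL `LOAD DATA INFILE` statement as a string; it does not connect to a database. The function name, the file path, and the table name are hypothetical, and the default terminators are assumptions that should match how the CSV was actually written.

```python
def build_load_statement(csv_path, table, field_terminator=",",
                         enclosure='"', line_terminator="\\n",
                         ignore_header=True):
    """Compose a MySQL LOAD DATA INFILE statement (a sketch, not executed).

    line_terminator is passed as the SQL escape sequence, e.g. the two
    characters backslash + n, or '\\r\\n' for files written on Windows.
    """
    clauses = [
        f"LOAD DATA INFILE '{csv_path}'",
        f"INTO TABLE {table}",
        f"FIELDS TERMINATED BY '{field_terminator}'",   # field terminator
        f"OPTIONALLY ENCLOSED BY '{enclosure}'",        # enclosure
        f"LINES TERMINATED BY '{line_terminator}'",     # line terminator
    ]
    if ignore_header:
        clauses.append("IGNORE 1 LINES")                # skip the header row
    return "\n".join(clauses) + ";"


# Hypothetical path and table name, for illustration only.
statement = build_load_statement("C:/data/product_summary.csv", "product_summary")
print(statement)
```

On many MySQL installations the server reads files only from the directory named by the `secure_file_priv` system variable, which can be set in my.ini; that may be the path the loading steps refer to. Forward slashes in the path avoid backslash escaping inside the SQL string.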