- phantomjs
- python
Run the following script:
phantomjs download_shop.js [PREFIX]
# example: phantomjs download_shop.js www.grainger.com
Run the following script:
python labelling.py [prefix]
The script creates a new directory, "labeled_dom_trees", which contains copies of the DOM trees with labeled elements.
We review the labeling results by checking image patches of the labeled elements. The process is divided into 3 steps: prepare the labeled patches, review them, and remove the rejected ones.
python review.py prepare [prefix]
python review.py review [prefix]
You can select wrongly labeled patches in order to remove the corresponding page from the dataset. If everything goes well, the script creates a new file in the "page_sets" directory, which contains all pages that passed the review process.
python review.py remove [prefix]
python create_net_inputs.py [prefix]
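
The steps above can be chained for a single prefix. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` and the `dry_run` flag are hypothetical, and it assumes the scripts are run from the repository root in the order listed in this document. Note that the review step appears to be interactive, so an unattended run may not be practical.

```python
import subprocess


def build_commands(prefix):
    """Return the commands from this README, in order, for one shop prefix.

    Hypothetical helper: the script names and arguments are copied from the
    instructions above; nothing else about the scripts is assumed.
    """
    return [
        ["phantomjs", "download_shop.js", prefix],
        ["python", "labelling.py", prefix],
        ["python", "review.py", "prepare", prefix],
        ["python", "review.py", "review", prefix],
        ["python", "review.py", "remove", prefix],
        ["python", "create_net_inputs.py", prefix],
    ]


def run_pipeline(prefix, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    check=True stops the pipeline at the first command that fails.
    """
    for cmd in build_commands(prefix):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: prints the commands without executing them.
    run_pipeline("www.grainger.com", dry_run=True)
```

Calling `run_pipeline(prefix, dry_run=False)` would execute the steps one after another; keeping the dry run as the default makes it easy to check the command list first.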