Research papers

Interesting ideas:

we can use "co-training" (semi-supervised learning): build a dataset from labeled data plus a large chunk of unlabeled data. Train two classifiers on the labeled data, each on a distinct feature set (e.g. review-oriented and user-oriented). Have both classifiers predict the unlabeled data. Pick the samples that both classifiers mark as paid or unpaid with high probability, and add them to the labeled dataset with the predicted label. Rinse and repeat;
only three classifiers proved to perform well: SVM, Naive Bayes and Logistic Regression. Other classifiers and ensemble methods (such as Bagging and Boosting) were not researched (or maybe they proved to be weak?);
synthetic review spam is different from "real-world" review spam;
there are two main feature types: review-centric and reviewer-centric.
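
The co-training loop from the first idea can be sketched roughly as follows. This is a minimal illustration, not the method of any specific paper: `CentroidClassifier` is a hypothetical stand-in for a real classifier (SVM, Naive Bayes, Logistic Regression), the 0.9 confidence threshold and the toy feature values are assumptions, and label 1 means "paid" while 0 means "unpaid".

```python
import math


class CentroidClassifier:
    """Hypothetical stand-in classifier: scores a sample by its distance
    to the centroid of each class. Any probabilistic classifier would do."""

    def fit(self, X, y):
        # Assumes the labeled data contains at least one sample per class.
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        """Return an estimate of P(label == 1) for a single sample."""
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        z = max(-50.0, min(50.0, d0 - d1))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))


def co_train(view_a, view_b, labels, threshold=0.9, max_rounds=10):
    """Co-training over two feature views of the same samples.

    labels[i] is 0, 1, or None (unlabeled). Returns a new label list in which
    samples that both classifiers agreed on confidently have been filled in.
    """
    labels = list(labels)
    clf_a, clf_b = CentroidClassifier(), CentroidClassifier()
    for _ in range(max_rounds):
        known = [i for i, t in enumerate(labels) if t is not None]
        y = [labels[i] for i in known]
        clf_a.fit([view_a[i] for i in known], y)
        clf_b.fit([view_b[i] for i in known], y)

        added = 0
        for i, t in enumerate(labels):
            if t is not None:
                continue
            p_a = clf_a.predict_proba(view_a[i])
            p_b = clf_b.predict_proba(view_b[i])
            if p_a >= threshold and p_b >= threshold:
                labels[i] = 1  # both views are confident: paid
                added += 1
            elif p_a <= 1 - threshold and p_b <= 1 - threshold:
                labels[i] = 0  # both views are confident: unpaid
                added += 1
        if added == 0:  # nothing new was learned, so stop early
            break
    return labels


# Toy data (made up): view A stands for review-centric features,
# view B for reviewer-centric features of the same seven reviews.
review_view = [[5.0, 5.0], [6.0, 5.0], [0.0, 1.0], [1.0, 0.0],
               [5.5, 5.5], [0.5, 0.5], [3.0, 3.0]]
reviewer_view = [[4.0], [5.0], [0.0], [1.0], [4.5], [0.5], [2.5]]
seed_labels = [1, 1, 0, 0, None, None, None]

final_labels = co_train(review_view, reviewer_view, seed_labels)
print(final_labels)
```

In this toy run the two clear-cut unlabeled samples get labels, while the ambiguous last sample stays unlabeled because neither view is confident about it. Requiring both classifiers to agree is the conservative reading of "most certainly" above; other variants let each classifier label samples for the other.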

A comparison of previous work on review spam classification can be found in: 3. Learning to Identify Review Spam
