Simple document classifier (AKA spam filter) #106

MikeInnes · 2018-10-31T16:30:05Z

julia> using TextAnalysis: SpamFilter, fit!, predict

julia> m = SpamFilter([:ham, :spam]);

julia> fit!(m, "this is ham", :ham);

julia> fit!(m, "this is spam", :spam);

julia> predict(m, "is this spam?")
Dict{Symbol,Float64} with 2 entries:
  :spam => 0.666667
  :ham  => 0.333333

This is a very simple document classifier that is easy to use. I think it would make a nice entry point into the Julia ML ecosystem for many people, since it's widely useful and can serve as a starting point for any more complex models people want to try out.
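For readers wondering how probabilities like the ones in the `predict` output above can arise, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the code from this PR: `TinyNB`, `train!`, `classify`, and the `\w+` tokenizer are hypothetical names and choices made for illustration, and class priors are assumed to be uniform.

```julia
# Hypothetical, self-contained sketch; not the implementation in this PR.
struct TinyNB
    classes::Vector{Symbol}
    counts::Dict{Symbol,Dict{String,Int}}   # per-class word counts
    vocab::Set{String}                      # every word seen during training
end

TinyNB(classes) = TinyNB(collect(classes),
                         Dict(c => Dict{String,Int}() for c in classes),
                         Set{String}())

# Assumption: lowercase the text and keep only runs of word characters.
tokens(s) = [String(m.match) for m in eachmatch(r"\w+", lowercase(s))]

function train!(m::TinyNB, text, class)
    for w in tokens(text)
        push!(m.vocab, w)
        m.counts[class][w] = get(m.counts[class], w, 0) + 1
    end
    return m
end

function classify(m::TinyNB, text)
    ws = filter(in(m.vocab), tokens(text))   # ignore words never seen in training
    scores = Float64[]
    for c in m.classes
        total = length(m.vocab)              # add-one smoothing: one pseudo-count per vocabulary word
        for n in values(m.counts[c])
            total += n
        end
        p = 1.0
        for w in ws
            p *= (get(m.counts[c], w, 0) + 1) / total
        end
        push!(scores, p)
    end
    return Dict(m.classes .=> scores ./ sum(scores))   # normalize so the scores sum to 1
end

m = TinyNB([:ham, :spam])
train!(m, "this is ham", :ham)
train!(m, "this is spam", :spam)
p = classify(m, "is this spam?")   # p[:spam] ≈ 0.6667, p[:ham] ≈ 0.3333
```

Under these assumptions each class has 7 smoothed counts in total, so the query words "is", "this", "spam" score (2/7)(2/7)(1/7) for `:ham` and (2/7)(2/7)(2/7) for `:spam`, which normalizes to 1/3 and 2/3.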

zgornel · 2018-11-06T14:51:15Z

Hi,
Do you think it would make sense to create another package, e.g. TextAnalysisModels.jl, where this particular model and other standard NLP/ML models could reside? The sentiment analysis model and LSA/LDA would fit there as well.

In a way, such a package would provide a link between pure text processing and representation, embeddings APIs and ML packages. Hopefully, that would also spur a bit more work and research into providing more specific processing support for text modelling.

TextAnalysis already seems to be a large package (in terms of scope), and a tightly coupled modeling package repository would (in my opinion) be a welcome addition, even if just to keep individual package complexity at a manageable level for all users.

MikeInnes · 2018-11-06T15:14:44Z

What would be the difference between TextAnalysis and TextAnalysisModels? How does one decide (or figure out, as a user) what lives where?

I don't really see that this adds any real complexity; things can always be split out later if there's a compelling reason to, but until then it just seems like fragmentation that users have to deal with. Much better to have a central point where available functionality is clearly visible.

zgornel · 2018-11-06T16:18:09Z

So no. Thanks.

aviks · 2019-01-04T19:57:07Z

So on the question of splitting out the models, I've sorta changed my mind and decided to do that: see #111

aviks · 2019-04-11T17:15:23Z

I've renamed this to NaiveBayesClassifier, and added a naive test. Could do with some documentation.

spam filter (a957a86)

zgornel mentioned this pull request Jan 3, 2019: Merge back to TextAnalysis, zgornel/StringAnalysis.jl#2 (Closed)

aviks added 2 commits April 11, 2019 17:51:

Merge branch 'master' into mji/spam (d99dac3)

Rename SpamFilter to NaiveBayesClassifier, added test (5f1b5c7)

aviks merged commit 6c16bd6 into master Apr 11, 2019

Ayushk4 mentioned this pull request May 9, 2019: Documentation for Bayes.jl #151 (Merged)
