Simple document classifier (AKA spam filter) #106

Merged: 3 commits into master on Apr 11, 2019

MikeInnes
Contributor

julia> using TextAnalysis: SpamFilter, fit!, predict

julia> m = SpamFilter([:ham, :spam]);

julia> fit!(m, "this is ham", :ham);

julia> fit!(m, "this is spam", :spam);

julia> predict(m, "is this spam?")
Dict{Symbol,Float64} with 2 entries:
  :spam => 0.666667
  :ham  => 0.333333

This is a simple, easy-to-use document classifier. I think it would make a nice entry point into the Julia ML ecosystem for many people, since it's widely useful and can serve as a starting point for more complex models people want to try out.
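For readers wondering where the 0.666667 / 0.333333 split in the session above could come from, here is a minimal sketch of a naive-Bayes-style scorer that produces the same numbers. It is not the code from this PR: `TinyNB`, `train!`, `classify`, and the `words` tokenizer are hypothetical names, and the scoring rule (product of add-one-smoothed per-class word counts, ignoring unseen words, then normalizing) is an assumption about how such a filter might work.

```julia
# Hypothetical sketch, not the implementation merged in this PR.
struct TinyNB{T}
    classes::Vector{T}
    counts::Dict{String,Vector{Int}}   # word => raw count per class
end

TinyNB(classes::Vector{T}) where {T} =
    TinyNB{T}(classes, Dict{String,Vector{Int}}())

# Assumed tokenizer: lowercase the text and keep runs of word characters.
words(s::AbstractString) =
    [String(m.match) for m in eachmatch(r"\w+", lowercase(s))]

function train!(m::TinyNB, text::AbstractString, class)
    i = findfirst(==(class), m.classes)
    i === nothing && throw(ArgumentError("unknown class: $class"))
    for w in words(text)
        # Create a zeroed count vector the first time a word is seen.
        c = get!(() -> zeros(Int, length(m.classes)), m.counts, w)
        c[i] += 1
    end
    return m
end

function classify(m::TinyNB, text::AbstractString)
    scores = ones(Float64, length(m.classes))
    for w in words(text)
        haskey(m.counts, w) || continue   # skip words never seen in training
        scores .*= m.counts[w] .+ 1       # add-one smoothing
    end
    scores ./= sum(scores)                # normalize so the scores sum to 1
    return Dict(zip(m.classes, scores))
end

m = TinyNB([:ham, :spam])
train!(m, "this is ham", :ham)
train!(m, "this is spam", :spam)
p = classify(m, "is this spam?")
```

With these two training sentences, "is" and "this" count equally for both classes, so the only word that shifts the score is "spam": the smoothed counts multiply to 4 for `:ham` and 8 for `:spam`, which normalize to 1/3 and 2/3. For longer documents a sketch like this would need to sum logarithms instead of multiplying raw counts, to avoid overflow.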

@zgornel

zgornel commented Nov 6, 2018

Hi,
Do you think it would make sense to create another package, e.g. TextAnalysisModels.jl, where this particular model and other standard NLP/ML models could reside? The sentiment analysis model and LSA/LDA would fit there as well.

In a way, such a package would provide a link between pure text processing and representation, embeddings APIs and ML packages. Hopefully, that would also spur a bit more work and research into providing more specific processing support for text modelling.

TextAnalysis already seems to be a large package (in terms of scope), and a tightly coupled modeling package would (in my opinion) be a welcome addition, even if only to keep the complexity of each individual package manageable for all users.

@MikeInnes
Contributor Author

What would be the difference between TextAnalysis and TextAnalysisModels? How does one decide (or figure out, as a user) what lives where?

I don't really see that this adds any real complexity; things can always be split out later if there's a compelling reason to, but until then it just seems like fragmentation that users have to deal with. Much better to have a central point where available functionality is clearly visible.

@zgornel

zgornel commented Nov 6, 2018

So no. Thanks.

@aviks
Member

aviks commented Jan 4, 2019

So on the question of splitting out the models, I've sorta changed my mind and decided to do that; see #111.

@aviks
Member

aviks commented Apr 11, 2019

I've renamed this to NaiveBayesClassifier, and added a naive test. Could do with some documentation.

@aviks aviks merged commit 6c16bd6 into master Apr 11, 2019
@Ayushk4 Ayushk4 mentioned this pull request May 9, 2019
3 participants