Average Perceptron POS Tagger (Issue #2) #131

Merged
merged 30 commits into from
Jun 15, 2019
7ab41c2
Average Perceptron POS Tagger (Issue #2)
ComputerMaestro Mar 12, 2019
a0f5de6
Changes made to use BSON
ComputerMaestro Mar 13, 2019
d159778
Deleted averagePerceptron.jl
ComputerMaestro Mar 13, 2019
812f991
Deleted tagger.jl
ComputerMaestro Mar 13, 2019
efd2084
Average perceptron full working version
ComputerMaestro Mar 13, 2019
9ee8ed6
Improved averagePerceptronTagger.jl file
ComputerMaestro Mar 14, 2019
683ef1a
Resolving issue not retrieving set for loading pretained model
ComputerMaestro Mar 17, 2019
e0101c5
Update `averagePerceptronTagger.jl` file
ComputerMaestro Mar 18, 2019
3e631e7
added averagePerceptronTagger to TextAnalysis.jl
ComputerMaestro Apr 17, 2019
4f485e6
Merge branch 'master' of https://github.com/JuliaText/TextAnalysis.jl
ComputerMaestro Apr 17, 2019
ccee46c
Merge branch 'master' of https://github.com/JuliaText/TextAnalysis.jl
ComputerMaestro Apr 26, 2019
4b01c2b
bug fix
ComputerMaestro Apr 29, 2019
9038d25
bug fix for Julia 0.7.0
ComputerMaestro Apr 29, 2019
7a3b2e2
Some Tests added to Perceptron Tagger
ComputerMaestro May 1, 2019
546a128
docstrings shifted
ComputerMaestro May 10, 2019
cee1d40
bug fix
ComputerMaestro May 15, 2019
88889b8
Merge branch 'master' of https://github.com/JuliaText/TextAnalysis.jl
ComputerMaestro May 16, 2019
d78d6ba
Docs added
ComputerMaestro May 16, 2019
391183e
doc changes
ComputerMaestro May 16, 2019
1527762
Changes in docs
ComputerMaestro May 17, 2019
8eafc29
Changed function names
ComputerMaestro May 17, 2019
512e990
removed pos from `preprocessing.jl`
ComputerMaestro May 18, 2019
107b0da
merge
ComputerMaestro Jun 4, 2019
5aab551
bug fix
ComputerMaestro Jun 4, 2019
23ab387
removed 0.7 from travis
ComputerMaestro Jun 4, 2019
55fda7b
0.7 added
ComputerMaestro Jun 4, 2019
39b7674
copied preprocessing.jl
ComputerMaestro Jun 4, 2019
a220382
Merge branch 'master' of https://github.com/JuliaText/TextAnalysis.jl
ComputerMaestro Jun 15, 2019
af94ab9
adding POS tagging to `prepare!`
ComputerMaestro Jun 15, 2019
94b27e8
adding deprecation for tag_pos!
ComputerMaestro Jun 15, 2019
2 changes: 1 addition & 1 deletion .travis.yml
@@ -19,4 +19,4 @@ jobs:
- julia --project=docs/ docs/make.jl
after_success: skip
after_success:
- julia -e 'cd(Pkg.dir("TextAnalysis")); Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())';
- julia -e 'cd(Pkg.dir("TextAnalysis")); Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())';
1 change: 1 addition & 0 deletions REQUIRE
@@ -6,3 +6,4 @@ WordTokenizers
Flux
BSON
JSON
DataStructures
48 changes: 48 additions & 0 deletions docs/src/features.md
@@ -225,3 +225,51 @@ julia> summarize(s, ns=2)
"Assume this Short Document as an example."
"This has too foo sentences."
```

## Parts of Speech Tagger

This tagger predicts the part-of-speech (POS) tag of each word or token in a given sentence. It is based on the averaged perceptron algorithm.
The model can be trained from scratch, with the trained weights saved to a specified location.
A pretrained model can also be loaded and used directly to predict tags.
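
To make the idea behind the tagger concrete, here is a minimal, self-contained sketch of an averaged perceptron over sparse string features. It is not the code in `averagePerceptronTagger.jl`; every name in it (`AvgPerceptron`, `predict_class`, `update!`, `average!`, `train!`) and the toy feature strings are assumptions made for illustration. Unlike a real tagger, it does not shuffle the data between passes.

```julia
# Illustrative averaged perceptron; names and features are hypothetical.
mutable struct AvgPerceptron
    classes::Vector{String}
    weights::Dict{String,Dict{String,Float64}}   # feature => class => weight
    totals::Dict{Tuple{String,String},Float64}   # accumulated weight per (feature, class)
    stamps::Dict{Tuple{String,String},Int}       # step at which each weight last changed
    step::Int                                    # number of training examples seen
end

AvgPerceptron(classes) = AvgPerceptron(
    collect(String, classes),
    Dict{String,Dict{String,Float64}}(),
    Dict{Tuple{String,String},Float64}(),
    Dict{Tuple{String,String},Int}(),
    0,
)

# Sum the weights of the active features and return the best-scoring class.
# Ties go to the class listed first.
function predict_class(m::AvgPerceptron, feats)
    scores = Dict(c => 0.0 for c in m.classes)
    for f in feats, (c, w) in get(m.weights, f, Dict{String,Float64}())
        scores[c] += w
    end
    best = m.classes[1]
    for c in m.classes
        scores[c] > scores[best] && (best = c)
    end
    return best
end

# On a mistake, reward the true class and penalise the guessed one.
function update!(m::AvgPerceptron, truth, guess, feats)
    m.step += 1
    truth == guess && return m
    for f in feats
        ws = get!(m.weights, f, Dict{String,Float64}())
        for (c, delta) in ((truth, 1.0), (guess, -1.0))
            key = (f, c)
            w = get(ws, c, 0.0)
            # Credit the old weight for the steps during which it was in force.
            m.totals[key] = get(m.totals, key, 0.0) + (m.step - get(m.stamps, key, 0)) * w
            m.stamps[key] = m.step
            ws[c] = w + delta
        end
    end
    return m
end

# Replace each weight by its average over all steps. Call once, after training.
function average!(m::AvgPerceptron)
    for (f, ws) in m.weights, c in collect(keys(ws))
        key = (f, c)
        total = get(m.totals, key, 0.0) + (m.step - get(m.stamps, key, 0)) * ws[c]
        ws[c] = total / max(m.step, 1)
    end
    return m
end

function train!(m::AvgPerceptron, data; nr_iter = 5)
    for _ in 1:nr_iter, (feats, truth) in data
        update!(m, truth, predict_class(m, feats), feats)
    end
    return average!(m)
end

data = [
    (["suffix=ay", "prev=-START-"], "NN"),
    (["word=is", "prev=NN"], "VBZ"),
    (["suffix=od", "prev=VBZ"], "JJ"),
]
model = train!(AvgPerceptron(["NN", "VBZ", "JJ"]), data)
println(predict_class(model, ["word=is", "prev=NN"]))   # prints VBZ
```

Averaging the weights over all training steps, rather than keeping only the final values, damps the effect of late updates and usually generalises better than a plain perceptron.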

### To train model:
```julia
julia> tagger = PerceptronTagger(false) # tagger = PerceptronTagger() can also be used
julia> fit!(tagger, [[("today","NN"),("is","VBZ"),("good","JJ"),("day","NN")]])
iteration : 1
iteration : 2
iteration : 3
iteration : 4
iteration : 5
```

### To load pretrained model:
```julia
julia> tagger = PerceptronTagger(true)
loaded successfully
PerceptronTagger(AveragePerceptron(Set(Any["JJS", "NNP_VBZ", "NN_NNS", "CC", "NNP_NNS", "EX", "NNP_TO", "VBD_DT", "LS", ("Council", "NNP") … "NNPS", "NNP_LS", "VB", "NNS_NN", "NNP_SYM", "VBZ", "VBZ_JJ", "UH", "SYM", "NNP_NN", "CD"]), Dict{Any,Any}("i+2 word wetlands"=>Dict{Any,Any}("NNS"=>0.0,"JJ"=>0.0,"NN"=>0.0),"i-1 tag+i word NNP basic"=>Dict{Any,Any}("JJ"=>0.0,"IN"=>0.0),"i-1 tag+i word DT chloride"=>Dict{Any,Any}("JJ"=>0.0,"NN"=>0.0),"i-1 tag+i word NN choo"=>Dict{Any,Any}("NNP"=>0.0,"NN"=>0.0),"i+1 word antarctica"=>Dict{Any,Any}("FW"=>0.0,"NN"=>0.0),"i-1 tag+i word -START- appendix"=>Dict{Any,Any}("NNP"=>0.0,"NNPS"=>0.0,"NN"=>0.0),"i-1 word wahoo"=>Dict{Any,Any}("JJ"=>0.0,"VBD"=>0.0),"i-1 tag+i word DT children's"=>Dict{Any,Any}("NNS"=>0.0,"NN"=>0.0),"i word dnipropetrovsk"=>Dict{Any,Any}("NNP"=>0.003,"NN"=>-0.003),"i suffix hla"=>Dict{Any,Any}("JJ"=>0.0,"NN"=>0.0)…), DefaultDict{Any,Any,Int64}(), DefaultDict{Any,Any,Int64}(), 1, ["-START-", "-START2-"]), Dict{Any,Any}("is"=>"VBZ","at"=>"IN","a"=>"DT","and"=>"CC","for"=>"IN","by"=>"IN","Retrieved"=>"VBN","was"=>"VBD","He"=>"PRP","in"=>"IN"…), Set(Any["JJS", "NNP_VBZ", "NN_NNS", "CC", "NNP_NNS", "EX", "NNP_TO", "VBD_DT", "LS", ("Council", "NNP") … "NNPS", "NNP_LS", "VB", "NNS_NN", "NNP_SYM", "VBZ", "VBZ_JJ", "UH", "SYM", "NNP_NN", "CD"]), ["-START-", "-START2-"], ["-END-", "-END2-"], Any[])
```

### To predict tags:
```julia
julia> predict(tagger, ["today", "is"])
2-element Array{Any,1}:
("today", "NN")
("is", "VBZ")
```
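
Because `predict` returns `(word, tag)` tuples, its output can be compared position by position with a hand-tagged reference. The helper below is a small sketch of that comparison; `tag_accuracy` is a hypothetical name, not part of TextAnalysis, and the sample tuples are made up for illustration.

```julia
# Hypothetical helper: fraction of positions where predicted and gold tags agree.
function tag_accuracy(predicted, gold)
    length(predicted) == length(gold) || throw(ArgumentError("length mismatch"))
    isempty(gold) && return 0.0
    correct = count(p[2] == g[2] for (p, g) in zip(predicted, gold))
    return correct / length(gold)
end

predicted = [("today", "NN"), ("is", "VBZ")]
gold      = [("today", "NN"), ("is", "VBD")]
println(tag_accuracy(predicted, gold))   # prints 0.5
```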

`PerceptronTagger(load::Bool)`

* load = Boolean argument; if `true`, the pretrained model is loaded

`fit!(self::PerceptronTagger, sentences::Vector{Vector{Tuple{String, String}}}, save_loc::String, nr_iter::Integer)`

* self = `PerceptronTagger` object
* sentences = `Vector` of sentences, each a `Vector` of `Tuple`s pairing a word or token with its POS tag (see the training example above)
* save_loc = path of the file in which to save the trained weights
* nr_iter = number of passes over `sentences` during training (default 5)

`predict(self::PerceptronTagger, tokens)`

* self = `PerceptronTagger` object
* tokens = `Vector` of words or tokens for which to predict tags
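
Tagged corpora are often stored as plain text in `word/TAG` form, which has to be converted into the `Vector{Vector{Tuple{String, String}}}` shape that `fit!` expects. The following is a sketch of that conversion under the assumption of whitespace-separated `word/TAG` tokens; `parse_tagged` is a hypothetical helper, not part of TextAnalysis.

```julia
# Hypothetical helper: turn a line of "word/TAG" tokens into (word, tag) tuples.
function parse_tagged(line::AbstractString)
    pairs = Tuple{String,String}[]
    for tok in split(line)
        # Split on the last slash so that a word such as "1/2" keeps its own slash.
        idx = findlast(isequal('/'), tok)
        idx === nothing && error("token without a tag: $tok")
        word = String(tok[1:prevind(tok, idx)])
        tag = String(tok[nextind(tok, idx):end])
        push!(pairs, (word, tag))
    end
    return pairs
end

lines = ["today/NN is/VBZ good/JJ day/NN", "1/2/CD is/VBZ half/NN"]
sentences = [parse_tagged(l) for l in lines]
println(sentences[1])
```

The resulting `sentences` value has the same shape as the literal passed to `fit!` in the training example above.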
2 changes: 2 additions & 0 deletions src/TextAnalysis.jl
@@ -52,6 +52,7 @@ module TextAnalysis
export SentimentAnalyzer
export jackknife_avg, listify_ngrams, weighted_lcs, fmeasure_lcs
export rouge_l_summary, rouge_l_sentence, rouge_n
export PerceptronTagger, fit!, predict

include("tokenizer.jl")
include("ngramizer.jl")
@@ -79,4 +80,5 @@ module TextAnalysis
include("deprecations.jl")
include("utils.jl")
include("rouge.jl")
include("averagePerceptronTagger.jl")
end