Merge pull request #277 from rssdev10/fix/docs

documentation update
rssdev10 committed Oct 27, 2023
2 parents 77f1abb + 39d2e49 commit cc7aac7
Showing 14 changed files with 250 additions and 575 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -43,5 +43,5 @@ Contributions, in the form of bug-reports, pull requests, additional documentation

## Support

Feel free to ask for help on the [Julia Discourse forum](https://discourse.julialang.org/), or in the `#natural-language` channel on [julia-slack](https://julialang.slack.com). (Which you can [join here](https://slackinvite.julialang.org/)). You can also raise issues in this repository to request new features and/or improvements to the documentation and codebase.
Feel free to ask for help on the [Julia Discourse forum](https://discourse.julialang.org/), or in the `#natural-language` channel on [julia-slack](https://julialang.slack.com) (which you can [join here](https://julialang.org/slack/)). Or, [pick whichever channel suits you here](https://julialang.org/community/). You can also raise issues in this repository to request new features and/or improvements to the documentation and codebase.

3 changes: 3 additions & 0 deletions docs/make.jl
@@ -19,3 +19,6 @@ makedocs(
],
)

deploydocs(;
repo="github.com/JuliaText/TextAnalysis.jl",
)
94 changes: 27 additions & 67 deletions docs/src/LM.md
@@ -84,8 +84,8 @@ julia> masked_score = maskedscore(model,fit,"is","alien")

used to evaluate the probability of a word given its context (*P(word | context)*)

```julia
score(m::gammamodel, temp_lm::DefaultDict, word::AbstractString, context::AbstractString)
```

```@docs
score
```

Arguments:
@@ -100,91 +100,51 @@ Arguments:
- In the Interpolated language model, `Kneserney` and `WittenBell` smoothing are provided
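
As a rough usage sketch (not part of the documented API above; the toy vocabulary, training sequence, and the `MLE` model below are illustrative assumptions):

```julia
using TextAnalysis

voc = ["my", "name", "is", "salman", "khan", "and", "he", "is", "shahrukh", "Khan"]
train = ["khan", "is", "my", "good", "friend", "and", "He", "is", "my", "brother"]

model = MLE(voc)          # maximum-likelihood language model over this vocabulary
fit = model(train, 2, 2)  # fit counts on bigrams (min and max ngram order of 2)

score(model, fit, "is", "my")  # P("is" | "my") under the fitted model
```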

### `maskedscore`
```@docs
maskedscore
```

- It is used to evaluate *score* while masking out-of-vocabulary words

- The arguments are the same as for `score`
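
Continuing the sketch above (same illustrative `model` and `fit`), a word outside the vocabulary is masked to the `<unk>` token before scoring:

```julia
maskedscore(model, fit, "is", "alien")  # "alien" is OOV, so it is scored as "<unk>"
```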

### `logscore`

- Evaluates the log score of a word in a given context.

```@docs
logscore
```

- The arguments are the same as for `score` and `maskedscore`
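
Continuing the same sketch (assuming, as in NLTK-style language models, the log is base 2):

```julia
logscore(model, fit, "is", "my")  # log2 of the corresponding score
```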

### `entropy`

```julia
entropy(m::Langmodel, lm::DefaultDict, text_ngram::Vector{T}) where {T <: AbstractString}
```

```@docs
entropy
```

- Calculates the *cross-entropy* of the model for the given evaluation text.

- The input text must be an Array of ngrams of the same length.
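
A minimal sketch, reusing the illustrative `model` and `fit` from above with bigrams drawn from the training sequence:

```julia
entropy(model, fit, ["is my", "my brother"])  # average negative log2 probability
```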

### `perplexity`

- Calculates the perplexity of the given text.

- This is simply 2^cross-entropy (see `entropy`) of the text, so the arguments are the same as for `entropy`.

```@docs
perplexity
```
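
Correspondingly, under the same illustrative assumptions:

```julia
perplexity(model, fit, ["is my", "my brother"])  # 2^entropy of the evaluation text
```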

## Preprocessing

The following functions are used for preprocessing:

1. `everygram`: Returns all possible ngrams generated from a sequence of items, as an `Array{String,1}`:

```julia
julia> seq = ["To","be","or","not"]
julia> a = everygram(seq,min_len=1, max_len=-1)
10-element Array{Any,1}:
 "or"
 "not"
 "To"
 "be"
 "or not"
 "be or"
 "To be"
 "be or not"
 "To be or"
 "To be or not"
```

```@docs
everygram
padding_ngram
```

2. `padding_ngram`: used to pad both the left and right of a sentence and output ngrams of order n.

It also pads the original input Array of strings.

```julia
julia> example = ["1","2","3","4","5"]
julia> padding_ngram(example,2,pad_left=true,pad_right=true)
6-element Array{Any,1}:
"<s> 1"
"1 2"
"2 3"
"3 4"
"4 5"
"5 </s>"
```
## Vocabulary

Struct to store a language model's vocabulary.

It checks membership and filters items by comparing their counts to a cutoff value.

It also adds a special "unknown" token which unseen words are mapped to.

```julia
julia> words = ["a", "c", "-", "d", "c", "a", "b", "r", "a", "c", "d"]

julia> vocabulary = Vocabulary(words, 2)
Vocabulary(Dict("<unk>"=>1,"c"=>3,"a"=>3,"d"=>2), 2, "<unk>")

# lookup a sequence of words in the vocabulary
julia> word = ["a", "-", "d", "c", "a"]

julia> lookup(vocabulary, word)
5-element Array{Any,1}:
 "a"
 "<unk>"
 "d"
 "c"
 "a"
```

```@repl
using TextAnalysis
words = ["a", "c", "-", "d", "c", "a", "b", "r", "a", "c", "d"]
vocabulary = Vocabulary(words, 2)
# lookup a sequence of words in the vocabulary
word = ["a", "-", "d", "c", "a"]
lookup(vocabulary, word)
```
47 changes: 15 additions & 32 deletions docs/src/classify.md
@@ -11,42 +11,25 @@ To load the Naive Bayes Classifier, use the following command -
It can be used in the following 3 steps.

1- Create an instance of the Naive Bayes Classifier model -

model = NaiveBayesClassifier(dict, classes)


It takes two arguments -

* `classes`: An array of possible classes that the concerned data could belong to.
* `dict`: (optional) An Array of possible tokens (words). This is automatically updated if a new token is detected in step 2) or 3)

```@docs
NaiveBayesClassifier
```

2- Fit the model weights on the input -

fit!(model, str, class)

```@docs
fit!
```
3- Predict for the input case -

predict(model, str)

```@docs
predict
```

## Example

```julia
julia> m = NaiveBayesClassifier([:legal, :financial])
NaiveBayesClassifier{Symbol}(String[], Symbol[:legal, :financial], Array{Int64}(0,2))

julia> fit!(m, "this is financial doc", :financial)
NaiveBayesClassifier{Symbol}(["financial", "this", "is", "doc"], Symbol[:legal, :financial], [1 2; 1 2; 1 2; 1 2])

julia> fit!(m, "this is legal doc", :legal)
NaiveBayesClassifier{Symbol}(["financial", "this", "is", "doc", "legal"], Symbol[:legal, :financial], [1 2; 2 2; 2 2; 2 2; 2 1])

julia> predict(m, "this should be predicted as a legal document")
Dict{Symbol,Float64} with 2 entries:
  :legal     => 0.666667
  :financial => 0.333333
```

```@repl
using TextAnalysis
m = NaiveBayesClassifier([:legal, :financial])
fit!(m, "this is financial doc", :financial)
fit!(m, "this is legal doc", :legal)
predict(m, "this should be predicted as a legal document")
```
70 changes: 13 additions & 57 deletions docs/src/corpus.md
@@ -3,76 +3,32 @@
Working with isolated documents gets boring quickly. We typically want to
work with a collection of documents. We represent collections of documents
using the `Corpus` type:

```julia
julia> crps = Corpus([StringDocument("Document 1"),
StringDocument("Document 2")])
A Corpus with 2 documents:
* 2 StringDocument's
* 0 FileDocument's
* 0 TokenDocument's
* 0 NGramDocument's

Corpus's lexicon contains 0 tokens
Corpus's index contains 0 tokens
```

```@docs
Corpus
```

## Standardizing a Corpus

A `Corpus` may contain many different types of documents:

```julia
julia> crps = Corpus([StringDocument("Document 1"),
TokenDocument("Document 2"),
NGramDocument("Document 3")])
A Corpus with 3 documents:
* 1 StringDocument's
* 0 FileDocument's
* 1 TokenDocument's
* 1 NGramDocument's

Corpus's lexicon contains 0 tokens
Corpus's index contains 0 tokens
```

It is generally more convenient to standardize all of the documents in a
corpus using a single type. This can be done using the `standardize!`
function:

```julia
julia> standardize!(crps, NGramDocument)
```

After this step, you can check that the corpus only contains `NGramDocument`'s:

```julia
julia> crps
A Corpus with 3 documents:
* 0 StringDocument's
* 0 FileDocument's
* 0 TokenDocument's
* 3 NGramDocument's

Corpus's lexicon contains 0 tokens
Corpus's index contains 0 tokens
```

```@docs
standardize!
```

## Processing a Corpus

We can apply the same sort of preprocessing steps that are defined for
individual documents to an entire corpus at once:

```julia
julia> crps = Corpus([StringDocument("Document ..!!"),
StringDocument("Document ..!!")])

julia> prepare!(crps, strip_punctuation)

julia> text(crps[1])
"Document "

julia> text(crps[2])
"Document "
```@repl
using TextAnalysis
crps = Corpus([StringDocument("Document ..!!"),
StringDocument("Document ..!!")])
prepare!(crps, strip_punctuation)
text(crps[1])
text(crps[2])
```

These operations are run on each document in the corpus individually.
@@ -109,7 +65,7 @@ Dict{String,Int64} with 3 entries:

But once this work is done, you can more easily address lots of interesting
questions about a corpus:
```julia
julia> lexical_frequency(crps, "Name")
0.5

```
