Tags: JuliaText/TextAnalysis.jl

v0.8.1

## TextAnalysis v0.8.1

[Diff since v0.7.5](v0.7.5...v0.8.1)


**Merged pull requests:**
- allow DocumentMetadata to hold arbitrary data (#158) (@tanmaykm)
- Directional coom (#264) (@atantos)
- Fixed UNICODE processing with the `strip_non_letters` flag in src/preprocessing.jl (#265) (@sigmundv)
- ROUGE: fixed sentences calculation and some minor refactoring (#272) (@rssdev10)
- CI: updated scripts. Minimal Julia is 1.6 now (#275) (@rssdev10)
- Code refactoring (#276) (@rssdev10)
- documentation update (#277) (@rssdev10)
- CompatHelper: add new compat entry for Statistics at version 1, (keep existing compat) (#278) (@github-actions[bot])
- Fix/showprogress (#281) (@rssdev10)
- Fix/style improvement (#282) (@rssdev10)

**Closed issues:**
- error on LDA Julia 0.4 (#37)
- remove_corrupt_utf8() not working (#41)
- remove_corrupt_utf8! giving "no method matching zero" error (#68)
- stemming issue for certain words e.g. providing -> provid (#69)
- rouge_n not defined (#193)
- error strip_spares_terms not defined (#212)
- Eval can be replaced by getfield in tag_scheme! (#242)
- Seems there are some typos in documents (#249)
- StringIndexError when trying to create a StringDocument based on a UTF8 string (#255)
- Converting Corpus to Dataframe not working. (#279)

v0.7.5

## TextAnalysis v0.7.5

[Diff since v0.7.4](v0.7.4...v0.7.5)


**Merged pull requests:**
- CompatHelper: add new compat entry for DelimitedFiles at version 1, (keep existing compat) (#269) (@github-actions[bot])
- Clean README, docs and docstrings (#270) (@pitmonticone)
- Update coom.jl (#271) (@ms10596)
- added BLEU score (#273) (@rssdev10)
- Update README.md (#274) (@ms10596)

**Closed issues:**
- Implementation of cosine similarity? (#215)
- Dependence on BinaryProvider.jl prevents TextAnalysis from working on arm64-apple-darwin natively. (#260)

v0.7.4

## TextAnalysis v0.7.4

[Diff since v0.7.3](v0.7.3...v0.7.4)


**Closed issues:**
- PerceptronTagger is not defined (#262)
- Libstemmer not defined for ARM (M1 Mac) (#263)

**Merged pull requests:**
- Update README.md (#254) (@dunefox)
- Move some docs to TextModels (#256) (@AdarshKumar712)
- fix string indexing in `summary` (#257) (@ericphanson)
- CompatHelper: bump compat for StatsBase to 0.34, (keep existing compat) (#268) (@github-actions[bot])

v0.7.3

## TextAnalysis v0.7.3

[Diff since v0.7.2](v0.7.2...v0.7.3)


**Closed issues:**
- CI is failing on the latest Julia master (#252)

**Merged pull requests:**
- add cosine similarity calculation (#248) (@hhaensel)
- Latent Dirichlet allocation: display a progress bar during Gibbs sampling (#250) (@DilumAluthge)
- remove `write_sub` (#253) (@aviks)

v0.7.2

## TextAnalysis v0.7.2

[Diff since v0.7.1](v0.7.1...v0.7.2)


**Closed issues:**
- Methods to merge two DocumentTermMatrix instances (#243)

**Merged pull requests:**
- CompatHelper: bump compat for "DataFrames" to "0.22" (#239) (@github-actions[bot])
- Use Tables.jl, remove explicit DataFrames dependency (#240) (@aviks)
- methods to help manipulate and update DocumentTermMatrix incrementally (#244) (@tanmaykm)
- optimize document term sparse matrix operations (#245) (@tanmaykm)
- fix Project.toml, add Tables compat entry (#246) (@tanmaykm)

v0.7.1

## TextAnalysis v0.7.1

[Diff since v0.7.0](v0.7.0...v0.7.1)


**Closed issues:**
- Move models to TextModels.jl (#111)
- Tag a new release (#177)
- Provide libstemmer through Yggdrasil (#204)
- Julia TextAnalysis NERTagger (#214)
- Unable to convert corpus to DataFrame (#236)

**Merged pull requests:**
- Fix conversion to DataFrame (#237) (@aviks)
- fix link to the docs in README.md (#238) (@gxyd)

v0.7.0

## TextAnalysis v0.7.0

[Diff since v0.6.0](v0.6.0...v0.7.0)


**Closed issues:**
- Feature Request: Part of speech tagging (#2)
- Implement Named Entity Recognition (NER) (#117)
- Can a new release be tagged? (#139)
- Need API documentation (#146)
- Extend Naive Bayes Classifier to support the various document types (#152)
- Summarize function throws error for docs with less than 5 sentences. (#153)
- UndefVarError when `prepare!` called on Corpus (#171)
- Need to export Flux, Tracker (#178)
- Docs and docstring for Sentiment Analysis model needs fixing (#182)
- NaiveBayesClassifier scope error. (#192)
- APIs to avoid datatype constraint between CorpusLoaders.jl and TextAnalysis.jl (#195)
- Add entry for ULMFiT in docs/make.jl (#196)
- Unexpected behaviour of ngram(sd, 3) (#202)
- "resulting" bug (#205)
- Statistical tokenization algorithms (#207)
- Trying to use NaiveBayesClassifier results in UndefVarError (#216)

**Merged pull requests:**
- Simple document classifier (AKA spam filter) (#106) (@MikeInnes)
- Average Perceptron POS Tagger (Issue #2) (#131) (@ComputerMaestro)
- Remove HTML style tags in preprocessing (#137) (@phereford)
- PR: To address performance issues with stopword removal (#141) (@asbisen)
- Indentation fix patch (#142) (@Ayushk4)
- Fix deprecated function in extended example (#144) (@ViralBShah)
- Add characters to list of punctuations (#145) (@asbisen)
- Add API documentation (#147) (@aquatiko)
- Update ngramizer.jl (#148) (@djokester)
- Add offline Documentation (Docstrings) to the codebase (#150) (@Ayushk4)
- Documentation for Bayes.jl (#151) (@Ayushk4)
- Update summarizer.jl (#154) (@Ayushk4)
- Fix deprecations in show.jl (#155) (@Ayushk4)
- Added ROUGE Score to TextAnalysis.jl (#156) (@djokester)
- allow multiple ngram complexity in NGramDocument, ngrams and ngrammize (#157) (@tanmaykm)
- Update the documentation reflecting changes in show.jl (#159) (@Ayushk4)
- Add functions for Tagging Schemes and Conversion. (#161) (@Ayushk4)
- Conditional Random Fields (#162) (@Ayushk4)
- BM25, Co-occurrence Matrix, faster ROUGE, Fixing LSA. (#165) (@Ayushk4)
- Use datadeps for AvgPerceptronTagger, add pos tagging over document types (#166) (@Ayushk4)
- Named Entity Recognition (#167) (@Ayushk4)
- Add API for Part of Speech Tagging (#169) (@Ayushk4)
- Add favicon to the docs (#170) (@Ayushk4)
- Fix prepare! on strip_whitespace (#172) (@Ayushk4)
- Readme updated. Docs edited to provide API Reference online. (#173) (@Ayushk4)
- ULMFiT (#179) (@aviks)
- Fix Sequence Labelling Models, fixes #178 (#180) (@Ayushk4)
- Drop support for 0.7 and add support for 1.3 (#181) (@Ayushk4)
- Minor fix of doc and docstring of Sentiment Analysis (#184) (@tejasvaidhyadev)
- Remove duplicate entries in Project.toml, and fix a broken build (#189) (@DilumAluthge)
- Bump version number from "0.6.0" to "0.7.0" (#190) (@DilumAluthge)
- Install TagBot as a GitHub Action (#194) (@JuliaTagBot)
- updated docs/make.jl (#198) (@tejasvaidhyadev)
- make DTM type generic (#199) (@baggepinnen)
- bug fix in get_sentiment function (#206) (@tejasvaidhyadev)
- Language Model Interface (#210) (@tejasvaidhyadev)
- Modify loop in initial assignments of lda to use sparse structure. (#213) (@jmoralez)
- export NaiveBayesClassifier (#217) (@agarie)
- Extend NaiveBayesClassifier to support Documents as input #152 (#219) (@KimBue)
- Minor Fixes (#220) (@tejasvaidhyadev)
- LM doc fix (#233) (@tejasvaidhyadev)
- Split project, separate TextModels (#234) (@aviks)

v0.6.0

Merge pull request #137 from phereford/master

Remove HTML style tags in preprocessing

v0.5.0

Merge pull request #97 from JuliaText/as/towards07

Prepare for 1.0

v0.4.2

bug fixes for DTM constructor and for remove_patterns (#94)

* Fixes case where DocumentTermMatrix(crps, lex) would construct a dtm of wrong dimensions if a term provided in lex does not occur in the crps

* added DocumentTermMatrix constructor that takes a crps and a prespecified terms vector

* fixed remove_patterns to use nextind() to find starting position of next unstripped substring. Closes #92

* removed leftover info() statement in prepare! test
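
The `nextind()` change in the `remove_patterns` fix matters because Julia strings are indexed by byte, so adding 1 to the index of a multi-byte character does not give a valid index. The sketch below illustrates the idea only. `strip_pattern` is a hypothetical helper written for this note, not the package's actual `remove_patterns` code, and it assumes the pattern never produces empty matches.

```julia
# Hypothetical helper for illustration; not the TextAnalysis.jl implementation.
# Removes every match of `rex` from `s`, using nextind/prevind to stay on
# valid character boundaries in UTF-8 strings.
function strip_pattern(s::AbstractString, rex::Regex)
    io = IOBuffer()
    pos = firstindex(s)                        # start of the next unstripped substring
    for m in eachmatch(rex, s)
        start = m.offset                       # index of the first character of the match
        stop = start + lastindex(m.match) - 1  # index of the last character of the match
        # Copy the text between the previous match and this one.
        start > pos && print(io, SubString(s, pos, prevind(s, start)))
        # `stop + 1` may land inside a multi-byte character; nextind does not.
        pos = nextind(s, stop)
    end
    # Copy whatever remains after the last match.
    pos <= lastindex(s) && print(io, SubString(s, pos, lastindex(s)))
    return String(take!(io))
end

cleaned = strip_pattern("aé b", r"é")
```

In `"aé b"` the character `é` starts at index 2 and occupies two bytes, so index 3 is not a character boundary. Resuming at `stop + 1` would raise a `StringIndexError`, while `nextind(s, stop)` returns 4.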