Update normalization.md
taku910 authored Jun 30, 2018
1 parent 6c86c2f commit ee4ca7f
Showing 1 changed file with 2 additions and 0 deletions.
doc/normalization.md
@@ -9,6 +9,8 @@ SentencePiece provides the following pre-defined normalization rule. It is recom

* **nmt_nfkc**: [NFKC](https://en.wikipedia.org/wiki/Unicode_equivalence) normalization with some additional normalization around spaces (default).
* **nfkc**: original NFKC normalization.
* **nmt_nfkc_cf**: nmt_nfkc + [Unicode case folding](https://www.w3.org/International/wiki/Case_folding) (mostly lowercasing).
* **nfkc_cf**: nfkc + [Unicode case folding](https://www.w3.org/International/wiki/Case_folding).
* **identity**: no normalization.
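As a rough illustration of what the plain rules do, the sketch below approximates `nfkc`, `nfkc_cf`, and `identity` with Python's `unicodedata` module. It is only an approximation: SentencePiece applies its own compiled rule tables, and the Unicode NFKC_Casefold mapping can differ from "NFKC, then case fold" on some code points. The `nmt_*` variants are deliberately left out because their extra space handling is not reproduced here. The names `normalize` and `RULES` are hypothetical and not part of SentencePiece.

```python
import unicodedata


def _nfkc(text: str) -> str:
    # Compatibility decomposition followed by canonical composition.
    return unicodedata.normalize("NFKC", text)


def _nfkc_cf(text: str) -> str:
    # Approximation of nfkc_cf: NFKC, then Unicode full case folding, then NFKC
    # again, because case folding can produce a string that is no longer NFKC.
    return _nfkc(_nfkc(text).casefold())


def _identity(text: str) -> str:
    # No normalization at all.
    return text


# Hypothetical table covering only the rules that map onto the standard library.
# nmt_nfkc and nmt_nfkc_cf add SentencePiece-specific space handling that this
# sketch does not attempt to reproduce.
RULES = {
    "nfkc": _nfkc,
    "nfkc_cf": _nfkc_cf,
    "identity": _identity,
}


def normalize(text: str, rule: str) -> str:
    fn = RULES.get(rule)
    if fn is None:
        raise ValueError(f"rule not covered by this sketch: {rule!r}")
    return fn(text)


if __name__ == "__main__":
    # Full-width "Hello", the "fi" ligature, and German sharp s.
    sample = "\uff28\uff45\uff4c\uff4c\uff4f \ufb01 Stra\u00dfe"
    for name in RULES:
        print(f"{name:9} {normalize(sample, name)}")
```

With this sketch, `nfkc` folds the full-width letters and the ligature into plain ASCII but keeps `ß`, while `nfkc_cf` additionally lowercases and expands `ß` to `ss`.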

You can choose the normalization rule with the `--normalization_rule_name` flag.
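A minimal sketch of passing the flag to `spm_train` from Python follows. The helper `build_spm_train_args` is hypothetical; it only assembles an argument list and checks the rule name against the five rules listed above. The file name `corpus.txt`, the prefix `m`, and the vocabulary size are placeholder assumptions. `--input`, `--model_prefix`, and `--vocab_size` are standard `spm_train` options.

```python
import shlex

# The five rule names listed in doc/normalization.md.
SUPPORTED_RULES = ("nmt_nfkc", "nfkc", "nmt_nfkc_cf", "nfkc_cf", "identity")


def build_spm_train_args(input_path, model_prefix, vocab_size, rule="nmt_nfkc"):
    """Hypothetical helper: return an argv list for spm_train.

    The default rule is nmt_nfkc, matching the documented default.
    """
    if rule not in SUPPORTED_RULES:
        raise ValueError(f"unknown normalization rule: {rule!r}")
    return [
        "spm_train",
        f"--input={input_path}",
        f"--model_prefix={model_prefix}",
        f"--vocab_size={vocab_size}",
        f"--normalization_rule_name={rule}",
    ]


if __name__ == "__main__":
    args = build_spm_train_args("corpus.txt", "m", 8000, rule="nfkc_cf")
    # Print the equivalent shell command line.
    print(shlex.join(args))
    # To actually train (requires spm_train on PATH and an existing corpus.txt):
    #   import subprocess; subprocess.run(args, check=True)
```

Recent versions of the Python wrapper also accept the same option as a keyword argument to `SentencePieceTrainer.train` (for example `normalization_rule_name="nfkc_cf"`), which avoids shelling out.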