Evaluation Data for NeMo Text Processing

This dataset is licensed under the Creative Commons Attribution 4.0 International License. Among other uses, it can be used to evaluate the context-aware hybrid text normalization under nemo_text_processing/hybrid.

It contains 3 datasets:

  • EngConf.txt - a manually created dataset focusing on ambiguous semiotic tokens whose normalization depends on the context.
  • GoogleTN.json - derived from Google Text Normalization test data.
  • LibriTTS.json - derived from LibriTTS, keeping examples where the normalized text differs from the written text.
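
The layout of the two JSON files is not described here, so a first step before using them for evaluation is to inspect their top-level structure. The following is a minimal sketch under stated assumptions: the files sit in the current directory, and each is either a single JSON document or one JSON object per line. The helper names `load_records` and `describe` are hypothetical and are not part of NeMo.

```python
import json
from pathlib import Path


def load_records(path):
    """Load a file that is either one JSON document or one JSON object per line.

    Both layouts are assumptions; the actual file format is not documented here.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: parse each non-empty line separately.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def describe(data):
    """Summarize the top-level structure without assuming any field names."""
    if isinstance(data, dict):
        return {"type": "dict", "size": len(data), "keys": sorted(data)[:5]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "first": data[0] if data else None}
    return {"type": type(data).__name__}


if __name__ == "__main__":
    # File names come from the list above; their location is an assumption.
    for name in ("GoogleTN.json", "LibriTTS.json"):
        if Path(name).exists():
            print(name, describe(load_records(name)))
```

Printing the summary first makes it possible to see which fields hold the written and normalized text before writing any evaluation code against them.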

Find more information here.