Unexpected behaviour of heteronyms handling in FastPitch Riva model #35

sch0ngut · 2023-11-30T11:36:38Z

I have trained a FastPitch model using a custom G2P dictionary, but I notice differences in the transcriptions between the same model in NeMo and in Riva. As mentioned in issue #34, I had to manually uppercase all entries of the G2P dictionary for it to be picked up properly in the deployed Riva model. Even so, some words still get transcribed in one of the two models but not in the other.

I'm wondering whether this is because the heteronyms file has the same problem as the G2P dictionary, i.e. its entries also need to be uppercased. For the dictionary I can simply pass an updated file via --phone_dictionary_file during riva-build, but I can't find such an option for the heteronyms file. I therefore don't see any workaround to get the same behaviour across NeMo and Riva.

In my case, setting --preprocessor.g2p_ignore_ambiguous=True at least avoided substantially wrong transcriptions, but it also throws away everything the model has learned about heteronym disambiguation, so it is not a real solution.
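For anyone else hitting this, the dictionary uppercasing workaround described above can be sketched roughly as follows. This is only a minimal sketch, not what NeMo or Riva do internally. It assumes a CMUdict-style layout (one entry per line, the word first, then whitespace, then the pronunciation) and comment lines starting with `;;;`; the function name `uppercase_g2p_entries` and the `comment_prefix` parameter are hypothetical. Only the word is uppercased, and the pronunciation and the original separator are left untouched.

```python
import re
from typing import Iterable, List

# Assumed entry layout: "<word><whitespace><pronunciation>".
_ENTRY = re.compile(r"^(\S+)(.*)$")


def uppercase_g2p_entries(
    lines: Iterable[str], comment_prefix: str = ";;;"
) -> List[str]:
    """Uppercase only the word part of each dictionary entry.

    Blank lines, comment lines, and lines that start with whitespace
    are returned unchanged.
    """
    result = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = _ENTRY.match(line)
        if not line.strip() or line.startswith(comment_prefix) or match is None:
            result.append(line)
            continue
        word, rest = match.group(1), match.group(2)
        # Keep the separator and pronunciation exactly as they were.
        result.append(word.upper() + rest)
    return result
```

The returned lines can be written to a new file and passed to riva-build via --phone_dictionary_file. As far as I can tell there is no equivalent option for the heteronyms file, so the same trick does not help there.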
