Unexpected behaviour of heteronyms handling in FastPitch Riva model #35

Open
sch0ngut opened this issue Nov 30, 2023 · 0 comments

sch0ngut commented Nov 30, 2023

I have trained a FastPitch model with a custom G2P dictionary, and I see differences in the transcriptions between the same model run in NeMo and deployed in Riva. As mentioned in issue #34, I had to manually uppercase all entries of the G2P dictionary for it to be picked up properly in the deployed Riva model. Even so, some words still get transcribed in one of the two models but not in the other.

I'm wondering whether this is because the heteronyms file has the same uppercasing issue as the G2P dictionary. For the dictionary, I can simply pass an updated file via --phone_dictionary_file during riva-build, but I can't find an equivalent option for the heteronyms file, so I don't see any workaround that gives the same behaviour in NeMo and Riva.

In my case, setting --preprocessor.g2p_ignore_ambiguous=True at least avoided substantially wrong transcriptions, but it also throws away everything the model has learned about heteronym disambiguation, so it is not a real solution.
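For anyone reproducing this, here is a minimal sketch of the dictionary uppercasing workaround described above. It assumes a CMUdict-style layout (one entry per line, the word first, then whitespace, then the phonemes, with comment lines starting with `;;;`); the real file may differ. The function name `uppercase_dict_entries` is hypothetical and not part of NeMo or Riva.

```python
def uppercase_dict_entries(lines):
    """Uppercase the word column of CMUdict-style lines; phonemes are left untouched."""
    out = []
    for line in lines:
        stripped = line.rstrip("\n")
        # Keep blank lines and comment lines (assumed to start with ';;;') unchanged.
        if not stripped.strip() or stripped.startswith(";;;"):
            out.append(stripped)
            continue
        # Assumed layout: word, whitespace, then the phoneme sequence.
        parts = stripped.split(maxsplit=1)
        word = parts[0].upper()
        out.append(word if len(parts) == 1 else f"{word}  {parts[1]}")
    return out
```

The result can be written to a new file and passed via --phone_dictionary_file during riva-build. The same function would also uppercase a heteronyms list with one word per line, but as noted above there does not seem to be a way to hand that file to riva-build.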
