fixed word-level feature extraction for RoBERTa / XLM-R
Summary: Pull Request resolved: fairinternal/fairseq-py#933

Differential Revision: D18783780

fbshipit-source-id: fa0a27fab886a5fa5be8d5f49151d1d9dd9775f1
Naman Goyal authored and facebook-github-bot committed Dec 3, 2019
1 parent 1c56594 commit d48895b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion fairseq/models/roberta/alignment_utils.py
@@ -22,6 +22,7 @@ def align_bpe_to_words(roberta, bpe_tokens: torch.LongTensor, other_tokens: List
         List[str]: mapping from *other_tokens* to corresponding *bpe_tokens*.
     """
     assert bpe_tokens.dim() == 1
+    assert bpe_tokens[0] == 0
 
     def clean(text):
         return text.strip()
@@ -32,7 +33,6 @@ def clean(text):
     other_tokens = [clean(str(o)) for o in other_tokens]
 
     # strip leading <s>
-    assert bpe_tokens[0] == '<s>'
     bpe_tokens = bpe_tokens[1:]
     assert ''.join(bpe_tokens) == ''.join(other_tokens)
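
For context: the change moves the BOS sanity check from the decoded string ('<s>') to the raw token ID, which is 0 in both the RoBERTa and XLM-R dictionaries, so the word-level alignment path also holds for XLM-R's sentencepiece BPE. Below is a minimal sketch of how this code path is typically exercised. The torch.hub repo name 'pytorch/fairseq', the 'xlmr.base' model name, and the spaCy dependency of extract_features_aligned_to_words() are assumptions for illustration and are not part of this commit.

import torch

# Sketch only: assumes the 'pytorch/fairseq' hub entry point and the
# 'xlmr.base' checkpoint; extract_features_aligned_to_words() also needs
# spaCy and its English model installed.
xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.base')
xlmr.eval()

sentence = 'Hello world!'

# encode() prepends the BOS symbol <s>, whose dictionary index is 0; this is
# what the new `assert bpe_tokens[0] == 0` verifies on the raw ID tensor.
bpe_tokens = xlmr.encode(sentence)
print(bpe_tokens[0].item())  # expected: 0

# Word-aligned features; internally this calls align_bpe_to_words() from
# alignment_utils.py, the function touched by this commit.
doc = xlmr.extract_features_aligned_to_words(sentence)
for tok in doc:
    print(tok, tok.vector.shape)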
