rehearsal.py is using the raw OntoNotes format, not the BILOU format #17
It is converting it to Prodigy format before putting it into the DB. See here.
@ahalterman Yeah, I got this, but the problem is that it reads the annotation format directly from OntoNotes, not the BILOU format. When I passed in BILOU-format data, it reported 0 records transferred, but with OntoNotes-format data everything was transferred.
I was confused: the current rehearsal.py uses CoNLL format, not BILOU. Change rehearsal.py to handle the BILOU format, too.
@ahalterman Do you get it now, Andy? We need rehearsal to mix BILOU data with Prodigy data, not CoNLL with Prodigy.
I just added some code to do this, along with the code needed to use Arabic. (It was giving me some major git errors when I tried to put this in master.) I realized I'm still confused, though: Prodigy doesn't handle BILOU, only spans. So are you training with spaCy or Prodigy for this step?
@ahalterman
🤦‍♂️ So we need it to go from BILOU to Prodigy format... got it. Sorry about my confusion!
@ahalterman No problem :)
It's better to make it read the BILOU format and convert it to Prodigy format, since the OntoNotes format doesn't take advantage of the merged NER tags and the merged ANERcorp data that we already worked on.
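For reference, here is a minimal sketch of the BILOU-to-Prodigy conversion discussed above. It is not the code that was added to the repo. It assumes a hypothetical input layout of one `token<TAB>tag` pair per line (first column token, last column tag) with blank lines between sentences, builds the text by joining tokens with single spaces, and turns each BILOU entity into a Prodigy-style span with character offsets. The helper names (`read_bilou_sentences`, `bilou_to_prodigy`, `convert_file`) are hypothetical, and the `"answer": "accept"` field is included on the assumption that these examples should be treated as accepted gold annotations.

```python
import json


def read_bilou_sentences(lines):
    """Yield (tokens, tags) per sentence from 'token<TAB>tag' lines (assumed layout)."""
    tokens, tags = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():  # a blank line ends the current sentence
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        parts = line.split("\t")
        tokens.append(parts[0])  # assumption: first column is the token
        tags.append(parts[-1])   # assumption: last column is the BILOU tag
    if tokens:
        yield tokens, tags


def _span(token_dicts, first, last, label):
    """Build a span covering tokens first..last (inclusive)."""
    return {"start": token_dicts[first]["start"], "end": token_dicts[last]["end"],
            "token_start": first, "token_end": last, "label": label}


def bilou_to_prodigy(tokens, tags):
    """Convert one BILOU-tagged sentence into a Prodigy-style task dict."""
    token_dicts, offset = [], 0
    for i, tok in enumerate(tokens):
        token_dicts.append({"text": tok, "start": offset, "end": offset + len(tok), "id": i})
        offset += len(tok) + 1  # +1 for the single joining space
    text = " ".join(tokens)

    spans, ent_start, ent_label = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, label = "O", None
        elif "-" in tag:
            prefix, label = tag.split("-", 1)
        else:
            raise ValueError(f"Unknown tag {tag!r} at token {i}")
        # O, B and U are only valid when no entity is currently open.
        if prefix in ("O", "B", "U") and ent_start is not None:
            raise ValueError(f"Entity opened at token {ent_start} never closed (token {i}: {tag})")
        if prefix == "O":
            continue
        if prefix == "U":    # single-token entity
            spans.append(_span(token_dicts, i, i, label))
        elif prefix == "B":  # open a multi-token entity
            ent_start, ent_label = i, label
        elif prefix in ("I", "L"):
            if ent_start is None or label != ent_label:
                raise ValueError(f"Malformed BILOU sequence at token {i}: {tag}")
            if prefix == "L":  # close the entity and emit its span
                spans.append(_span(token_dicts, ent_start, i, label))
                ent_start, ent_label = None, None
        else:
            raise ValueError(f"Unknown tag prefix {prefix!r} at token {i}")
    if ent_start is not None:
        raise ValueError(f"Entity opened at token {ent_start} never closed")
    return {"text": text, "tokens": token_dicts, "spans": spans, "answer": "accept"}


def convert_file(in_path, out_path):
    """Write one Prodigy-style JSON object per sentence (JSONL)."""
    with open(in_path, encoding="utf-8") as f_in, open(out_path, "w", encoding="utf-8") as f_out:
        for tokens, tags in read_bilou_sentences(f_in):
            # ensure_ascii=False keeps Arabic text readable in the output file
            f_out.write(json.dumps(bilou_to_prodigy(tokens, tags), ensure_ascii=False) + "\n")
```

For example, `bilou_to_prodigy(["Andy", "lives", "in", "Abu", "Dhabi", "."], ["U-PER", "O", "O", "B-GPE", "L-GPE", "O"])` produces the text `"Andy lives in Abu Dhabi ."` with a `PER` span at characters 0 to 4 and a `GPE` span at 14 to 23. The resulting JSONL could then be loaded into a dataset (for example with `prodigy db-in`) and mixed with the Prodigy annotations for rehearsal. Raising on malformed sequences, rather than silently skipping them, is a deliberate choice so that a format mismatch cannot quietly result in 0 transferred records.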