This repository contains the code for a presentation about Virtual Assistants I gave at PyData Bydgoszcz in November 2023.
Follow these steps to reproduce the Natural Language Understanding (NLU) model training from that session.
## Environment Setup

- Create a Conda environment:

  ```shell
  conda create --name joint_nlu python=3.10
  ```

- Activate the environment:

  ```shell
  conda activate joint_nlu
  ```

- Install the dependencies (alternatively, this step can be integrated into the environment creation process):

  ```shell
  pip install -r requirements.txt
  ```
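One way to integrate dependency installation into environment creation is an `environment.yml`; this is a sketch (the file name and layout are my assumption, not part of this repository):

```yaml
# environment.yml — create the environment and install dependencies in one step:
#   conda env create -f environment.yml
name: joint_nlu
dependencies:
  - python=3.10
  - pip
  - pip:
      - -r requirements.txt   # resolved relative to this file's directory
```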
## Grammar Modification

- Navigate to the grammars directory:

  ```shell
  cd grammars/
  ```

- Modify the `calendar.gram` file as desired.
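As a rough illustration of what such a grammar might contain, here is a JSGF-style fragment; the actual syntax expected by `generate_patterns.py` may differ, and the rule and slot names below are purely hypothetical:

```
<create_event> = (add | schedule) (a meeting | an event) [with <person>] (today | tomorrow);
<person> = John | Anna;
```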
## Grammar Expansion

- Expand the grammars into corpora:

  ```shell
  python generate_patterns.py -f calendar.gram > iva-calendar-0.1.0.tsv
  ```

- Fix two common errors:
  - Remove double underscores (`__`). This can be done manually or using `sed`:

    ```shell
    sed -i 's/__//g' calendar.gram
    ```

  - Eliminate spaces surrounding tabs: replace `(space)(tab)(space)` with `(tab)`.
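The two fixes above can also be applied line by line in Python; this is a minimal sketch (the function name and the sample line are mine, not from the repository):

```python
import re

def clean_corpus_line(line: str) -> str:
    """Apply both corpus fixes to one line of the expanded output."""
    line = line.replace("__", "")         # fix 1: remove double underscores
    line = re.sub(r" ?\t ?", "\t", line)  # fix 2: strip spaces surrounding tabs
    return line

print(clean_corpus_line("add __a meeting \t create_event"))
```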
## Slot Expansion

- Run the slot expansion script:

  ```shell
  ./expand-slots.sh iva-calendar-0.1.0.tsv iva-corpus-calendar-0.1.0.tsv true true
  ```
## Configuration Adjustments

- Modify the `joint_nlu` configuration in `joint_nlu/calendar.json`. Experiment with the `num_train_epochs` setting as needed.
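For orientation, a hypothetical fragment of `calendar.json` showing the two settings this README mentions; the values and any surrounding schema are assumptions:

```json
{
  "num_train_epochs": 5,
  "push_to_hub": true
}
```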
## Corpus File Naming

- Ensure your corpus file name matches the `_URL` in `custom.py`.
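The check can be sketched as follows; the `_URL` value and the helper function below are illustrative, not taken from the actual `custom.py`:

```python
import os

# Hypothetical excerpt: a datasets loading script such as custom.py typically
# points at the corpus with a module-level constant like this.
_URL = "iva-corpus-calendar-0.1.0.tsv"

def corpus_name_matches(corpus_filename: str) -> bool:
    """True when the corpus file name agrees with the basename of _URL."""
    return os.path.basename(corpus_filename) == os.path.basename(_URL)

print(corpus_name_matches("iva-corpus-calendar-0.1.0.tsv"))
```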
## Model Training on Google Colab

- Import `joint_nlu_train_on_colab.ipynb` into Google Colab and follow the instructions therein.
- After cloning the `joint_nlu` repository in Colab, upload three files into the `joint_nlu` directory: `calendar.json`, the corpus file, and `custom.py`.
- Proceed with the training.
Upon successful completion, assuming `"push_to_hub": true` is set in `calendar.json`, your model will be uploaded to the Hugging Face Hub. For deployment and usage, refer to the Joint NLU repository.