In this repository, we share the code of our (Team CPAa) participation in Task 1 (Subtasks 1 and 2) of the GermEval 2023 Shared Task.
install PyTorch from here: https://pytorch.org/get-started/locally/
install the remaining requirements with: `pip install -U -r requirements.txt`
Prepare Llama 2 models in HF (Hugging Face) format (either from Hugging Face or from https://github.com/facebookresearch/llama converted with https://github.com/facebookresearch/llama-recipes/#model-conversion-to-hugging-face)
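If you start from the original Meta checkpoints, the llama-recipes link above uses the conversion script that ships with `transformers`; a minimal sketch, where the paths and `--model_size` are placeholders for your setup:

```bash
# Convert original Llama 2 checkpoints to Hugging Face format.
# Input/output paths and the model size are placeholders.
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir /path/to/llama-2-7b \
    --model_size 7B \
    --output_dir /path/to/llama-2-7b-hf
```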
Prepare data with `parse_data_alpaca_format.ipynb`
set the path to the data and the path to the Llama 2 model in the fine-tuning scripts in the folder `fine-tuning/scripts/`
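Inside each script these locations end up as command-line flags for the trainer; a minimal sketch (the entry point name `qlora.py`, the flag names, and both paths are assumptions/placeholders, not the repository's actual values):

```bash
# Sketch only: entry point and flag names are assumed
# (QLoRA-style); replace both paths with your own.
python qlora.py \
    --model_name_or_path /path/to/llama-2-7b-hf \
    --dataset /path/to/parsed_train_data.json
```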
set `CUDA_VISIBLE_DEVICES` if you want to limit the used GPUs
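For example, to make only the first two GPUs visible to the training process:

```bash
# Expose only GPU 0 and GPU 1 to the training run.
export CUDA_VISIBLE_DEVICES=0,1
```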
set `per_device_train_batch_size` and `gradient_accumulation_steps` so that `per_device_train_batch_size * gradient_accumulation_steps` is a multiple of 16 and the model fits on your GPU
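For example, if a per-device batch of 4 fits on your GPU, accumulating gradients over 4 steps gives 4 * 4 = 16; the corresponding lines in the script look like this (illustrative values):

```bash
# Effective batch size = per_device_train_batch_size
#   * gradient_accumulation_steps (and * number of GPUs,
#   since the batch size is per device).
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
```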
set `max_steps` to control the length of training (`save_steps` determines when checkpoints are created)
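For example, to train for 1000 steps and write a checkpoint every 250 steps (illustrative values, set in the corresponding script):

```bash
# Total number of training steps and checkpointing interval.
--max_steps 1000 \
--save_steps 250 \
```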
If you want to use the scripts with your own data, check the parameters `source_max_len` and `target_max_len`. The data parsing script contains code to determine the maximum length of the source and target sequences in your data; adapt the values used in the fine-tuning scripts accordingly.
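Once the parsing script has reported the maximum lengths for your data, set the corresponding flags in the scripts; the values below are illustrative:

```bash
# Token budget for the prompt (source) and completion (target);
# choose values at least as large as the maxima in your data.
--source_max_len 1024 \
--target_max_len 512 \
```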
run fine-tuning:
- of 7b cues model: `bash fine-tuning/scripts/finetune_spkatt_7b_cues.sh`
- of 70b cues model: `bash fine-tuning/scripts/finetune_spkatt_70b_cues.sh`
- of 7b roles model: `bash fine-tuning/scripts/finetune_spkatt_7b_roles.sh`
- of 70b roles model: `bash fine-tuning/scripts/finetune_spkatt_70b_roles.sh`