feat: Allow passing model and tokenizer to ArgillaTrainer directly #3751

Merged
merged 10 commits into from
Sep 15, 2023

Conversation

tomaarsen
Contributor

@tomaarsen tomaarsen commented Sep 12, 2023

Hello!

Description

Closes #3631.

This is important because it gives users the freedom to configure their tokenizer precisely, which is required for, e.g., SFT with TRL.
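
As a rough illustration of the feature described above, the sketch below shows one way a trainer-style constructor could accept either a model name or an already initialized model, plus an optional pre-configured tokenizer. It is not Argilla's implementation: `SketchTrainer`, `load_model`, `load_tokenizer`, and the `Dummy*` classes are hypothetical stand-ins, chosen so the example runs without any ML libraries.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DummyTokenizer:
    """Hypothetical stand-in for a real tokenizer object."""
    name: str
    pad_token: Optional[str] = None


@dataclass
class DummyModel:
    """Hypothetical stand-in for a real, already initialized model."""
    name: str


def load_model(name: str) -> DummyModel:
    # Hypothetical loader; a real trainer would call its framework here.
    return DummyModel(name=name)


def load_tokenizer(name: str) -> DummyTokenizer:
    # Hypothetical loader that returns a tokenizer with default settings.
    return DummyTokenizer(name=name)


class SketchTrainer:
    """Hypothetical trainer accepting names or initialized objects."""

    def __init__(
        self,
        model: Union[str, DummyModel],
        tokenizer: Optional[DummyTokenizer] = None,
    ) -> None:
        # A string is treated as a model name to load; anything else is
        # assumed to be an initialized model and is used unchanged.
        self.model = load_model(model) if isinstance(model, str) else model
        # A user-supplied tokenizer takes precedence, so custom settings
        # (for example a pad token) are preserved.
        if tokenizer is None:
            tokenizer = load_tokenizer(self.model.name)
        self.tokenizer = tokenizer


# Usage: the user configures the tokenizer before handing it over.
custom_tokenizer = DummyTokenizer(name="my-model", pad_token="<pad>")
trainer = SketchTrainer(model=DummyModel(name="my-model"), tokenizer=custom_tokenizer)

# Passing only a name still works and falls back to default loading.
by_name = SketchTrainer(model="my-model")
```

The design choice shown here is that explicitly passed objects are never replaced or re-created, so any custom setup done by the user survives.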

Type of change

  • New feature (non-breaking change which adds functionality)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

Updated the relevant tests (TRL, Transformers) to also train with the passed model & tokenizer.

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

TODO:

  • CHANGELOG
  • Documentation
  • Double-check docstrings

  • Tom Aarsen

@tomaarsen tomaarsen marked this pull request as ready for review September 12, 2023 12:29
@tomaarsen
Contributor Author

tomaarsen commented Sep 12, 2023

The test failures seem unrelated: they are all ConnectionTimeout and TransportError(503, ''). @gabrielmbmb, did we find a solution for this yet?

@davidberenstein1957
Member

@tomaarsen, we did not, but Gabri just suggested rerunning it.

Member

@davidberenstein1957 davidberenstein1957 left a comment

Looks good! Some tiny remarks.

tomaarsen and others added 3 commits September 13, 2023 13:22
Co-authored-by: David Berenstein <[email protected]>
Also removed tokenizer from setfit (where it didn't do anything) and
updated some docstrings
@codecov

codecov bot commented Sep 14, 2023

@github-actions

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3751-ki24f765kq-no.a.run.app

@tomaarsen tomaarsen merged commit 3aac61f into develop Sep 15, 2023
21 of 22 checks passed
@tomaarsen tomaarsen deleted the feat/trainer_model_tokenizer branch September 15, 2023 08:21
tomaarsen added a commit that referenced this pull request Oct 11, 2023
Hello!

# Argilla Community Growers

Ever since #3751, `model` can also be an already initialized model. This
edge case was being missed before. This should help with the test
failures on #3911.

**Type of change**

(Please delete options that are not relevant. Remember to title the PR
according to the type of change)

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Refactor (change restructuring the codebase without changing
functionality)
- [ ] Improvement (change adding some improvement to an existing
functionality)
- [ ] Documentation update

**How Has This Been Tested**

`pytest .\tests\integration\client\feedback\training\test_trainer.py::test_argilla_trainer_text_classification_with_model_tokenizer`

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [x] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---

- Tom Aarsen
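
The edge case mentioned in the commit message above can be illustrated with a small sketch. It assumes, purely for illustration, that some code path needs a printable model name and previously treated `model` as a string. `describe_model` and `InitializedModel` are hypothetical and are not taken from the Argilla code base.

```python
from typing import Union


class InitializedModel:
    """Hypothetical stand-in for an already initialized model object."""

    def __init__(self, name_or_path: str) -> None:
        self.name_or_path = name_or_path


def describe_model(model: Union[str, InitializedModel]) -> str:
    """Return a printable name whether `model` is a name or an object."""
    if isinstance(model, str):
        # The original case: the model was given as a name.
        return model
    # The previously missed case: an initialized object was passed.
    # Fall back to the class name if the object exposes no name attribute.
    return getattr(model, "name_or_path", type(model).__name__)


name_from_str = describe_model("bert-base-cased")
name_from_obj = describe_model(InitializedModel("bert-base-cased"))
```

Branching on `isinstance(model, str)` keeps the string path unchanged while giving initialized objects a defined behaviour instead of an error.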
Development

Successfully merging this pull request may close these issues.

[FEATURE] ArgillaTrainer - allow passing initialized model & tokenizer
2 participants