eric-mitchell / direct-preference-optimization Public

Fork 166
Star 2.1k

Code
Issues 38
Pull requests 1
Actions
Projects
Security
Insights

Issues: eric-mitchell/direct-preference-optimization

38 Open 43 Closed

In DPO training, I got this ‘train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024}’

#89 opened Sep 23, 2024 by Alan-D-Chen

GPT4 prompt when evaluating DPO

#88 opened Sep 5, 2024 by kygguo

How to guarantee the output.logits.shape[:-1] == labels.shape

#87 opened Aug 13, 2024 by foreverhell

How are evals done on trained models?

#83 opened May 22, 2024 by lesnikow

Where is the config document for IPO?

#81 opened May 7, 2024 by 3244we

Using Mistral 7B with transformers v4.38.1 on MATH dataset, and facing memory leaks

#80 opened May 4, 2024 by Jayant1234

Weird logits and model starts degeneration while training DPO

#77 opened Apr 9, 2024 by DungNasSa10

Was it your intention to recreate wandb tables in iterator?

#76 opened Apr 4, 2024 by huskydoge

Can DPO work on BERT-style Model?

#75 opened Mar 24, 2024 by Leo-T-Zang

The number of training steps in the SHP dataset

#73 opened Mar 16, 2024 by bonin147

Computing faster logps

#72 opened Mar 9, 2024 by alexvishnevskiy

Implementation for Plackett-Luce rank model

#71 opened Mar 4, 2024 by rohan598

What's the reference policy of Preferred-FT in Figure 2?

#70 opened Mar 4, 2024 by zetian1025

My Code to Reproduce IMDB

#69 opened Feb 26, 2024 by QiyaoWei

Why does SFT sum the cross-entropy loss within each sequence?

#68 opened Feb 17, 2024 by YJWon99

Using cross entropy loss to calculate DPO?

#67 opened Feb 14, 2024 by zachares

Unable to Run SFT

#66 opened Feb 13, 2024 by Rui-Yuan91

Question about IPO loss vs DPO loss

#64 opened Jan 30, 2024 by MoonBlvd

Reproducing Win Rate inference for TL;DR

#62 opened Jan 9, 2024 by jdchang1

DPO did not achieve the expected experimental effect

#56 opened Dec 7, 2023 by Vance0124

How to re-implement the result of IMDB sentiment generation.

#54 opened Nov 14, 2023 by junkangwu

Llama-2-13b-chat Valid reward accuracy remains ~50%

#53 opened Nov 6, 2023 by nxphi47

Qwen model issues: embedding and loss have NaN

#52 opened Nov 3, 2023 by lylcst

Error when following the README to train SFT on multiple cards using FSDPTrainer

#51 opened Nov 3, 2023 by NekoMimiUnagi

Question about average_log_prob

#48 opened Oct 24, 2023 by LSX-Sneakerprogrammer
