Using modules_to_save to save parameters initialized by nn.Parameter doesn't work! #2099

minmie · 2024-09-26T02:48:38Z

System Info

transformers version: 4.44.0
Platform: Windows-10-10.0.19041-SP0
Python version: 3.10.14
Huggingface_hub version: 0.24.6
Safetensors version: 0.4.4
Accelerate version: 0.33.0
Accelerate config: not found
PyTorch version (GPU?): 2.1.2+cpu (False)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?:

Who can help?

@BenjaminBossan @sayakpaul

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder
My own task or dataset (give details below)

Reproduction

import evaluate
import numpy as np
from transformers import TrainingArguments, Trainer, ViTForImageClassification
from transformers import ViTImageProcessor, DefaultDataCollator
from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor
from datasets import load_dataset
from peft import get_peft_model, LoraConfig

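food = load_dataset("food101", split="train[:5000]")
# Split into train/test so that food["train"] and food["test"] exist below.
food = food.train_test_split(test_size=0.2)
labels = food["train"].features["label"].names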
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

SIZE = 512
checkpoint = "google/vit-base-patch16-224-in21k"
image_processor = ViTImageProcessor.from_pretrained(checkpoint, size=SIZE)
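model = ViTForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
    image_size=SIZE,
    # The checkpoint's position embeddings were trained at 224x224, so their
    # shape differs at SIZE=512; let them be re-initialized instead of erroring.
    ignore_mismatched_sizes=True,
)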

lora_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.3,
    bias='none',
    target_modules=['query', 'key', 'value'],
    modules_to_save=[
        'classifier',
        "model.vit.embeddings.position_embeddings"
    ]
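)
# Wrap the base model with the LoRA adapters (and modules_to_save) before training.
model = get_peft_model(model, lora_config)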


normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
size = (
    image_processor.size["shortest_edge"]
    if "shortest_edge" in image_processor.size
    else (image_processor.size["height"], image_processor.size["width"])
)
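_transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])


def transforms(examples):
    # Convert each PIL image to a normalized tensor stored under "pixel_values".
    examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
    del examples["image"]
    return examples


# Apply the transform lazily, when examples are accessed.
food = food.with_transform(transforms)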


accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)


data_collator = DefaultDataCollator()

training_args = TrainingArguments(
    output_dir="my_awesome_food_model",
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=food["train"],
    eval_dataset=food["test"],
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
)

trainer.train()

Expected behavior

I increased the image resolution to 512, so I want to retrain the position embeddings, which are created as an nn.Parameter, while fine-tuning the model with LoRA. However, the final saved LoRA model does not include the position embeddings. How can I solve this?


dengchengxifrank · 2024-09-26T03:28:36Z

Hi @minmie, I ran into this problem too. If you run the following:
for name, _ in model.named_modules(): print('name ', name)
you will find that a bare torch.nn.Parameter does not appear among the names, because it is a parameter rather than a module. modules_to_save matches its entries against module names, so "model.vit.embeddings.position_embeddings" matches nothing. My solution to this is to use nn.Embedding instead.
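
To see where a bare nn.Parameter shows up, here is a minimal sketch on a toy module. ToyEmbeddings and its attribute names are hypothetical stand-ins, not the transformers ViT code, and nothing here calls PEFT. In this toy case the parameter is listed by named_parameters() but has no entry of its own in named_modules(); assuming modules_to_save is resolved against module names, that would explain why nothing matches.

```python
import torch
from torch import nn


class ToyEmbeddings(nn.Module):
    """Hypothetical stand-in for a ViT-style embeddings block."""

    def __init__(self, num_positions: int = 5, dim: int = 4):
        super().__init__()
        # A bare parameter: registered on the module, but not a submodule itself.
        self.position_embeddings = nn.Parameter(torch.zeros(1, num_positions, dim))
        # A regular submodule, for comparison.
        self.projection = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.projection(x) + self.position_embeddings


toy = ToyEmbeddings()
module_names = [name for name, _ in toy.named_modules()]
param_names = [name for name, _ in toy.named_parameters()]

print("modules:   ", module_names)
print("parameters:", param_names)
```

Running the same two listings on the real model should show which names are available as modules.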

BenjaminBossan · 2024-09-26T08:40:38Z

I don't understand where the nn.Parameter is registered. Is it implicit because you pass image_size=SIZE?

> My solution to this is to use nn.Embedding instead.

This means you found a way to make it work? Great. It would be fantastic if you could share your code here in case other users encounter the same problem.
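
The workaround code was not posted in this thread. As a guess at the general idea only, the sketch below stores the position embeddings in an nn.Embedding inside a toy module. ToyEmbeddingsWithModule is hypothetical and is not the transformers ViT implementation. Because nn.Embedding is an nn.Module, it gets its own entry in named_modules(), which is what a name-based lookup over modules would need.

```python
import torch
from torch import nn


class ToyEmbeddingsWithModule(nn.Module):
    """Hypothetical block that keeps position embeddings in an nn.Embedding."""

    def __init__(self, num_positions: int = 5, dim: int = 4):
        super().__init__()
        # nn.Embedding is an nn.Module, so it is listed by named_modules().
        self.position_embeddings = nn.Embedding(num_positions, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Use the whole embedding table as a (1, num_positions, dim) tensor.
        return x + self.position_embeddings.weight.unsqueeze(0)


toy_v2 = ToyEmbeddingsWithModule()
module_names_v2 = [name for name, _ in toy_v2.named_modules()]
out = toy_v2(torch.zeros(2, 5, 4))

print("modules:", module_names_v2)
print("output shape:", tuple(out.shape))
```

Applying this to the real ViT model would also require changing how its forward pass reads the position embeddings, which this sketch does not attempt.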
