Mamba & RecurrentGemma: enable strict signature #31549

gante · 2024-06-22T11:27:46Z

What does this PR do?

Mamba accepts **kwargs, so attention_mask can be passed without raising an error. Many users therefore assume it behaves just like other models and supports left-padding.

RecurrentGemma also accepts **kwargs, but only so that generate does not crash.

This PR enables a strict signature on Mamba and RecurrentGemma.
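
To illustrate the difference, here is a toy sketch in plain Python. The names `permissive_forward` and `strict_forward` are hypothetical and are not taken from the Transformers codebase; the sketch only assumes that "strict signature" means a `forward` without `**kwargs`.

```python
def permissive_forward(input_ids, use_cache=None, **kwargs):
    # Extra keyword arguments such as `attention_mask` are accepted
    # and silently ignored, so the caller gets no hint that padding is unsupported.
    return {"input_ids": input_ids, "ignored": sorted(kwargs)}


def strict_forward(input_ids, use_cache=None):
    # No **kwargs: an unexpected keyword argument raises TypeError at call time.
    return {"input_ids": input_ids}


# A left-padded batch as a tokenizer might produce it (toy values).
batch = {"input_ids": [[0, 0, 5, 6]], "attention_mask": [[0, 0, 1, 1]]}

permissive_result = permissive_forward(**batch)  # runs, mask is dropped

try:
    strict_forward(**batch)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)  # mentions the unexpected `attention_mask` argument
```

With the strict variant, the mistake surfaces as an error on the first call instead of as silently wrong outputs on padded batches.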

gante · 2024-06-22T11:30:29Z

src/transformers/models/mamba/modeling_mamba.py

@@ -545,7 +545,6 @@ def forward(
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
-        **kwargs,  # `attention_mask` is passed by the tokenizer and we don't want it


Alternatively, we can accept attention_mask and raise an exception when it is not None and not all ones.
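
A minimal sketch of that alternative, under stated assumptions: `check_attention_mask` is a hypothetical helper, and the mask is modeled as nested Python lists rather than a tensor so the example runs without extra dependencies. It is not what the PR implements.

```python
from typing import Optional, Sequence


def check_attention_mask(attention_mask: Optional[Sequence[Sequence[int]]]) -> None:
    """Hypothetical helper: accept a missing or all-ones mask, reject anything else."""
    if attention_mask is None:
        return
    # Any value other than 1 means at least one position would need to be masked.
    if any(value != 1 for row in attention_mask for value in row):
        raise ValueError(
            "`attention_mask` must be None or all ones; padding is not supported by this model."
        )
```

Compared with removing `**kwargs`, this keeps unpadded calls that pass a mask working, at the cost of an extra check on every forward pass.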

HuggingFaceDocBuilderDev · 2024-06-22T11:53:19Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Let's googoogogogogo 🚀

ArthurZucker · 2024-06-27T10:38:20Z

src/transformers/generation/utils.py

+            model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+            model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})


yesssss I think I have a PR open where I did this! Finally!
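
For readers skimming the diff, here is a small sketch of what those two lines do. The wrapper `add_optional_flags` is hypothetical, introduced only so the lines can be run in isolation. One reading, not confirmed in this thread, is that each flag is forwarded only when it is truthy, so a model whose forward has no such parameter is not handed an unexpected keyword.

```python
def add_optional_flags(model_inputs, output_attentions=None, output_hidden_states=None):
    # Hypothetical wrapper around the two lines from the diff above.
    # A flag is added to `model_inputs` only when it is truthy; None/False adds nothing.
    model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
    model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
    return model_inputs


plain = add_optional_flags({"input_ids": [1, 2]})
with_states = add_optional_flags({"input_ids": [1, 2]}, output_hidden_states=True)
```

Here `plain` keeps only `input_ids`, while `with_states` additionally carries `output_hidden_states`.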

amyeroberts · 2024-06-27T10:58:37Z

src/transformers/models/mamba/modeling_mamba.py

@@ -545,7 +545,6 @@ def forward(
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
-        **kwargs,  # `attention_mask` is passed by the tokenizer and we don't want it


Removing this will break FSDP :( See #31161

gante · 2024-06-27

@amyeroberts I had a look and it should be fine: this PR removes **kwargs from the model class (e.g. MambaModel), while the FSDP PR ensures there are **kwargs in the decoder layers (e.g. FalconDecoderLayer).

We can see on main that the models themselves don't have **kwargs, even after the FSDP fix (e.g. llama) 🤗
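
The split described above can be sketched with toy classes. `ToyModel` and `ToyDecoderLayer` are hypothetical stand-ins, and nothing here involves FSDP itself; the sketch only shows a strict model-level signature coexisting with layer-level `**kwargs`.

```python
class ToyDecoderLayer:
    def forward(self, hidden_states, **kwargs):
        # Layer-level **kwargs: extra keyword arguments passed down by a wrapper are tolerated.
        return [h + 1 for h in hidden_states]


class ToyModel:
    def __init__(self, num_layers=2):
        self.layers = [ToyDecoderLayer() for _ in range(num_layers)]

    def forward(self, input_ids, use_cache=None):
        # Model-level signature is strict: there is no **kwargs here.
        hidden_states = list(input_ids)
        for layer in self.layers:
            hidden_states = layer.forward(hidden_states, use_cache=use_cache)
        return hidden_states


model = ToyModel()
outputs = model.forward([1, 2])

try:
    model.forward([1, 2], attention_mask=[1, 1])
    model_error = None
except TypeError as exc:
    model_error = str(exc)
```

The layers still accept arbitrary keywords, while the unexpected `attention_mask` is rejected at the model boundary.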

enable strict signature

a25e037

gante requested a review from ArthurZucker June 22, 2024 11:27

this should not have been deleted

20b49b5

gante changed the title ~~Mamba: enable strict signature~~ Mamba & RecurrentGemma: enable strict signature Jun 22, 2024

recurrent_gemma too

1358011

ArthurZucker approved these changes Jun 27, 2024

amyeroberts reviewed Jun 27, 2024

gante merged commit 594c161 into huggingface:main Jul 8, 2024
23 checks passed

gante deleted the mamba_strict_signature branch July 8, 2024 14:48

ArthurZucker mentioned this pull request Aug 9, 2024

Google RecurrentGemma Models don't work in Transformers 4.43 anymore #32549

Closed

4 tasks
