
🙏 How to Convert a FLUX-Dev Checkpoint to an NF4 Model? #1224

Answered by lllyasviel
sashaok123 asked this question in Q&A

I will probably share some conversion code later ...

Also, people need to be aware that GGUF is a pure compression technique: the file is smaller, but it is also slower, because there are extra steps to decompress tensors and the computation is still plain PyTorch (unless someone is crazy enough to port the llama.cpp kernels).
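A minimal sketch of that decompress-then-compute pattern, using a made-up symmetric int4 block scheme (not the real GGUF Q4_0 layout) just to show where the extra dequantization work goes:

```python
import torch

def quantize_blocks(w, block=32):
    # Pure-compression style: store int4 codes plus one scale per block.
    flat = w.reshape(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True) / 7.0      # per-block scale
    codes = (flat / scale).round().clamp(-8, 7).to(torch.int8)
    return codes, scale

def dequantize_blocks(codes, scale, shape):
    # Extra step at inference time: rebuild fp16 weights before the matmul.
    return (codes.to(torch.float16) * scale).reshape(shape)

w = torch.randn(4096, 4096, dtype=torch.float16)
codes, scale = quantize_blocks(w)                  # small on disk / in RAM
x = torch.randn(1, 4096, dtype=torch.float16)
w_fp16 = dequantize_blocks(codes, scale, w.shape)  # decompression overhead
y = x @ w_fp16.T                                   # compute is still plain PyTorch fp16
```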

BNB (NF4), on the other hand, is a computational acceleration library: it replaces PyTorch ops with native low-bit CUDA kernels, so the computation itself is faster.
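For reference, a rough sketch of what the bitsandbytes NF4 path looks like (this assumes bitsandbytes is installed with CUDA support; the calls below are the `quantize_4bit` / `dequantize_4bit` helpers from `bitsandbytes.functional`, not Forge's own conversion code):

```python
import torch
import bitsandbytes.functional as bnbF

w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

# Quantize to NF4: packed 4-bit codes plus per-block statistics
# (blocksize=64 is the bitsandbytes default).
w_nf4, quant_state = bnbF.quantize_4bit(w, blocksize=64, quant_type="nf4")

# At inference, bitsandbytes' CUDA kernels operate on the 4-bit data directly;
# dequantize_4bit is only needed if you want the fp16 weights back.
w_restored = bnbF.dequantize_4bit(w_nf4, quant_state)
print(w_nf4.dtype, w_nf4.shape)       # packed uint8 storage
print((w - w_restored).abs().mean())  # quantization error
```

In an actual conversion you would typically swap `nn.Linear` modules for `bitsandbytes.nn.Linear4bit` or save the quantized state dict, but the exact layout depends on what the downstream loader expects from an NF4 checkpoint.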

NF4 and Q4_0 should be very similar; the difference is that Q4_0 has a smaller chunk size and NF4 has more Gaussian-distributed quants. I do not recommend trusting comparisons based on one or two images. And I also want to have a smaller chunk …
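To make the "more Gaussian-distributed quants" point concrete, here is an illustrative comparison of the two kinds of 4-bit level grids (the NF4 codebook below is approximated from normal quantiles rather than copied from bitsandbytes, and the Q4_0 grid is simplified):

```python
import torch

# Q4_0-style levels: 16 uniformly spaced integers, scaled per block.
q4_levels = torch.arange(-8, 8, dtype=torch.float32) / 8.0

# NF4-style levels: 16 values placed at quantiles of a standard normal,
# so they are denser near zero where most weights live (approximation only).
normal = torch.distributions.Normal(0.0, 1.0)
probs = torch.linspace(0.02, 0.98, 16)
nf4_levels = normal.icdf(probs)
nf4_levels = nf4_levels / nf4_levels.abs().max()   # normalize to [-1, 1]

print("uniform (Q4_0-like): ", q4_levels.tolist())
print("gaussian (NF4-like):", nf4_levels.tolist())
```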
