LocalLLaMA

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago

What is better: higher quantization or higher parameter count? (yiffit.net)

submitted 1 year ago by [email protected] to c/[email protected]

For example, does a 13B-parameter model at Q2_K quantization perform worse than a 7B-parameter model at 8-bit or 16-bit?
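A back-of-the-envelope sketch of weight size helps frame the trade-off, since the two options land at quite different memory costs. This counts weights only (no KV cache or runtime overhead), uses nominal parameter counts (the real LLaMA "7B" has about 6.7B parameters), and assumes about 2.6 bits per weight for the 2-bit case, because k-quant formats also store per-block scales on top of the 2-bit values. The exact figure is an assumption, not a number taken from llama.cpp.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (ignores KV cache and overhead)."""
    return n_params * bits_per_weight / 8 / 2**30


# Nominal sizes; the ~2.6 bpw for the 2-bit case is an assumed effective rate.
configs = {
    "13B @ ~2.6 bpw (assumed)": (13e9, 2.6),
    "7B @ 8 bpw": (7e9, 8.0),
    "7B @ 16 bpw": (7e9, 16.0),
}

for name, (n_params, bpw) in configs.items():
    print(f"{name}: {weight_gib(n_params, bpw):.1f} GiB")
```

With these assumptions it prints about 3.9, 6.5 and 13.0 GiB, so the 2-bit 13B model is actually the smallest of the three.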

[–] [email protected] 9 points 1 year ago* (last edited 1 year ago) (2 children)

https://github.com/ggerganov/llama.cpp#quantization

https://github.com/ggerganov/llama.cpp/pull/1684

Regarding your question: a 13B model at Q2_K seems to be on par with a 7B model at 16-bit or 8-bit; there isn't much difference between any of them. (Look at the perplexity values; lower is better.) The second link has a nice graph.
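For reference, perplexity in the usual definition is the exponential of the average per-token negative log-likelihood, so a model that spreads its probability evenly over 4 tokens at every step has a perplexity of 4. A minimal sketch, assuming you already have natural-log probabilities for each token of some evaluation text (how llama.cpp's own tool chunks and scores the text is not reproduced here):

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities log p(x_i | x_<i)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    # Average negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Every token predicted with probability 0.25 -> perplexity of about 4.
print(perplexity([math.log(0.25)] * 10))
```

Because it is an exponentiated average loss, a model that assigns higher probability to the actual text gets a lower perplexity, which is why lower is better.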

Most people don't go as low as 2-bit, though. As the graph shows, quality starts to deteriorate below 4 bits.
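One intuition for the drop-off: every bit removed halves the number of levels each weight can take, so reconstruction error grows quickly at the low end. The sketch below uses plain symmetric round-to-nearest quantization per block of 32 weights on random Gaussian "weights". This is a simplified stand-in, not llama.cpp's k-quant scheme (real 2-bit formats use 4 levels plus an offset, and the block size here is an arbitrary choice), and it measures weight error, not perplexity, so it will not reproduce the graph.

```python
import random


def quantize_dequantize(block: list[float], bits: int) -> list[float]:
    """Symmetric round-to-nearest: map to ints in [-(2^(b-1)-1), 2^(b-1)-1] and back."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in block) / qmax
    if scale == 0:
        return [0.0] * len(block)
    return [round(w / scale) * scale for w in block]


def rms_error(weights: list[float], bits: int, block_size: int = 32) -> float:
    """Root-mean-square error after quantizing each block with its own scale."""
    squared_error = 0.0
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        restored = quantize_dequantize(block, bits)
        squared_error += sum((a - b) ** 2 for a, b in zip(block, restored))
    return (squared_error / len(weights)) ** 0.5


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(32 * 1024)]
for bits in (8, 6, 5, 4, 3, 2):
    print(f"{bits}-bit RMS error: {rms_error(weights, bits):.4f}")
```

Running it shows the error roughly doubling with each bit removed; how much of that a model can tolerate is what the perplexity measurements in the links actually capture.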

[–] [email protected] 5 points 1 year ago

That graph is great. Very easy to understand. Thank you!

[–] [email protected] 2 points 1 year ago

These are good sources. To add one more, the GPTQ paper discusses perplexity at several quantization levels and model sizes:

https://arxiv.org/abs/2210.17323