Speed test
| Quant Scheme | Observer | QuantizationModifier | GPTQModifier |
| --- | --- | --- | --- |
| fp8_dynamic_per_token | MinMax | 0.753–0.754 | |
| | MSE | 0.759–0.760 | |
| fp8_static_per_tensor | MinMax | 0.757–0.758 | |
| | MSE | 0.767–0.770 | |
| int8_w8a8_dynamic_per_token | MinMax | 0.760–0.761 | 0.769–0.771 |
| | MSE | 0.770–0.772 | 0.767–0.767 |
| w4a16_actorder_group | MinMax | | 0.726–0.726 |
| | MSE | | 0.712–0.712 |
| w4a16_actorder_weights | MinMax | | 0.721–0.722 |
| | MSE | | 0.717–0.720 |
| w4a16_grouped_quant | MinMax | 0.666–0.671 | 0.717–0.718 |
| | MSE | 0.657–0.659 | 0.723–0.724 |
AWQ results
MinMax:
| Task | Version | Filter | n-shot | Metric | Value |
| --- | --- | --- | --- | --- | --- |
| wikitext | 2 | none | 5 | bits_per_byte | 0.6291 |
| | | | 5 | byte_perplexity | 1.5466 |
| | | | 5 | word_perplexity | 10.2949 |
MSE:
| Task | Version | Filter | n-shot | Metric | Value |
| --- | --- | --- | --- | --- | --- |
| wikitext | 2 | none | 5 | bits_per_byte | 0.6323 |
| | | none | 5 | byte_perplexity | 1.5500 |
| | | none | 5 | word_perplexity | 10.4192 |
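
As a quick consistency check on the two wikitext tables above: bits_per_byte is the base-2 logarithm of byte_perplexity, so one can be recomputed from the other. The sketch below does that and also reports the relative change in word perplexity between the two observers. The dictionary layout and helper names are illustrative only and are not part of any tool used in this PR.

```python
# Values copied from the MinMax and MSE wikitext tables above.
RESULTS = {
    "MinMax": {"bits_per_byte": 0.6291, "byte_perplexity": 1.5466, "word_perplexity": 10.2949},
    "MSE": {"bits_per_byte": 0.6323, "byte_perplexity": 1.5500, "word_perplexity": 10.4192},
}


def byte_perplexity_from_bits(bits_per_byte: float) -> float:
    """Invert bits_per_byte = log2(byte_perplexity)."""
    return 2.0 ** bits_per_byte


def relative_change(baseline: float, candidate: float) -> float:
    """Signed change of candidate relative to baseline (0.01 means +1%)."""
    return (candidate - baseline) / baseline


for observer, metrics in RESULTS.items():
    recomputed = byte_perplexity_from_bits(metrics["bits_per_byte"])
    print(f"{observer}: reported={metrics['byte_perplexity']:.4f} recomputed={recomputed:.4f}")

delta = relative_change(
    RESULTS["MinMax"]["word_perplexity"], RESULTS["MSE"]["word_perplexity"]
)
print(f"word_perplexity change, MSE vs MinMax: {delta:+.2%}")
```

Lower is better for all three metrics, so in this run the MSE observer is slightly behind MinMax on wikitext, by roughly 1.2% in word perplexity.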
MSE Observer (0.2 max shrink)
| Quant Scheme | Observer | QuantizationModifier | GPTQModifier |
| --- | --- | --- | --- |
| fp8_dynamic_per_token | MinMax | 0.753–0.754 | |
| | MSE | 0.759–0.760 | |
| fp8_static_per_tensor | MinMax | 0.757–0.758 | |
| | MSE | 0.770–0.770 | |
| int8_w8a8_dynamic_per_token | MinMax | 0.760–0.761 | 0.769–0.771 |
| | MSE | 0.764–0.767 | |
| vl_fp8_dynamic_per_token | MSE | | 0.833 |
| vl_w4a16_actorder_weight | MSE | | 0.867 |
| w4a16_actorder_group | MinMax | | 0.726–0.726 |
| | MSE | | 0.731–0.731 |
| w4a16_actorder_weights | MinMax | | 0.721–0.722 |
| | MSE | | 0.724–0.726 |
| w4a16_grouped_quant | MinMax | 0.666–0.671 | 0.717–0.718 |
| | MSE | | 0.726–0.727 |
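
For readers unfamiliar with the "max shrink" setting, here is a minimal sketch of the general idea behind an MSE-based range search. It assumes that a max shrink of 0.2 means the clipping range may be reduced by up to 20% of the min/max range, searched on a fixed grid. It is a pure-Python illustration with a symmetric per-tensor integer quantizer, not the observer implementation benchmarked above; `quantize_symmetric`, `mse_clip_search`, and the `grid` parameter are hypothetical names.

```python
def quantize_symmetric(values, max_abs, num_bits=8):
    """Fake-quantize to a symmetric signed integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # round, then clip
        out.append(q * scale)
    return out


def mean_squared_error(reference, approx):
    return sum((a - b) ** 2 for a, b in zip(reference, approx)) / len(reference)


def mse_clip_search(values, max_shrink=0.2, grid=100, num_bits=8):
    """Try clipping ranges from 100% down to (1 - max_shrink) of the
    min/max range and keep the one with the lowest reconstruction MSE."""
    full_range = max(abs(v) for v in values)
    best_max_abs, best_err = full_range, float("inf")
    steps = round(max_shrink * grid)
    for i in range(steps + 1):
        candidate = full_range * (1.0 - i / grid)  # i == 0 is plain MinMax
        err = mean_squared_error(values, quantize_symmetric(values, candidate, num_bits))
        if err < best_err:
            best_max_abs, best_err = candidate, err
    return best_max_abs, best_err


# Toy data: many small values plus one outlier that stretches the range.
weights = [0.01 * k for k in range(-50, 51)] + [1.0]
clip, err = mse_clip_search(weights, max_shrink=0.2, num_bits=4)
print(f"chosen clip={clip:.3f}, mse={err:.6f}")
```

Because the first candidate is the unshrunk min/max range, the search in this sketch can never return a higher reconstruction error than MinMax on the calibration data itself. Downstream accuracy can still move in either direction, as the tables above show.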
Time Sheets
meta-llama/Meta-Llama-3-8B-Instruct
MinMax:
| Step | Time (seconds) |
| --- | --- |
| _load_model_and_processor | 5.772182941436768 |
| _calibrate | 251.95170068740845 |
| _run_oneshot | 252.93479776382446 |
| _save_compressed_model | 41.454792976379395 |
| _handle_recipe | 0.002226591110229492 |
| _run_lm_eval | 1196.4064140319824 |
MSE: