Accuracy test

Quant Scheme	Observer	QuantizationModifier	GPTQModifier
fp8_dynamic_per_token	MinMax	0.753–0.754
	MSE	0.759-0.760
fp8_static_per_tensor	MinMax	0.757–0.758
	MSE	0.767-0.770
int8_w8a8_dynamic_per_token	MinMax	0.760–0.761	0.769–0.771
	MSE	0.770–0.772	0.767-0.767
w4a16_actorder_group	MinMax		0.726-0.726
	MSE		0.712-0.712
w4a16_actorder_weights	MinMax		0.721-0.722
	MSE		0.717-0.720
w4a16_grouped_quant	MinMax	0.666–0.671	0.717-0.718
	MSE	0.657–0.659	0.723-0.724

AWQ results

MinMax:

Task	Version	Filter	n-shot	Metric	Value
wikitext	2	none	5	bits_per_byte	0.6291
		none	5	byte_perplexity	1.5466
		none	5	word_perplexity	10.2949

MSE:

Task	Version	Filter	n-shot	Metric	Value
wikitext	2	none	5	bits_per_byte	0.6323
		none	5	byte_perplexity	1.5500
		none	5	word_perplexity	10.4192
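
The two byte-level wikitext metrics are views of the same log-likelihood, so they can be cross-checked against each other. A minimal sketch, assuming the usual lm-evaluation-harness definitions (byte_perplexity = 2 ** bits_per_byte); the function name is illustrative and not part of any library:

```python
def byte_perplexity_from_bpb(bits_per_byte: float) -> float:
    """Convert bits per byte to byte perplexity: 2 ** bits_per_byte."""
    return 2.0 ** bits_per_byte

# Reported (bits_per_byte, byte_perplexity) pairs from the AWQ tables.
reported = {
    "MinMax": (0.6291, 1.5466),
    "MSE": (0.6323, 1.5500),
}

for observer, (bpb, byte_ppl) in reported.items():
    derived = byte_perplexity_from_bpb(bpb)
    print(f"{observer}: derived={derived:.4f} reported={byte_ppl:.4f}")
```

Lower is better for all three metrics, so on this run MinMax is slightly ahead of MSE.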

MSE Observer (0.2 max shrink)

Quant Scheme	Observer	QuantizationModifier	GPTQModifier
fp8_dynamic_per_token	MinMax	0.753–0.754
	MSE	0.759-0.760
fp8_static_per_tensor	MinMax	0.757–0.758
	MSE	0.770-0.770
int8_w8a8_dynamic_per_token	MinMax	0.760–0.761	0.769–0.771
	MSE	0.764-0.767
vl_fp8_dynamic_per_token	MSE		0.833
vl_w4a16_actorder_weight	MSE		0.867
w4a16_actorder_group	MinMax		0.726-0.726
	MSE		0.731-0.731
w4a16_actorder_weights	MinMax		0.721-0.722
	MSE		0.724-0.726
w4a16_grouped_quant	MinMax	0.666–0.671	0.717-0.718
	MSE		0.726-0.727
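
For context on what the two observers do differently, here is a rough pure-Python sketch. It is not llm-compressor's implementation. It assumes symmetric per-tensor quantization, and it assumes "max shrink" caps the fraction by which the MinMax range may be shrunk during the search (20% here). All function names and the `grid` parameter are hypothetical:

```python
def fake_quantize(values, scale, qmax):
    """Round to the symmetric integer grid [-qmax - 1, qmax] and map back to floats."""
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def quant_error(values, scale, qmax):
    """Mean squared error between the values and their quantized copies."""
    deq = fake_quantize(values, scale, qmax)
    return sum((v - d) ** 2 for v, d in zip(values, deq)) / len(values)

def minmax_scale(values, qmax):
    """MinMax-style scale: the largest magnitude maps to the edge of the grid."""
    absmax = max(abs(v) for v in values)
    return absmax / qmax if absmax > 0 else 1.0

def mse_scale(values, qmax, maxshrink=0.2, grid=100):
    """MSE-style scale: try ranges shrunk by up to `maxshrink`, keep the lowest error."""
    full = minmax_scale(values, qmax)
    best_scale, best_err = full, quant_error(values, full, qmax)
    for i in range(1, int(maxshrink * grid) + 1):
        candidate = full * (1 - i / grid)
        err = quant_error(values, candidate, qmax)
        if err < best_err:
            best_scale, best_err = candidate, err
    return best_scale

# Toy weights: a dense cluster near zero plus one outlier.
weights = [0.01 * i for i in range(-50, 51)] + [2.0]
qmax = 7  # 4-bit symmetric grid

for name, scale in [("MinMax", minmax_scale(weights, qmax)),
                    ("MSE", mse_scale(weights, qmax))]:
    print(f"{name}: scale={scale:.4f} mse={quant_error(weights, scale, qmax):.6f}")
```

Because the search starts from the MinMax scale, the MSE-style result can never have a higher reconstruction error on the calibration values. Whether that carries over to task accuracy is what the table above measures.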

Time Sheets

meta-llama/Meta-Llama-3-8B-Instruct

MinMax:

Step	Time (seconds)
_load_model_and_processor	5.772182941436768
_calibrate	251.95170068740845
_run_oneshot	252.93479776382446
_save_compressed_model	41.454792976379395
_handle_recipe	0.002226591110229492
_run_lm_eval	1196.4064140319824

MSE: