Dataset: MATH500 (the same 500-problem subset of MATH used in "Let's Verify Step by Step")


Pass@k

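Pass@k here presumably refers to the standard unbiased estimator (from the Codex paper, "Evaluating Large Language Models Trained on Code"): draw n samples per problem, count c correct ones, and estimate the probability that at least one of k randomly chosen samples is correct. A minimal sketch under that assumption:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n samples were drawn and c of them are correct."""
    if n - c < k:
        # Every size-k subset must contain a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 64 samples per prompt, this supports pass@k for any k up to 64.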

Models:

```python
model_seqs = [
    [
        "deepseek-ai/deepseek-math-7b-base",
        "deepseek-ai/deepseek-math-7b-instruct",
        "deepseek-ai/deepseek-math-7b-rl",
    ],
    [
        "mistralai/Mistral-7B-v0.1",
        "peiyi9979/mistral-7b-sft",
        "peiyi9979/math-shepherd-mistral-7b-rl",
    ],
]
```

ensemble_type meaning:

Cost of large-scale sampling

<aside> 🔑 Sampling 64 samples per prompt on MATH500 with DeepSeekMath-7B-RL using vLLM on an A800-PCIe (80GB) takes ~1.5 hr (~170 ms/sample)

</aside>
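The quoted wall-clock time is consistent with the per-sample rate; a quick sanity check (numbers taken from the aside above):

```python
prompts = 500           # MATH500 size
samples_per_prompt = 64
sec_per_sample = 0.17   # ~170 ms/sample

total_hours = prompts * samples_per_prompt * sec_per_sample / 3600
print(round(total_hours, 2))  # ≈ 1.51 hr
```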

Dataset: MATH500

Framework: vLLM

Device: A800-PCIe(80GB) * 1

#Shots: 1 for base models, 0 for instruction-tuned models
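The shot count can be derived from the model name alone; a hypothetical helper (the function name and the suffix heuristic are assumptions, matched to the `model_seqs` list above):

```python
def num_shots(model_name: str) -> int:
    """1-shot prompting for base models, 0-shot for
    instruction-tuned / RL models (heuristic on the name)."""
    is_base = model_name.endswith("-base") or "-v0.1" in model_name
    return 1 if is_base else 0
```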

Parameters:

```
max_new_tokens: 2048
gpu_memory_utilization: 0.85
temperature: 0.7
```
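In vLLM these parameters map onto the `LLM` engine and `SamplingParams`; a configuration sketch (not run here, since it needs a GPU and the model weights):

```python
from vllm import LLM, SamplingParams

# gpu_memory_utilization matches the 0.85 setting above.
llm = LLM(
    model="deepseek-ai/deepseek-math-7b-rl",
    gpu_memory_utilization=0.85,
)

# n=64 draws 64 samples per prompt in one call.
params = SamplingParams(n=64, temperature=0.7, max_tokens=2048)

outputs = llm.generate(prompts, params)  # prompts: list[str]
```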
