Intro

<aside> 🔥

reasoning์„ ์›ํ•  ๋•Œ๋งŒ ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ชจ๋ธ์„ ํ•™์Šตํ•˜์—ฌ ๋น„์šฉ๊ณผ ์„ฑ๋Šฅ์„ ์˜ฌ๋ฆฌ์ž! (2025๋…„ 6์›” 15์ผ)

</aside>

์ตœ๊ทผ reasoning model์ด ์‚ฌ๋žŒ์˜ ๊นŠ์€ ์ถ”๋ก ์ด ํ•„์š”ํ•œ ์˜์—ญ์—์„œ๋„ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ƒ๊ฐํ•˜๋Š” ๊ณผ์ •์ด ์ƒ๋‹นํžˆ ๊ธธ์–ด์ ธ์„œ ์ถ”๋ก ํ•˜๋Š” ๊ณผ์ •์—์„œ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„ ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ๋“ฑ์— ์‹ฌ๊ฐํ•œ ๋ณ‘๋ชฉ ํ˜„์ƒ์„ ๋ฐœ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค.

์ด ๋…ผ๋ฌธ์—์„œ ์‚ฌ๊ณ (Thinking)์„ ์ƒ๋žตํ•˜๊ณ  ๋ฐ”๋กœ ๋งˆ์ง€๋ง‰ ํ•ด๊ฒฐ์ฑ…์„ ์ง์ ‘ ์ƒ์„ฑํ•˜๋Š” ๋น„์‚ฌ๊ณ (NoThinking)๊ฐ€ ๊ฐ„๋‹จํ•œ task์—์„œ๋Š” ์„ฑ๋Šฅ๊ณผ ํšจ์œจ์„ฑ ๋ชจ๋“  ์ธก๋ฉด์—์„œ ๋”์šฑ ์ข‹์€ ์„ ํƒ์ด๋ผ๋Š” ๊ฒƒ์„ ๋จผ์ € ์„ค๋ช…ํ•œ๋‹ค. ์ด๊ฒƒ์— ์˜๊ฐ์„ ๋ฐ›์•„์„œ, ์šฐ๋ฆฌ๋Š” ๋ฌธ์ œ์˜ ๋‚œ์ด๋„์— ๊ธฐ๋ฐ˜ํ•ด์„œ ์ ์‘ํ˜•์œผ๋กœ ์ตœ์ ์˜ ์ƒ๊ฐ(thinking) ๋ชจ๋“œ๋ฅผ ์„ ํƒํ•˜๋Š” reasoning ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด AdaptThink๋ผ๊ณ  ํ•˜๋Š” ์ƒˆ๋กœ์šด RL ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์•ˆํ•œ๋‹ค.

ํŠนํžˆ, AdaptThink์€ ๋‘๊ฐœ์˜ ํ•ต์‹ฌ ์ปดํฌ๋„ŒํŠธ๋ฅผ ๊ฐ€์ง€๋Š”๋ฐ,

(1) ์ „์ฒด ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋ธ์ด NoThinking์„ ์„ ํƒํ•˜๋„๋ก ๋„์™€์ฃผ๋Š” constrained optimization ๋ชฉ์ ํ•จ์ˆ˜

(2) an importance sampling strategy that balances Thinking and NoThinking samples during on-policy training.
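The summary does not spell out the objective in component (1). A minimal sketch of one way to turn "prefer NoThinking, but keep accuracy at least at the reference model's level" into a per-sample reward, assuming a binary correctness reward, a per-question baseline `ref_baseline` estimated by sampling the original model several times before training, and a hypothetical bonus weight `delta` for NoThinking responses (a penalty-style relaxation of the constraint). All names and the default value are illustrative assumptions; check the paper for the exact formulation.

```python
from statistics import mean


def reference_baseline(ref_rewards: list[float]) -> float:
    """Mean reward of the original (reference) model on one question, from K pre-sampled answers."""
    return mean(ref_rewards)


def shaped_reward(is_correct: bool, is_nothinking: bool,
                  ref_baseline: float, delta: float = 0.05) -> float:
    """Penalty-form sketch of: maximize P(NoThinking) s.t. accuracy >= reference accuracy.

    delta (hypothetical default) trades the NoThinking bonus against the accuracy constraint.
    """
    r = 1.0 if is_correct else 0.0
    bonus = delta if is_nothinking else 0.0
    # r - ref_baseline is positive when this sample beats the reference model's average.
    return bonus + r - ref_baseline


if __name__ == "__main__":
    b = reference_baseline([1.0, 1.0, 0.0, 1.0])  # reference model solved 3 of 4 samples
    print(shaped_reward(True, True, b))    # correct NoThinking answer
    print(shaped_reward(True, False, b))   # correct Thinking answer
    print(shaped_reward(False, True, b))   # wrong NoThinking answer
```

With a reference accuracy of 0.75 on a question, a correct NoThinking answer scores about 0.30 and a correct Thinking answer 0.25, so the model is nudged toward skipping thinking on questions it can already solve; a wrong NoThinking answer scores about -0.70, which keeps the accuracy constraint in force.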

์ด๋กœ์จ **์ฝœ๋“œ ์Šคํƒ€ํŠธ(cold start)**๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๊ณ , ํ›ˆ๋ จ ๊ณผ์ • ์ „์ฒด์—์„œ ๋‘ ์‚ฌ๊ณ  ๋ชจ๋“œ๋ฅผ ๋ชจ๋‘ ํƒ์ƒ‰(explore)ํ•˜๊ณ  ํ™œ์šฉ(exploit)ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

Experiments show that AdaptThink significantly reduces inference cost while further improving performance. On three math datasets, it cuts the average response length of DeepSeek-R1-Distill-Qwen-1.5B by 53% and improves its accuracy by 2.4%, highlighting the promise of adaptive thinking-mode selection for balancing reasoning quality and efficiency.

Body

Conclusion


โฌ…๏ธย ์ด์ „ ํŽ˜์ด์ง€

โžก๏ธย ๋‹ค์Œ ํŽ˜์ด์ง€

Reference

https://arxiv.org/abs/2505.13417

https://github.com/THU-KEG/AdaptThink

<aside>

Topics

AdaptThink: Reasoning Models Can Learn When to Think

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

SCREENCODER: ADVANCING VISUAL-TO-CODE GENERATION FOR FRONT-END AUTOMATION VIA MODULAR MULTIMODAL AGENTS

</aside>