Intro

<aside> 🔥

This page looks at how an AI Agent works internally: its ability to reason and plan.

</aside>

An Agent uses its **internal dialogue** to analyze information, break a complex problem into smaller steps that are easier to solve, and then decide which Action to take next. This page also introduces the Re-Act (Chain of Thought) approach, a prompting technique that encourages the model to think "step by step" before it acts.

Note that Hugging Face uses the term ReAct here, while the "step by step" prompting technique is more widely known as **CoT (Chain of Thought)**. Strictly speaking, ReAct adds actions on top of CoT-style reasoning, so the two terms are closely related rather than identical. For easier communication, this page says CoT instead of Re-Act. (The name also collides with the React frontend framework, which makes it hard to search for.)

Body

1. Thought

Thoughts cover the following steps taken to solve a task.

  1. Reasoning: the Agent's internal reasoning process
  2. Planning: deciding how to approach the problem

This thinking relies on the same LLM capability that the Agent uses to analyze the information provided in its prompt.

In human terms, it can be seen as the Agent's **internal dialogue**, in which the Agent considers the task at hand and works out a strategy for it.

The Agent's thoughts play a key role in deciding the next Action based on the current Observation.

Through this process, the Agent can break a complex problem into smaller, more manageable units, reflect on past experience, and keep adjusting its plan as new information arrives.
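As a rough illustration of how a thought leads to the next Action and how an Observation feeds the next thought, here is a minimal sketch. Everything in it is an assumption made for illustration: `Step`, `scripted_policy`, `run_agent`, and the `get_weather` tool are hypothetical names, and the scripted policy merely stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

OBS_PREFIX = "Observation: "


@dataclass
class Step:
    thought: str                   # internal reasoning behind this step
    action: Optional[str] = None   # tool to call; None means "ready to answer"
    argument: str = ""             # input passed to the tool
    answer: Optional[str] = None   # final answer, set only on the last step


def scripted_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for the LLM: chooses the next step from the history."""
    last = history[-1]
    if not last.startswith(OBS_PREFIX):
        # Nothing observed yet, so plan a tool call first.
        return Step(thought="I need the current weather before I can answer.",
                    action="get_weather", argument="Seoul")
    # An observation is available, so the plan is adjusted to answering.
    return Step(thought="The observation is enough to answer.",
                answer="Weather report: " + last[len(OBS_PREFIX):])


def run_agent(task: str,
              policy: Callable[[List[str]], Step],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Alternate thought, action, and observation until the policy answers."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = policy(history)
        history.append(f"Thought: {step.thought}")
        if step.action is None:
            return step.answer, history
        observation = tools[step.action](step.argument)
        history.append(f"Action: {step.action}({step.argument})")
        history.append(OBS_PREFIX + observation)
    return None, history  # gave up after max_steps
```

Calling `run_agent` with a tool table such as `{"get_weather": some_function}` returns the final answer together with the full history, so the thoughts can be inspected after the run.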

Below are common thought types with an example of each.

| Thought type | Example |
| --- | --- |
| Planning | I need to split this task into three steps: 1. collect data, 2. analyze trends, 3. generate a report. |
| Analysis | Judging by the error message, the problem seems to be in the database connection parameters. |
| Decision Making | Given the user's budget constraints, I should recommend a mid-range option. |
| Problem Solving | To optimize this code, I should profile it first to find the bottleneck. |
| Memory Integration | The user said earlier that they prefer Python, so I should provide a Python example. |
| Self-Reflection | My last approach did not work well, so I should try a different strategy. |
| Goal Setting | To finish this task, I first need to define the acceptance criteria. |
| Prioritization | The security vulnerability has to be fixed before adding new features. |

For an LLM that has been fine-tuned specifically for function calling, this thought process is optional.

2. Chain of Thought (Re-Act)

The approach combines reasoning (thinking) with acting. Strictly speaking, that combination is what the name ReAct stands for, while CoT refers to the reasoning part.

CoT itself is a very simple prompting technique: the text "Let's think step by step" is appended to the prompt right before the LLM starts Next Token Prediction.

Prompting the model to think "step by step" steers Next Token Prediction toward **generating a plan** rather than producing the final answer right away, because it encourages the model to split the problem into sub-tasks.

This lets the model consider each sub-task in more detail and generally leads to fewer errors than trying to generate the final answer directly. In my experience, though, performance improved only when the task itself involved reasoning.
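The prompting step described above can be sketched in a few lines. This is only a sketch under stated assumptions: the `Q:`/`A:` layout and the `build_prompt` helper are illustrative choices, not a format required by any particular model or library.

```python
COT_SUFFIX = "Let's think step by step."


def build_prompt(question: str, use_cot: bool = True) -> str:
    """Build a plain-text prompt, optionally ending with the CoT trigger phrase."""
    prompt = f"Q: {question}\nA:"
    if use_cot:
        # The model continues from this phrase, so its next tokens tend to be
        # intermediate reasoning steps rather than an immediate final answer.
        prompt += f" {COT_SUFFIX}"
    return prompt


if __name__ == "__main__":
    print(build_prompt("A shop sells 3 pens for $2. How much do 12 pens cost?"))
```

Keeping `use_cot` as a flag makes it easy to compare answers with and without the trigger phrase on the same question, which is a simple way to check whether a given task actually benefits from it.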


Conclusion

Recently (as of March 24, 2025), interest in **reasoning strategies** has grown considerably. This is the background behind models such as DeepSeek R1 and OpenAI's o1, which were fine-tuned to "think before answering".

These models are trained to always include a dedicated thinking section, enclosed between the <think> and </think> special tokens. This is not a prompting technique like plain CoT but a training method: the model learns to generate its thinking in the desired form from thousands of examples.
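When working with the raw text output of such a model, it can be handy to separate the thinking section from the final answer. The sketch below assumes the output is a plain string containing at most one <think>...</think> pair; `split_thinking` is a hypothetical helper, not part of any model's API.

```python
import re
from typing import Tuple

# DOTALL lets "." match newlines, since the thinking section usually spans several lines.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(output: str) -> Tuple[str, str]:
    """Return (thinking, answer) from a model output string.

    If no <think>...</think> section is found, thinking is an empty string.
    """
    match = THINK_PATTERN.search(output)
    if match is None:
        return "", output.strip()
    thinking = match.group(1).strip()
    # The answer is whatever remains once the thinking section is cut out.
    answer = (output[:match.start()] + output[match.end():]).strip()
    return thinking, answer
```

Returning both parts, rather than discarding the thinking, keeps the reasoning available for logging or debugging while only the answer is shown to the user.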


Reference

https://huggingface.co/learn/agents-course/en/unit1/thoughts
