As we know, a Go program runs on top of the Go runtime. The runtime schedules tasks (goroutines) in user space rather than in the kernel, which makes them more lightweight than OS threads. It strikes a better tradeoff between system resource usage and performance, especially for I/O-bound tasks. In this article, I'll walk through the history of the Go scheduler, the design of the GMP goroutine scheduler, and several cases showing how GMP handles different situations.

Single process age

We did not need a scheduler in the single-process age: all tasks had to be processed serially. This pattern has obvious shortcomings:

  1. There is only one process, so the computer must process tasks one by one.
  2. CPU time is wasted whenever the process blocks (for example, while waiting on I/O).

Multiprocess/Multithread age

To solve the blocking problem, we can let the CPU execute other tasks while the current process is blocked. We also divide CPU time into very small time slices (around 10 ms, for example) and run each task for at most one slice at a time, so that every task eventually gets executed. Because the slices are so small, all the tasks appear to run at the same time.

process scheduling

However, the CPU now has to perform context switches between processes, and creating, switching, and destroying a process consumes a lot of system resources. As a result, effective CPU utilization can be low under high concurrency. On Linux, threads are more lightweight than processes, but they are still the basic unit of CPU scheduling, so in principle their scheduling cost is similar to that of processes.


More processes/threads cause other problems

  1. High memory usage. On a 32-bit system, each process has a 4 GB virtual address space, and each thread takes around 4 MB or more.
  2. High CPU overhead from frequent context switches.
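The memory figures above can be checked with a little arithmetic. A 32-bit address space is 2^32 bytes, which is 4 GiB. The sketch below also scales the article's rough 4 MB-per-thread figure (treated here as 4 × 2^20 bytes) to a hypothetical 10,000 threads. The thread count and the helper `threadMemory` are illustrative assumptions, not measurements.

```go
package main

import "fmt"

const (
	mib = 1 << 20 // bytes in one MiB
	gib = 1 << 30 // bytes in one GiB
)

// addressSpace32 is the size of a 32-bit virtual address space: 2^32 bytes.
const addressSpace32 = uint64(1) << 32

// threadMemory estimates memory for n threads, assuming the rough figure of
// 4 MiB per thread used in the text.
func threadMemory(n uint64) uint64 {
	const perThread = 4 * mib
	return n * perThread
}

func main() {
	fmt.Println(addressSpace32 / gib)      // 4 (GiB per process)
	fmt.Println(threadMemory(10000) / mib) // 40000 (MiB for 10,000 threads)
}
```

At 40,000 MiB (about 39 GiB), 10,000 threads would exhaust a typical machine's memory long before the CPU becomes the bottleneck.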

Engineers found that most of this overhead comes from kernel space. A process has a "user space" and a "kernel space", and it enters kernel space when it invokes a system call or when its time slice expires. From the OS's point of view, however, it does not matter which state the process is in: the OS manages each process through a data structure called the PCB (Process Control Block). We can call the code that runs in kernel space a thread, and the code that runs in user space a coroutine. The OS cannot see a coroutine's working state; it only cares about threads, that is, the PCB structure.

thread coroutine

So, to reduce the overhead in kernel space, can we bind multiple coroutines to one thread? Yes. If we add a scheduling layer between threads and coroutines that binds N coroutines to one thread, we get an N:1 model.


In this model, most of the work happens in user space instead of switching to the kernel frequently. But once the thread blocks, none of its coroutines can run. And a single OS thread can run on only one CPU core at a time, so a multicore CPU cannot be fully utilized.


To optimize the scheduler further, we can bind N coroutines to M OS threads (often called the M:N model). A more complicated scheduler like this can combine the parallelism of multiple threads with the lightness of coroutines.