This tutorial is intended for C beginners who want some coding practice and, in the process, want to gain insight into low-level programming and how (some) Virtual Machines operate under the hood.
By the end of the article, we will have a working register-based VM capable of interpreting and running a limited set of assembly instructions, plus some bonus programs to test that everything works.
The code is written in C11, and it should compile on most operating systems. The repo can be found here, and the exact source code is vm.c:
git clone git@github.com:nomemory/lc3-vm.git
If you are a seasoned C developer who has already dabbled in this sort of stuff, you can skip this article, because it covers information you probably already know.
The reader should already be familiar with bitwise operations, hexadecimal notation, pointers, function pointers, C macros and some functions from the standard library (e.g., fwrite and fread).
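As a quick self-check on those prerequisites, here is a minimal sketch that combines hexadecimal masks, bit shifts and a table of function pointers. The word layout and the names (opcode_of, operand_of, op_fn, apply) are made up for illustration and are not taken from vm.c.

```c
#include <stdint.h>

/* Hypothetical 16-bit word layout, for illustration only:
   bits 15..12 hold an opcode, bits 11..0 hold an operand. */
uint16_t opcode_of(uint16_t word)  { return (uint16_t)(word >> 12); }
uint16_t operand_of(uint16_t word) { return (uint16_t)(word & 0x0FFF); }

/* A function pointer type: any function taking two 16-bit values
   and returning a 16-bit result. */
typedef uint16_t (*op_fn)(uint16_t a, uint16_t b);

uint16_t op_add(uint16_t a, uint16_t b) { return (uint16_t)(a + b); } /* wraps at 16 bits */
uint16_t op_and(uint16_t a, uint16_t b) { return (uint16_t)(a & b); }

/* Dispatch table: index 0 selects op_add, index 1 selects op_and. */
static const op_fn ops[] = { op_add, op_and };

/* Calls the selected operation through its pointer; returns 0 for an
   unknown index. */
uint16_t apply(uint16_t which, uint16_t a, uint16_t b) {
    if (which >= sizeof ops / sizeof ops[0]) {
        return 0;
    }
    return ops[which](a, b);
}
```

If every line here reads naturally, for example why apply(0, 0xFFFF, 1) wraps around to 0, you should be comfortable with the rest of the article.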
It would be unfair not to mention some existing blog posts covering the same topic as this article. The best in this category is Write your Own Virtual Machine by Justin Meiners and Ryan Pendleton. Their code covers a more in-depth implementation of a VM. Compared to theirs, our VM is a little bit simpler, and the code takes a different route in terms of the implementation.
In the world of computing, a VM (Virtual Machine) is a system that emulates or virtualizes a computer system or architecture.
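Broadly speaking, there are two categories of Virtual Machines: System Virtual Machines, which stand in for a complete machine and can run an entire operating system, and Process Virtual Machines, which run a single program in a platform-independent environment.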
In this article, we will develop a simple Process Virtual Machine designed to execute simple computer programs in a platform-independent environment. Our toy Virtual Machine is based on the LC-3 Computer Architecture, and will be capable of interpreting and executing (a subset of) LC-3 Assembly Code.
Little Computer 3, or LC-3, is a type of computer educational programming language, an assembly language, a type of low-level programming language. It features a relatively simple instruction set, but can be used to write moderately complex assembly programs and is a viable target for a C compiler. The language is less complicated than x86 assembly but has many features similar to those in more complex languages. These features make it worthwhile for beginning instruction, so it is most often used to teach fundamentals of programming and computer architecture to computer science and computer engineering students. (wikipedia)
For simplicity, we deliberately stripped the following features from our LC-3 implementation: interrupt processing, priority levels, the processor status register (PSR), privilege modes, the supervisor stack and the user stack. We will virtualize only the most basic hardware possible, and we will interact with the outside world (stdin, stdout) through traps.
Our LC-3 inspired VM, like most general-purpose computers nowadays, is based on the von Neumann computer model, and it will have three main components: the CPU, the Main Memory, and the input/output devices.
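To make the three components concrete, here is a minimal sketch of how they could be laid out in C. The struct, its field names and the helper functions are hypothetical and do not mirror vm.c. The sizes follow the LC-3 architecture (65536 words of 16 bits, eight general-purpose registers), and the 0x3000 start address is the usual LC-3 convention for user programs, assumed here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MEM_WORDS 65536   /* 2^16 addressable 16-bit words, as in LC-3 */
#define NUM_REGS  8       /* R0..R7, the LC-3 general-purpose registers */
#define PC_START  0x3000  /* assumed start address (LC-3 convention) */

/* Hypothetical layout of the three von Neumann components. */
typedef struct {
    uint16_t mem[MEM_WORDS]; /* main memory: code and data share it */
    uint16_t reg[NUM_REGS];  /* CPU: general-purpose registers */
    uint16_t pc;             /* CPU: address of the next instruction */
    FILE *in;                /* input device */
    FILE *out;               /* output device */
} machine;

/* Zeroes memory and registers, then attaches the I/O streams. */
void machine_init(machine *m, FILE *in, FILE *out) {
    memset(m->mem, 0, sizeof m->mem);
    memset(m->reg, 0, sizeof m->reg);
    m->pc  = PC_START;
    m->in  = in;
    m->out = out;
}

/* A 16-bit address can never fall outside a 65536-word array,
   so no bounds check is needed. */
uint16_t mem_read(const machine *m, uint16_t addr)        { return m->mem[addr]; }
void     mem_write(machine *m, uint16_t addr, uint16_t v) { m->mem[addr] = v; }
```

Because instructions and data live in the same mem array, loading a program amounts to copying its words into memory and pointing pc at the first one. Note that the struct is about 128 KB, so it is safer to declare it static or allocate it on the heap than to place it on a small stack.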
The CPU, an abbreviation for Central Processing Unit, is the “circuitry” that controls and manipulates data. The CPU is further divided into three parts: the ALU, the CU and the Registers.
ALU stands for Arithmetic/Logic Unit and represents the circuits that actually carry out the instructions on the data (operations like ADD, XOR, Division, etc.).
CU, an abbreviation for Control Unit, coordinates the activities of the CPU.
The registers are quickly accessible “slots” located at the CPU level. The ALU operates on registers. They come in small numbers (that’s a relative statement, as it depends on the architecture), so the amount of data that can be loaded inside the CPU is limited. We use registers to interact with the Main Memory. A typical scenario involves loading a memory location into a register, performing some changes, and putting the data back into memory.
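The load, modify, store scenario described above can be sketched in a few lines of C. The arrays and function names (vm_memory, vm_regs, load, add_imm, store) are hypothetical stand-ins, not the ones used later in vm.c.

```c
#include <stdint.h>

/* Hypothetical stand-ins for main memory and the register file. */
uint16_t vm_memory[65536];
uint16_t vm_regs[8];

/* Memory -> register. */
void load(int r, uint16_t addr)  { vm_regs[r] = vm_memory[addr]; }

/* Register -> memory. */
void store(int r, uint16_t addr) { vm_memory[addr] = vm_regs[r]; }

/* The "ALU" step: works on a register only, never on memory directly. */
void add_imm(int r, uint16_t value) {
    vm_regs[r] = (uint16_t)(vm_regs[r] + value);
}

/* Adds `by` to the word stored at `addr` by going through register 0:
   load the value, change it, put it back. Returns the new value. */
uint16_t increment_in_memory(uint16_t addr, uint16_t by) {
    load(0, addr);
    add_imm(0, by);
    store(0, addr);
    return vm_memory[addr];
}
```

For example, if vm_memory[0x3000] holds 41, calling increment_in_memory(0x3000, 1) leaves 42 both in register 0 and at address 0x3000.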