Learn Python ASTs, by building your own linter

Hi! Before we start, a word of caution: This is an extremely long article, and it might take multiple hours to finish completely. But I promise it'll be worth it.

Oh, and follow me on Twitter. I frequently post Python content there as well.

So what is an AST?

In programmer terms, "ASTs are a programmatic way to understand the structure of your source code". But to understand what that really means, we must first understand a few things about the structure of a computer program.

The programs that you and I write in our language of choice are usually called "source code", and I'll be referring to them as such in this article.

At the other end, computer chips can only understand "machine code", which is a set of binary numbers that have special meanings for that model of chip. Some of these numbers are instructions, each of which tells the CPU to perform a simple task, like "add the numbers stored in these two places", or "jump 10 numbers down and continue running code from there". The instructions run one by one, and they dictate the flow of the program.

Similarly, you define your programs as a set of "statements", with each statement being one thing that you want your code to do. They're sort of a more human-friendly version of the CPU instructions, which we can write and reason about more easily.

Now, I know that theory can get boring really quickly, so I'm going to go through a bunch of examples. Let's write the same piece of code in a few languages, and notice the similarities:

Python

```
def area_of_circle(radius):
    pi = 3.14
    return pi * radius * radius

area_of_circle(5)
# Output: 78.5
```

Scheme Lisp

```
(define (area_of_circle radius)
  (define pi 3.14)
  (* pi radius radius))

(area_of_circle 5)
; Output: 78.5
```

Go

```
package main

func area_of_circle(radius float64) float64 {
  pi := 3.14
  return pi * radius * radius
}

func main() {
  println(area_of_circle(5))
}
// Output: +7.850000e+001
```

We're doing essentially the same thing in all of these, and I'll break it down piece by piece:

We're defining our source code as a block of statements. In our case, there are two statements at the top level of our source code: one statement that defines our `area_of_circle` function, and another statement that runs this function with the value `5`.
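If you'd like to check this claim yourself, here is a minimal sketch that uses Python's built-in `ast` module to list the top-level statements of the Python example. The variable names (`source`, `tree`, `top_level`) are just ones I picked for illustration.

```python
import ast

# The Python example from above, stored as a string of source code.
source = '''
def area_of_circle(radius):
    pi = 3.14
    return pi * radius * radius

area_of_circle(5)
'''

# ast.parse() turns the source code into a tree; tree.body is the
# list of top-level statements.
tree = ast.parse(source)
top_level = [type(node).__name__ for node in tree.body]

print(len(tree.body))  # 2
print(top_level)       # ['FunctionDef', 'Expr']
```

The `FunctionDef` is the function definition, and the `Expr` is the statement that calls the function.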

The definition of the `area_of_circle` function has two parts: the input parameters (the radius, in our case), and the body, which itself is a block of statements. There are two statements inside `area_of_circle`, to be specific: the first one defines `pi`, and the second one uses it to calculate the area, and returns it.
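The same two-part shape shows up if we ask Python's `ast` module about the function definition. This is only a rough sketch, and the variable names (`func`, `param_names`, `body_kinds`) are my own choices for illustration.

```python
import ast

source = '''
def area_of_circle(radius):
    pi = 3.14
    return pi * radius * radius
'''

# The first (and only) top-level statement is the function definition.
func = ast.parse(source).body[0]

# Part one: the input parameters.
param_names = [arg.arg for arg in func.args.args]

# Part two: the body, which is itself a list of statements.
body_kinds = [type(stmt).__name__ for stmt in func.body]

print(func.name)    # area_of_circle
print(param_names)  # ['radius']
print(body_kinds)   # ['Assign', 'Return']
```

The `Assign` is the statement that defines `pi`, and the `Return` is the one that calculates and returns the area.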

For the languages that have a main function, the definition of the main function itself is a statement. Inside that statement we are writing *more statements*, like one that prints out the value of `area_of_circle` called with a radius of 5.