Generalizing 'jq' and Traversal Systems using optics and standard monads

Hi folks! Today I'll be chatting about Traversal Systems like jq and XPath; we're going to discover which properties make them useful, then see how we can replicate their most useful behaviours in Haskell using (almost entirely) pre-ols!existing standard Haskell tools! Let's go!

What's a Traversal System?

First off I'll admit that "Traversal System" is a name I just came up with, you probably won't find anything if you search for it (unless this post really catches on 😉).

A Traversal System allows you dive deeply into a piece of data and may allow you to fetch, query, and edit the structure as you go while maintaining references to other pieces of the structure to influence your work. The goal of most Traversal Systems is to make this as painless and concise as possible. It turns out that this sort of thing is incredibly useful for manipulating JSON, querying HTML and CSS, working with CSVs, or even just handling standard Haskell Records and data-types.

Some good examples of existing Traversal Systems which you may have heard of include the brilliant jq utility for manipulating and querying JSON, the XPath language for querying XML, and the meander data manipulation system in Clojure. Although each of these systems may appear drastically different at a glance, they both accomplish many of the same goals of manipulating and querying data in a concise way.

The similarities between these systems intrigued me! They seem so similar, but yet still seem to share very little in the way of structure, syntax, and prior art. They re-invent the wheel for each new data type! Ideally we could recognize the useful behaviours in each system and build a generalized system which works for any data type.

This post is an attempt to do exactly that; we'll take a look at a few things that these systems do well, then we'll re-build them in Haskell using standard tooling, all the while abstracting over the type of data!

Optics as a basis for a traversal system

For any of those who know me it should be no surprise that my first thought was to look at optics (i.e. Lenses and Traversals). In general I find that optics solve a lot of my problems, but in this case they are particularly appropriate! Optics inherently deal with the idea of diving deep into data and querying or updating data in a structured and compositional fashion.

In addition, optics also allow abstracting over the data type they work on. There are pre-existing libraries of optics for working with JSON via lens-aeson and for html via taggy-lens. I've written optics libraries for working with CSVs and even Regular Expressions, so I can say confidently that they're a brilliantly adaptable tool for data manipulation.

It also happens that optics are well-principled and mathematically sound, so they're a good tool for studying the properties that a system like this may have.

However, optics themselves don't provide everything we need! Optics are rather obtuse, in fact I wrote a whole book to help teach them, and they lack clarity and easy of use when it comes to building larger expressions. It's also pretty tough to work on one part of a data structure while referencing data in another part of the same structure. My hope is to address some of these short comings in this post.

In this particular post I'm mostly interested in explaining a framework for traversal systems in Haskell, we'll be using many standard mtl Monad Transformers alongside a lot of combinators from the lens library. You won't need to understand any of these intimately to get the gist of what's going on, but I won't be explaining them in depth here, so you may need to look elsewhere if you're lacking a bit of context.

Establishing the Problem

I'll be demoing a few examples as we go along so let's set up some data. I'll be working in both jq and Haskell to make comparisons between them, so we'll set up the same data in both JSON and Haskell.

Here's a funny lil' company as a JSON object:

{
    "staff":
      [
        { "id": "1"
        , "name": "bob"
        , "pets": [
              { "name": "Rocky"
              , "type": "cat"
              },
              { "name": "Bullwinkle"
              , "type": "dog"
              }
            ]
        },
        { "id": "2"
        , "name": "sally"
        , "pets": [
              { "name": "Inigo"
              , "type": "cat"
              }
            ]
        }
      ],
    "salaries": {
        "1": 12,
        "2": 15
    }
}

And here's the same data in its Haskell representation, complete with generated optics for each record field.

data Company = Company { _staff :: [Employee]
                       , _salaries :: M.Map Int Int
                       } deriving Show
data Pet = Pet { _petName :: String
               , _petType :: String
               } deriving Show
data Employee = Employee { _employeeId :: Int
                         , _employeeName :: String
                         , _employeePets :: [Pet]
                         } deriving Show

makeLenses ''Company
makeLenses ''Pet
makeLenses ''Employee

company :: Company
company = Company [ Employee 1 "bob" [Pet "Rocky" "cat", Pet "Bullwinkle" "dog"] 
                  , Employee 2 "sally" [Pet "Inigo" "cat"]
                  ] (M.fromList [ (1, 12)
                                , (2, 15)
                                ])

What's a Traversal System?

Optics as a basis for a traversal system

Establishing the Problem

Querying