An oral history of Bank Python

The strange world of Python, as used by big investment banks

High finance is a foreign country; they do things differently there

Today I will take you through the keyhole to look at a group of software systems not well known to the public, which I call "Bank Python". Bank Python implementations are effectively proprietary forks of the entire Python ecosystem, in use at many (but not all) of the biggest investment banks. Bank Python differs considerably from the common or garden Python that most people know and love (or hate).

Thousands of people work on - or rather, inside - these systems but there is not a lot about them on the public web. When I've tried to explain Bank Python in conversations people have often dismissed what I've said as the ravings of a swivel-eyed loon. It all just sounds too bonkers.

I will discuss a fictional, amalgamated, imaginary Bank Python system called "Minerva". The names of subsystems will be changed and, though I'll try to be accurate, I will have to stylise some details and, of course, I don't know every single detail. I might even make the odd mistake. Hopefully I get the broad strokes.

Barbara, the great key value store

The first thing to know about Minerva is that it is built on a global database of Python objects.

import barbara

# open a connection to the default database "ring"
db = barbara.open()

# pull out some bond
my_gilt = db["/Instruments/UKGILT201510yZXhhbXBsZQ=="]

# calculate the current value of the bond (according to
# the bank's modellers)
current_value: float = my_gilt.value()

Barbara is a simple key value store with a hierarchical key space. It's brutally simple: made just from pickle and zip.
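To make "pickle and zip" concrete, here is a minimal sketch of a store built that way. It is not Barbara's real implementation: `ToyStore`, its in-memory dict, the `keys_under` helper and the example keys are all invented for illustration, and I have assumed zlib for the compression step.

```python
import pickle
import zlib


class ToyStore:
    """A toy key value store: values are pickled, then compressed."""

    def __init__(self):
        # hypothetical backing storage: key -> compressed pickle bytes
        self._blobs = {}

    def __setitem__(self, key, obj):
        self._blobs[key] = zlib.compress(pickle.dumps(obj))

    def __getitem__(self, key):
        return pickle.loads(zlib.decompress(self._blobs[key]))

    def keys_under(self, prefix):
        # the key space is hierarchical, so listing is just a prefix match
        prefix = prefix.rstrip("/") + "/"
        return sorted(k for k in self._blobs if k.startswith(prefix))


store = ToyStore()
store["/Instruments/ExampleBond"] = {"coupon": 0.02, "maturity": "2025-10-01"}
store["/Instruments/OtherBond"] = {"coupon": 0.03, "maturity": "2030-01-01"}
store["/Etc/Something"] = [1, 2, 3]

# reading a key gives back an equal Python object
bond = store["/Instruments/ExampleBond"]

# only the two keys under /Instruments are listed
instrument_keys = store.keys_under("/Instruments")
```

Anything picklable can go in, which is most of the appeal: there is no schema and no mapping layer between the object and what is stored.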

Barbara has multiple "rings", or namespaces, but the default ring is more or less a single, global, object database for the entire bank. From the default ring you can pull out trade data, instrument data (as above), market data and so on. The majority of the data used day-to-day comes out of Barbara.

Applications also commonly store their internal state in Barbara - writing dataclasses straight in and out with only very simple locking and transactions (if any). There is no filesystem available to Minerva scripts, so the little bits of data that scripts pick up have to be put into Barbara.

Internally, Barbara nodes replicate writes within their rings, a bit like how Dynamo and BigTable work. When you call barbara.open() it connects to the nearest working instance of the default ring. Within that single instance reads and writes are strongly consistent. Writes made on other instances turn up quickly, but not straight away. If consistency matters you simply ensure that you are always connecting to a specific instance - a practice which is discouraged unless it is necessary. Barbara is surprisingly robust, probably because it is so simple. Outright failures are exceptionally rare and degraded states only a little more common.
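The consistency behaviour is easier to see in a toy model. The sketch below is an assumption-laden illustration rather than how Barbara really replicates: `ToyInstance`, its `outbox` and the explicit `replicate()` call (standing in for whatever background process ships writes between instances) are all made up, and no conflict resolution is modelled.

```python
from collections import deque


class ToyInstance:
    """One instance of a ring, holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []
        # writes accepted locally but not yet sent to the other instances
        self.outbox = deque()

    def write(self, key, value):
        # a local write is visible to local readers immediately
        self.data[key] = value
        self.outbox.append((key, value))

    def read(self, key):
        return self.data.get(key)

    def replicate(self):
        # hypothetical stand-in for background replication to peers
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.data[key] = value


london = ToyInstance("london")
new_york = ToyInstance("new_york")
london.peers.append(new_york)
new_york.peers.append(london)

london.write("/Etc/Something", 42)
local_read = london.read("/Etc/Something")       # sees the write at once
remote_before = new_york.read("/Etc/Something")  # not there yet: None

london.replicate()
remote_after = new_york.read("/Etc/Something")   # now it has arrived
```

A script that always talks to `london` never observes the gap, which is why pinning to one instance buys you consistency.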

Some example paths from the default ring:

[Table of example paths not reproduced]

Barbara also has some "overlay" features:

# connect to multiple rings: keys are 'overlaid' in order of
# the provided ring names
db = barbara.open("middleoffice;ficc;default")

# get /Etc/Something from the 'middleoffice' ring if it exists there,
# otherwise try 'ficc' and finally the default ring
some_obj = db["/Etc/Something"]

You can list rings in a stack and then each read will try the first ring, and then, if the key is absent there, it will try the second ring, then the third and so on. Writes can either always go to the first ring or to the uppermost ring where that key already exists (determined by configuration that I have not shown).
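Here is a rough sketch of those overlay semantics, with plain dicts standing in for rings. `ToyOverlay` and its `write_to` option are hypothetical names of mine, not Barbara's configuration, and falling back to the first ring when a key exists nowhere is my own assumption.

```python
class ToyOverlay:
    """Reads fall through a stack of rings; writes follow a policy."""

    def __init__(self, rings, write_to="first"):
        if write_to not in ("first", "existing"):
            raise ValueError("write_to must be 'first' or 'existing'")
        # rings: dict-like stores, highest priority first
        self.rings = rings
        self.write_to = write_to

    def __getitem__(self, key):
        # try each ring in order and return the first hit
        for ring in self.rings:
            if key in ring:
                return ring[key]
        raise KeyError(key)

    def __setitem__(self, key, value):
        if self.write_to == "existing":
            # write to the uppermost ring that already has the key
            for ring in self.rings:
                if key in ring:
                    ring[key] = value
                    return
        # otherwise (or if the key exists nowhere) use the first ring
        self.rings[0][key] = value


middleoffice = {}
ficc = {"/Etc/Something": "from ficc"}
default = {"/Etc/Something": "from default", "/Etc/Other": "only in default"}

db = ToyOverlay([middleoffice, ficc, default])
first_read = db["/Etc/Something"]   # middleoffice lacks it, so ficc answers
other_read = db["/Etc/Other"]       # falls all the way to the default ring

db["/Etc/Something"] = "from middleoffice"  # 'first' policy: top ring
second_read = db["/Etc/Something"]

db2 = ToyOverlay([{}, {"/k": 1}, {"/k": 0}], write_to="existing")
db2["/k"] = 2                       # lands in the second ring
```

Python's own `collections.ChainMap` behaves like the "first" policy: lookups search the maps in order and writes always go to the first map.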

There are some good reasons not to use Barbara. If your dataset is large it may be a good idea to look elsewhere - perhaps a traditional SQL database or kdb+. The soft limit on (compressed) Barbara object sizes is about 16MB. Zipped pickles are pretty small already so this is actually quite a large size. Barbara does feature secondary indices on object attributes but if secondary indices are a very important part of your program, it is also a good idea to look elsewhere.
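If you want a feel for whether an object is anywhere near that limit, you can measure it yourself. This is only a sketch: it assumes zlib compression over a default pickle, treats "about 16MB" as exactly 16 MiB, and `compressed_size` and `fits_soft_limit` are helper names I have made up.

```python
import pickle
import zlib

# "about 16MB" in the text; the exact figure here is an assumption
SOFT_LIMIT_BYTES = 16 * 1024 * 1024


def compressed_size(obj):
    """Size in bytes of the object once pickled and compressed."""
    return len(zlib.compress(pickle.dumps(obj)))


def fits_soft_limit(obj, limit=SOFT_LIMIT_BYTES):
    return compressed_size(obj) <= limit


# a million repeated currency codes: bulky in memory, tiny once zipped
repetitive = ["GBP"] * 1_000_000
raw_size = len(pickle.dumps(repetitive))
zipped_size = compressed_size(repetitive)
ok = fits_soft_limit(repetitive)
```

Real data is less repetitive than this, of course, but financial data is often repetitive enough that the compressed size is a small fraction of the raw pickle.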


Dagger, a directed, acyclic graph of financial instruments