https://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/

Continuing on with my series about microservices implementations (see “Why Microservices Should Be Event Driven”, “Three things to make your microservices more resilient”, “Carving the Java EE Monolith: Prefer Verticals, not Layers” for background) we’re going to explore probably the hardest problem when creating and developing microservices. Your data. Using Spring Boot/Dropwizard/Docker doesn’t mean you’re doing microservices. Taking a hard look at your domain and your data will help you get to microservices.

Follow along for the rest of the series (twitter: @christianposta, RSS/blog: blog.christianposta.com)

Of the reasons we attempt a microservices architecture, chief among them is letting our teams work on different parts of the system at different speeds with minimal impact across teams. So we want teams to be autonomous, capable of making decisions about how best to implement and operate their services, and free to make changes as quickly as the business may desire. If we organize our teams this way, our systems architecture will begin to reflect that organization and evolve into something that looks like microservices.

To gain this autonomy, we need to “shed our dependencies”, but that’s a lot easier said than done. I’ve seen folks refer to this idea in part, trivially, as “each microservice should own and control its own database and no two services should share a database.” The idea is sound: don’t share a single database across services, because then you run into problems like competing read/write patterns, data-model conflicts, coordination challenges, etc. But a single database does afford us a lot of safeties and conveniences: ACID transactions, a single place to look, well understood (kinda?), one place to manage, etc. So when building microservices, how do we reconcile these safeties with splitting up our database into multiple smaller databases?

Let’s see. First, for an “enterprise” building microservices, we need to make the following things clear:

What is the domain?

This seems to be ignored at a lot of places, but it is a huge difference between how the internet companies practice microservices and how a traditional enterprise may implement them (or may fail to, because it neglects this question).

Before we can build a microservice and reason about the data it uses (produces, consumes, etc.), we need a reasonably good, crisp understanding of what that data represents. For example, before we can store information in a database about “bookings” for our TicketMonster and its migration to microservices, we need to understand “what is a booking”. Likewise, in your domain you may need to understand what an Account is, or an Employee, or a Claim, etc.

To do that, we need to dig into what “it” is in reality. For example, “what is a book”? Stop and think about that, as it’s a fairly simple example. What is a book, and how would we express it in a data model?

Is a book something with pages? Is a newspaper a book (it has pages)? So maybe a book has a hard cover? Or is not something that’s released/published every day? If I write a book (which I did :) Microservices for Java Developers) the publisher may have an entry for me with a single row representing my book. But a bookstore may have 5 of my books. Is each one a book? Or are they copies? How would we represent this? What if a book is so long it has to be broken down into volumes? Is each volume a book? Or all of them combined? What if many small compositions are combined together? Is the combination the book? Or each individual one? So basically I can publish a book, have many copies of it in a bookstore, each one with multiple volumes. So what is a book then?

The reality is there is no reality. There is no objective definition of “what is a book” with respect to reality, so to answer any question like that, we have to know “who’s asking the question and what is the context”. Context is king. We as humans can quickly (and even unconsciously) resolve this ambiguity because we have a context in our heads, in the environment, and in the question. But a computer doesn’t. We need to make this context explicit when we build our software and model our data. Using a book to illustrate this is simplistic. Your domain (an enterprise) with its Accounts, Customers, Bookings, Claims, etc., is going to be far more complicated and far more conflicting/ambiguous. We need boundaries.

Where do we draw the boundaries? The work in the Domain Driven Design community helps us deal with this complexity in the domain. We draw a bounded context around the Entities, Value Objects, and Aggregates that model our domain. Stated another way, we build and refine a model that represents our domain, and that model is contained within a boundary that defines our context. And this is explicit. These boundaries end up being our microservices, or the components within the boundaries end up being microservices, or both. Either way, microservices are about boundaries, and so is DDD.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ff780655-b7d5-4908-9f31-ac7cb1b6088c/boundaries.png
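To make the three DDD building blocks a bit more concrete, here is a minimal Java sketch of what a "booking" bounded context might contain. Everything in it is an assumption made for illustration: the names `Money`, `Ticket`, and `Booking`, the fields, and the two invariants are hypothetical and are not taken from TicketMonster's actual code. It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical "booking" bounded context; all names and rules are illustrative.
class BookingContext {

    // Value Object: no identity, immutable, compared by value.
    record Money(long cents) {
        Money {
            if (cents < 0) throw new IllegalArgumentException("amount must not be negative");
        }
        Money plus(Money other) { return new Money(cents + other.cents); }
    }

    // Entity: has an identity (id); two tickets are the same ticket if the ids match.
    static final class Ticket {
        private final String id;
        private final String seat;
        private final Money price;

        Ticket(String id, String seat, Money price) {
            this.id = Objects.requireNonNull(id);
            this.seat = Objects.requireNonNull(seat);
            this.price = Objects.requireNonNull(price);
        }
        String seat() { return seat; }
        Money price() { return price; }

        @Override public boolean equals(Object o) {
            return o instanceof Ticket other && id.equals(other.id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Aggregate root: the only way to change the tickets it owns,
    // so it can enforce the (assumed) rules of this context.
    static final class Booking {
        private final String id;
        private final List<Ticket> tickets = new ArrayList<>();
        private boolean confirmed;

        Booking(String id) { this.id = Objects.requireNonNull(id); }
        String id() { return id; }

        void addTicket(Ticket ticket) {
            if (confirmed) throw new IllegalStateException("a confirmed booking cannot change");
            if (tickets.contains(ticket)) throw new IllegalArgumentException("ticket already in booking");
            tickets.add(ticket);
        }

        void confirm() {
            if (tickets.isEmpty()) throw new IllegalStateException("a booking needs at least one ticket");
            confirmed = true;
        }

        Money total() {
            Money sum = new Money(0);
            for (Ticket t : tickets) sum = sum.plus(t.price());
            return sum;
        }
    }

    public static void main(String[] args) {
        Booking booking = new Booking("booking-1");
        booking.addTicket(new Ticket("ticket-1", "A1", new Money(2500)));
        booking.addTicket(new Ticket("ticket-2", "A2", new Money(3000)));
        booking.confirm();
        System.out.println(booking.total().cents());
    }
}
```

The point of the sketch is the shape, not the details: what counts as a valid booking is decided inside the boundary, and a different context would be free to model "booking" differently.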

Our data model (how we wish to represent concepts in a physical data store; note the explicit difference here) is driven by our domain model, not the other way around. When we have this boundary, we know, and can make assertions about, what is “correct” in our model and what is incorrect. These boundaries also imply a certain level of autonomy. Bounded context “A” may have a different understanding of what a “book” is than bounded context “B” (e.g., maybe bounded context “A” is a search service that searches for titles, where a single title is a “book”; maybe bounded context “B” is a checkout service that processes a transaction based on how many books (titles + copies) you’re buying, etc.).
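The two contexts can be sketched in Java as two separate `Book` types that happen to share a name. This is only an illustration under stated assumptions: the classes `Search` and `Checkout`, their fields, the placeholder identifiers, and the idea that an ISBN is the one value crossing the boundary are all hypothetical, not something either service is known to do. It assumes Java 16 or later (records, `Stream.toList()`).

```java
import java.util.List;
import java.util.Locale;

// Two hypothetical bounded contexts, each with its own meaning of "book".
class BookContexts {

    // Bounded context "A" (search): a "book" is a title you can find.
    static final class Search {
        record Book(String isbn, String title) {}

        static List<Book> findByTitle(List<Book> catalog, String text) {
            String needle = text.toLowerCase(Locale.ROOT);
            return catalog.stream()
                    .filter(b -> b.title().toLowerCase(Locale.ROOT).contains(needle))
                    .toList();
        }
    }

    // Bounded context "B" (checkout): a "book" is a title plus the copies being bought.
    static final class Checkout {
        record Book(String isbn, int copies, long unitPriceCents) {
            Book {
                if (copies < 1) throw new IllegalArgumentException("need at least one copy");
            }
            long lineTotalCents() { return copies * unitPriceCents; }
        }

        static long totalCents(List<Book> basket) {
            return basket.stream().mapToLong(Book::lineTotalCents).sum();
        }

        // Explicit translation at the boundary: only the identifier is carried over
        // (an assumption of this sketch); the search model itself is not reused.
        static Book fromSearch(Search.Book found, int copies, long unitPriceCents) {
            return new Book(found.isbn(), copies, unitPriceCents);
        }
    }

    public static void main(String[] args) {
        List<Search.Book> catalog = List.of(
                new Search.Book("isbn-1", "Microservices for Java Developers"),
                new Search.Book("isbn-2", "Some Other Title"));

        Search.Book hit = Search.findByTitle(catalog, "microservices").get(0);
        Checkout.Book line = Checkout.fromSearch(hit, 5, 1999);
        System.out.println(Checkout.totalCents(List.of(line)));
    }
}
```

Neither type is "the" book. Each is correct inside its own boundary, and the translation between them is written down explicitly instead of being left to a shared table or a shared class.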