Intro

Data modeling questions are a fairly normal part of the interview process, and it is worth knowing some basic guidelines for how to organize data. Although in this piece I will primarily talk about this at the database level, it is good knowledge to have across the stack - knowing when a piece of logic should be broken out into a reusable component is something frontend engineers will frequently run into, and it gets into similar logical territory because we are thinking about how it makes sense to organize our information, what belongs together, and what should be broken up.

Because it is the predominant type of database, we are mainly going to talk about relational databases and how to model data using good practices for relational databases specifically. But it is worth knowing that there are many other types of database out there, and what count as “good practices” can be different between different database systems. In addition to relational databases, there are:

Among others. Before you feel pressure to learn all that, it is worth saying that I’ve been doing this a long time and have really only ever significantly dealt with relational databases, key value stores, and NoSQL. There is also some crossover in functionality between databases, at times. One of my favorite things about MongoDB is the ability to store data structures similar to JSON, but PostgreSQL, a relational database, also allows you to store JSON if you need to. Some NoSQL databases also support query languages like SQL (structured query language), the language of relational databases.

Basic ideas

I feel like data modeling is one of those terms that can sound intimidating before you know what it is. I had several of these as a junior developer where when I would hear the term, for some reason my mind would go blank. (”Scripting language” was one of these, even though all I knew was scripting languages 🤷‍♀️).

If hearing data modeling makes you think of training AI models, or complex statistical models like they do for political elections, you can put that to the side. You should be happy to know, data modeling for relational databases more or less follows common sense.

Relational databases and SQL have actually come to be a favorite technology of mine, because the principles so closely echo common sense. It is also somewhat unique for a technology as old as relational databases to have survived the rapidly changing technological landscape since their introduction by E.F. Codd between the 1960s-1980s. It is somewhat rare for a technology so old to still work so well and be so well liked by the people using it. The history behind this technology is interesting, and you can read more about it here.

If I could summarize the “common sense” ideas that govern good data modeling for a relational database for someone with zero background, I’d probably highlight the following: