Recently minted database technologies that I find intriguing

I’m a huge fan of databases, so much so that I’ve written a book on so-called “NoSQL” databases, spent some of my most fruitful years in tech working on the highly influential distributed database Riak, and even built a database called Purple last year just for fun.

Naturally, I’m always on the lookout for new and exciting developments in databases and DB-related tools when I scan (trash fires like) Twitter, Reddit, and Hacker News. In this post I’d like to talk about three recently minted database technologies that I find intriguing:

In part 2 I’ll cover three others:

In part 3 I’ll conclude with some closing thoughts. Note: I’ll be focusing exclusively on the core technologies and mostly ignoring things like enterprise features (where applicable).

My selection criteria are purely subjective. If there’s something you don’t see here that you think I should be checking out, tweet at me and let me know! My handle is @lucperkins.

TileDB

TileDB is a DB built around multi-dimensional arrays that enables you to easily work with data types that aren’t a great fit for existing RDBMSes, such as dense and sparse arrays and dataframes. TileDB is specifically geared toward use cases like genomics and geospatial data.
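To make the dense versus sparse distinction concrete, here is a toy sketch in plain Python. It does not use the TileDB API, and the names (`dense`, `sparse`, `read_slice`) are made up for illustration. It only shows the general idea: a dense array stores a value for every cell in its domain, while a sparse array stores only the non-empty cells, keyed by their coordinates.

```python
# Toy illustration only: plain Python, not the TileDB API.

# Dense array: every cell in the 4x4 domain holds a value.
dense = [[0] * 4 for _ in range(4)]
dense[1][2] = 7
dense[3][0] = 5

# Sparse array: only non-empty cells are stored, keyed by (row, col).
sparse = {(1, 2): 7, (3, 0): 5}


def read_slice(cells, rows, cols):
    """Return the non-empty cells of a sparse array inside a rectangular range."""
    return {
        (r, c): v
        for (r, c), v in cells.items()
        if r in rows and c in cols
    }


# Read rows 1..3 and columns 0..2 from both representations.
dense_slice = [row[0:3] for row in dense[1:4]]
sparse_slice = read_slice(sparse, range(1, 4), range(0, 3))
```

The dense slice carries every cell in the range, zeros included, while the sparse slice carries only the two cells that were actually written. That trade-off is presumably why both shapes matter for domains like genomics and geospatial data, where some datasets are filled in everywhere and others are mostly empty.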

Noteworthy features

What I like most

I’m a fan of “specialty” DBs like this that home in on a specific set of data types and problems. The great thing about traditional RDBMSes is that they’re versatile enough to cover an extremely wide array of use cases (no pun intended), but sometimes you have “last mile” edge cases that are both (a) beyond the capabilities of “kitchen sink” systems and (b) at the core of your business.

I expect to see the emergence of more systems like this as database use cases become ever more specialized and new problem domains emerge. The old guard RDBMSes aren’t going anywhere, of course, but it’s nonetheless encouraging to see TileDB and others pushing the envelope. What I’m really hoping for is the emergence of extremely “hackable,” resolutely non-monolithic DBs that provide a plugin interface for highly use-case-specific data types, but this is something I’ll talk about later.

Questions for the project

  1. How much work is done on the client library side versus the database side? Are the TileDB clients essentially language-specific math libraries for manipulating complex data types locally and occasionally saving the results to the desired backend, or are they like other DB client libraries that mostly just relay commands to the database? It’s not entirely clear from the documentation.
  2. What’s the rationale for providing a key-value store given the plethora of existing K/V options? The docs even say that “TileDB is not designed to work as a special-purpose key-value store.” What’s the value of bolting on a feature like this?

Materialize

Materialize touts itself on its website as “the first true SQL streaming database,” and that may not actually be overblown! It’s essentially a relational database that’s wire-compatible with PostgreSQL, with the crucial difference that it offers materialized views that are updated in real time.

I’ve also seen Materialize described as a streaming data warehouse, which seems fitting.
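To illustrate what “updated in real time” buys you, here is a minimal Python sketch contrasting a view that is recomputed only on demand with one that is maintained as each row arrives. The class names (`SnapshotView`, `IncrementalView`) and the sample data are hypothetical, and this is a conceptual toy, not a description of how Materialize or Postgres work internally.

```python
from collections import defaultdict


class SnapshotView:
    """Recomputes the whole aggregate, and only when refresh() is called."""

    def __init__(self, rows):
        self.rows = rows  # shared reference to the base table
        self.totals = {}
        self.refresh()

    def refresh(self):
        totals = defaultdict(int)
        for customer, amount in self.rows:
            totals[customer] += amount
        self.totals = dict(totals)


class IncrementalView:
    """Applies each new row to the aggregate as it arrives."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_insert(self, customer, amount):
        self.totals[customer] += amount


# Hypothetical base table of (customer, amount) orders.
orders = [("ada", 10), ("bob", 5)]
snapshot = SnapshotView(orders)
live = IncrementalView()
for row in orders:
    live.on_insert(*row)

# A new order arrives.
orders.append(("ada", 7))
live.on_insert("ada", 7)

stale = snapshot.totals["ada"]      # still reflects the old data
fresh = live.totals["ada"]          # already includes the new order
snapshot.refresh()
refreshed = snapshot.totals["ada"]  # catches up only after a full recompute
```

The snapshot stays out of date until someone asks for a full recompute, whereas the incremental view does a small amount of work per insert and is always current.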

In standard Postgres, for example, you have to manually update materialized views: