What Is Graph Data?

፨ g r a p h d a t a ፨

"Graph data" comes with a bundle of general connotations that are specific enough for casual use, but the philosophers will get really angry if you don't have a rigorous "model" that grounds the data's technical syntax. And that's just the philosophers warming up: next they'll ask you to define a semantics that grounds what the data means.

It's worth noting that you can't help but have a syntax: if you don't define it with words, it'll just be implicit in the implementations of whatever tools you make. So you may as well stay ahead of the game and define your data model externally, so that you can better keep your tools in agreement with each other. This is different from semantics, which really do not exist at all until we make them up (and maybe not even then).

The two emerging quasi-competing graph data models are the property graph model (PG) and the Resource Description Framework (RDF). Even these aren't really two instances of the same class, since RDF is a "family of specs" by the W3C, so it'd be more correct to talk about "property graphs vs the RDF data model".

RDF

RDF has a rich and storied history, a suite of exacting specifications, a strong model-theoretic grounding, and a truly staggering corpus of discussion and analysis across internet fora and mailing lists.

But what is it? From the RDF 1.1 Concepts and Abstract Syntax spec:

"The core structure of the abstract syntax is a set of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph."

Neat! Triples are like restricted, simplified sentences (and some of RDF's many serializations highlight this in near-natural-language form).

The subject is either an IRI (an internationalized URI) or a "blank node" (an unlabeled node that doesn't have an IRI but is still significant or useful for structuring the graph).
The predicate is an IRI.
The object is either an IRI, a blank node, or a "literal" (a number/boolean/string/date/etc).
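The three position rules above can be sketched as a handful of types. This is a minimal illustration rather than a conforming RDF implementation: the class names, the example.org IRIs, and the simplified Literal (a bare Python value, with no datatype IRI or language tag) are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # only meaningful inside one graph


@dataclass(frozen=True)
class Literal:
    value: Union[str, int, float, bool]  # simplified: real literals carry a datatype


@dataclass(frozen=True)
class Triple:
    subject: Union[IRI, BlankNode]
    predicate: IRI
    object: Union[IRI, BlankNode, Literal]

    def __post_init__(self):
        # Enforce the position rules at construction time.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")
        if not isinstance(self.object, (IRI, BlankNode, Literal)):
            raise TypeError("object must be an IRI, a blank node, or a literal")


# Hypothetical identifiers, for illustration only.
alice = IRI("http://example.org/alice")
name = IRI("http://example.org/vocab/name")

# An RDF graph is a set of triples.
graph = {Triple(alice, name, Literal("Alice"))}
```

Putting a literal in the subject position, as in Triple(Literal("Alice"), name, alice), raises TypeError in this sketch.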

RDF somewhat resembles "labeled graphs" as you may have encountered them in school: nodes with labels connected by directed edges with labels, just with some rules about what kind of labels are allowed where. RDF allows multiple edges between the same pair of nodes, as long as they differ in label or direction. Because a graph is a set of triples, stating the exact same triple twice (!!) doesn't give you a second edge.

<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/name> "Carlos Ray Norris" .
<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/knows> <http://stevenseagal.com/data_/steven> .
<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/knows> <http://brucelee.com/data_/bruce> .
<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/based_near> _:b0 .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.opengis.net/ont/geosparql#Point> .
_:b0 <http://www.opengis.net/ont/geosparql#lat> 44.45385 .
_:b0 <http://www.opengis.net/ont/geosparql#long> 10.19273 .
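To see the "labeled graph" reading of those lines, here is a rough sketch that splits each line into a (subject, predicate, object) tuple and groups the results by subject. The regular expression is an assumption that covers only the term shapes used in this example; it is not a real N-Triples or Turtle parser. (The bare numbers are Turtle-style shorthand; strict N-Triples would write them as quoted, typed literals.)

```python
import re
from collections import defaultdict

DATA = """\
<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/name> "Carlos Ray Norris" .
<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/knows> <http://stevenseagal.com/data_/steven> .
<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/knows> <http://brucelee.com/data_/bruce> .
<http://chucknorris.com/data_/chuck> <http://xmlns.com/foaf/0.1/based_near> _:b0 .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.opengis.net/ont/geosparql#Point> .
_:b0 <http://www.opengis.net/ont/geosparql#lat> 44.45385 .
_:b0 <http://www.opengis.net/ont/geosparql#long> 10.19273 .
"""

# One term: <IRI>, _:blank, "plain string", or a bare number.
TERM = r'(<[^>]*>|_:\w+|"[^"]*"|[-+]?\d+(?:\.\d+)?)'
LINE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse(text):
    """Return a set of (subject, predicate, object) string tuples."""
    triples = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        triples.add(match.groups())
    return triples


def outgoing(triples):
    """Group triples by subject: node -> list of (edge label, target)."""
    edges = defaultdict(list)
    for s, p, o in sorted(triples):
        edges[s].append((p, o))
    return dict(edges)


graph = parse(DATA)
by_node = outgoing(graph)
for node, edges in by_node.items():
    print(node, "has", len(edges), "outgoing edges")
```

The grouping makes the two nodes with outgoing edges visible: the chuck IRI and the blank node _:b0, which exists only to hold the location's type and coordinates together.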

Property Graphs

The property graph model is node-oriented. Property graphs are nice because they emphasize object-orientation: your nodes are things that have properties, just like we're used to, except that a lot of those properties have other nodes as values too.
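A minimal sketch of that node-oriented view, following the informal description above. The Node class and its field names are assumptions for illustration; actual property graph databases typically store links as separate labeled relationships that can carry properties of their own, rather than as node-valued properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    label: str
    # Property values may be plain values, other nodes, or lists of either.
    properties: Dict[str, Any] = field(default_factory=dict)


steven = Node("Person")
bruce = Node("Person")
location = Node("Point", {"lat": 44.45385, "long": 10.19273})

chuck = Node("Person", {
    "name": "Carlos Ray Norris",   # plain value
    "knows": [steven, bruce],      # values that are themselves nodes
    "based_near": location,        # another node-valued property
})

# Following a link is just attribute access, as in ordinary object-oriented code.
print(chuck.properties["based_near"].properties["lat"])
```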

RDF is edge-oriented, and is commonly serialized as giant text files with lines upon lines of triples. Edge-orientation is nice because it emphasizes the schema-less-ness of your data.
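The edge-oriented view can be sketched in a few lines, assuming abbreviated names in place of full IRIs and a made-up "nickname" predicate. Since the data is nothing but a set of triples, a statement with a new predicate is just one more element, and a node-oriented view can be recovered on demand by grouping on the subject.

```python
from collections import defaultdict

# Abbreviated names stand in for full IRIs (an illustrative shortcut).
triples = {
    ("chuck", "name", "Carlos Ray Norris"),
    ("chuck", "knows", "steven"),
    ("chuck", "knows", "bruce"),
}

# No schema to migrate: a never-before-seen predicate is just one more triple.
triples.add(("chuck", "nickname", "Chuck"))  # hypothetical predicate and value


def serialize(triples):
    """One 'subject predicate object .' line per triple, sorted for stable output."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


def node_view(triples, subject):
    """Recover a node-oriented view: predicate -> list of values."""
    props = defaultdict(list)
    for s, p, o in sorted(triples):
        if s == subject:
            props[p].append(o)
    return dict(props)


print(serialize(triples))
print(node_view(triples, "chuck"))
```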