Behind a query, something like the following occurs:

  1. Query is parsed into an abstract syntax tree (AST)
  2. From the AST, decide the most efficient way to run the query and create an intermediate representation (IR)
  3. Generate code (e.g. Java) that is equivalent to IR
  4. Execution - compile the code and submit to cluster (e.g. Hadoop cluster)

Note that an RDBMS would just do steps 1 and 2, then execute the query directly.
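The four steps above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the function names (`parse`, `plan`, `generate`, `execute`), the one-pattern query grammar, and the sample `users` table are all hypothetical, the generated code is Python rather than Java, and "execution" runs locally instead of being submitted to a cluster.

```python
import re

def parse(query):
    """Step 1: parse a tiny 'SELECT c FROM t WHERE c2 > n' query into an AST (a dict)."""
    m = re.fullmatch(r"SELECT (\w+) FROM (\w+) WHERE (\w+) > (\d+)", query.strip())
    if m is None:
        raise ValueError("unsupported query: " + query)
    column, table, filter_col, threshold = m.groups()
    return {"select": column, "from": table,
            "where": {"col": filter_col, "op": ">", "value": int(threshold)}}

def plan(ast):
    """Step 2: turn the AST into an IR, here an ordered list of operators."""
    # Filtering before projecting keeps the filter column available.
    return [("scan", ast["from"]),
            ("filter", ast["where"]["col"], ast["where"]["value"]),
            ("project", ast["select"])]

def generate(ir):
    """Step 3: emit source code (Python here, standing in for Java) equivalent to the IR."""
    (_, table), (_, col, value), (_, out) = ir
    return (
        "def run(tables):\n"
        f"    rows = tables[{table!r}]\n"
        f"    rows = [r for r in rows if r[{col!r}] > {value}]\n"
        f"    return [r[{out!r}] for r in rows]\n"
    )

def execute(source, tables):
    """Step 4: compile the generated code and run it (locally, not on a cluster)."""
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["run"](tables)

# Hypothetical sample data.
tables = {"users": [{"name": "ann", "age": 41}, {"name": "bob", "age": 25}]}
source = generate(plan(parse("SELECT name FROM users WHERE age > 30")))
result = execute(source, tables)
print(result)
```

An RDBMS-style engine would, in this picture, stop after `plan` and interpret the operator list directly rather than generating and compiling source code.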

Relational Algebra

Recall relational algebra operations:
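As a refresher, three of the standard operations (selection, projection, and natural join) can be sketched over relations stored as lists of dicts. The helper names and the sample `employees` and `depts` relations are hypothetical and chosen only for illustration; a real engine would not represent relations this way.

```python
def select(rows, predicate):
    """Selection (sigma): keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Projection (pi): keep only the named columns, dropping duplicate rows."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key not in seen:  # set semantics: each result row appears once
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out

def natural_join(left, right):
    """Natural join: pair rows that agree on every column name they share."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[c] == r[c] for c in shared):
                out.append({**l, **r})
    return out

# Hypothetical sample relations.
employees = [{"emp": "ann", "dept": "eng"},
             {"emp": "bob", "dept": "ops"},
             {"emp": "cy", "dept": "eng"}]
depts = [{"dept": "eng", "floor": 3}, {"dept": "ops", "floor": 1}]

eng_only = select(employees, lambda r: r["dept"] == "eng")
dept_names = project(employees, ["dept"])
joined = natural_join(employees, depts)
```

The nested loop in `natural_join` compares every pair of rows, which is the simplest correct formulation but also the slowest.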

Relational Joins

Performing efficient joins is one of the harder problems in query and schema design.

Types of relationships in joins include one-to-one, one-to-many, and many-to-many, and each calls for a different strategy.
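To make the relationship types concrete, here is a minimal sketch of a hash join, one common in-memory join strategy (not necessarily the one these notes go on to cover). The function name `hash_join` and the `customers` and `orders` tables are hypothetical. The example is one-to-many: each customer can match several orders.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    """Equi-join two lists of dicts on `key` using an in-memory hash table."""
    # Build phase: index one side by the join key.
    index = defaultdict(list)
    for b in build_rows:
        index[b[key]].append(b)
    # Probe phase: look up each row of the other side in the index.
    out = []
    for p in probe_rows:
        for b in index.get(p[key], []):
            out.append({**b, **p})
    return out

# Hypothetical one-to-many data: one customer, many orders.
customers = [{"cust_id": 1, "name": "ann"}, {"cust_id": 2, "name": "bob"}]
orders = [{"order_id": 10, "cust_id": 1},
          {"order_id": 11, "cust_id": 1},
          {"order_id": 12, "cust_id": 3}]  # no matching customer, so dropped

joined = hash_join(customers, orders, "cust_id")
```

In the one-to-one case each index bucket holds at most one row, so the output is no larger than either input. In the many-to-many case both sides repeat keys and the output grows to the product of the matching rows per key, which is one reason the relationship type affects the choice of strategy.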