One critique holds that MapReduce is a step backwards in several respects:
- As a database access mechanism, MapReduce is a poor implementation
  - It can only use brute force (e.g. no indexes)
- MapReduce is not novel
- MapReduce is missing features
  - e.g. bulk loading, indexing, updates, transactions
- MapReduce is incompatible with DBMS tools
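"Brute force" means every query scans all records, whereas an index lets a query jump straight to the matching rows. A minimal Python sketch, assuming a hypothetical in-memory list of `(user_id, age)` records and a plain dict standing in for a real index structure (DBMSs typically use B-trees or similar):

```python
# Hypothetical table of (user_id, age) records, for illustration only.
records = [(i, 20 + i % 50) for i in range(100_000)]

def brute_force_lookup(records, user_id):
    """Scan every record on every query: O(n) work per lookup."""
    return [r for r in records if r[0] == user_id]

def build_index(records):
    """Build once: map each key to the positions of its rows."""
    index = {}
    for pos, (key, _age) in enumerate(records):
        index.setdefault(key, []).append(pos)
    return index

def indexed_lookup(records, index, user_id):
    """Go directly to matching rows: expected O(1) per lookup with a hash index."""
    return [records[pos] for pos in index.get(user_id, [])]

index = build_index(records)
```

Both lookups return the same rows; the difference is that the indexed version pays the scanning cost once, when the index is built, rather than on every query.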
In response, systems such as Vertica were built
  - Vertica is an analytical database designed for OLAP workloads running on clusters over large datasets
Regardless of which system is used, some operations are inherently slow, such as parsing text.
  - Reading lines, splitting on whitespace, and parsing integers are all slow operations
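The per-record work of text parsing can be sketched as follows, assuming a hypothetical format of whitespace-separated integer fields, one record per line. Every record pays for finding the line boundary, splitting, and converting each digit string to an integer, even if a query only needs one field:

```python
# Hypothetical input: one record per line, whitespace-separated integer fields.
raw = b"1 25 300\n2 31 120\n3 47 990\n"

def parse_text(raw: bytes) -> list[tuple[int, ...]]:
    rows = []
    for line in raw.decode("ascii").splitlines():  # find line boundaries
        fields = line.split()                        # split on whitespace
        rows.append(tuple(int(f) for f in fields))   # convert digit strings to ints
    return rows
```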
To deal with such issues:
  - Need a binary format with a schema
  - The schema should separate the logical view (what the fields are) from the physical view (how they are laid out in bytes)
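A minimal sketch of a binary format with a schema, using Python's `struct` module and assuming a hypothetical three-field record. The logical schema names the fields and their types; a separate mapping decides the physical byte layout, so the layout (e.g. endianness or field widths) could change without changing what a record means. Decoding becomes a fixed-width unpack instead of splitting and integer parsing:

```python
import struct

# Logical view: what the data means (field names and types).
# Hypothetical schema, for illustration only.
LOGICAL_SCHEMA = [("user_id", "int32"), ("age", "int16"), ("score", "float64")]

# Physical view: how each logical type is stored in bytes.
# "<" means little-endian with no padding; this mapping is an assumption.
PHYSICAL_CODES = {"int32": "i", "int16": "h", "float64": "d"}

def record_format(schema):
    return "<" + "".join(PHYSICAL_CODES[t] for _name, t in schema)

FMT = record_format(LOGICAL_SCHEMA)
RECORD_SIZE = struct.calcsize(FMT)  # fixed width: 4 + 2 + 8 = 14 bytes

def encode(rows):
    """Pack each row into its fixed-width physical representation."""
    return b"".join(struct.pack(FMT, *row) for row in rows)

def decode(buf):
    """Unpack fixed-width records and attach logical field names."""
    names = [name for name, _t in LOGICAL_SCHEMA]
    return [dict(zip(names, vals)) for vals in struct.iter_unpack(FMT, buf)]
```

Because every record has the same width, record `i` starts at byte `i * RECORD_SIZE`, so no scanning for delimiters is needed.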

This leads to row stores and column stores:

- Files are one-dimensional ⇒ high-dimensional data must be projected onto a 1D byte sequence
- Row stores:
  - Easy to modify a record (in-place update)
  - Unnecessary data may be read while processing (whole rows are read even when only a few columns are needed)
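A row store can be sketched in Python with fixed-width records, assuming a hypothetical three-field record (`user_id`, `age`, `score`). Each record's fields are stored contiguously, record after record, which linearizes the 2D table into a 1D byte sequence. Updating a record is a single write at a computed offset, but a query that needs only `age` still reads every byte of every row:

```python
import struct

# Hypothetical fixed-width row layout: (user_id: int32, age: int16, score: float64).
FMT = "<ihd"
SIZE = struct.calcsize(FMT)  # 14 bytes per row

rows = [(1, 25, 3.5), (2, 31, 1.25), (3, 47, 9.0)]

# Row store: all fields of row 0, then all fields of row 1, and so on.
data = bytearray(b"".join(struct.pack(FMT, *r) for r in rows))

def update_row(data, i, new_row):
    """In-place update: row i lives at a known offset, so overwrite it directly."""
    struct.pack_into(FMT, data, i * SIZE, *new_row)

def sum_ages(data):
    """Needs only 'age' (2 bytes per row), yet unpacks all 14 bytes of each row."""
    total = 0
    for _user_id, age, _score in struct.iter_unpack(FMT, data):
        total += age
    return total
```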