Summary
Column-store DBMSs claim dramatically improved performance over row-store systems, but published comparisons generally fail to test against a "column-optimized" row-store configuration. Authors ask whether a row-store DBMS can achieve comparable performance on OLAP queries; investigate vertically partitioned row-stores, index-only plans, and materialized views. Authors also investigate the relative contribution to c-store performance of late materialization, block iteration, and column compression, and propose the invisible join as an additional c-store optimization.
Explanation of the Star Schema Benchmark (SSBM). Explanation of the motivation for row-store physical layout changes (vertical partitioning, index-only plans, materialized views). Review of c-store optimizations: compression, late materialization, block iteration. Explanation of the proposed invisible join optimization (hash the dimension keys that pass predicates, probe fact-table foreign keys to build position bitmaps, then rejoin to extract values) and of between-predicate rewriting.
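Here is a minimal sketch of the invisible join's three phases, with toy in-memory columns; the column names echo the SSBM lineorder/date tables, but the data and code are my own illustration, not the paper's implementation:

```python
# Dimension table "date": key column and a year attribute.
d_datekey = [19970101, 19970102, 19980101, 19980102]
d_year    = [1997, 1997, 1998, 1998]

# Fact table "lineorder", stored column by column.
lo_orderdate = [19970101, 19980101, 19970102, 19980102, 19970101]
lo_revenue   = [100, 200, 300, 400, 500]

# Phase 1: apply the predicate (d_year = 1997) to the dimension table and
# hash the qualifying keys.
qualifying = {k for k, y in zip(d_datekey, d_year) if y == 1997}

# Phase 2: probe the fact table's foreign-key column against the hash set,
# yielding a bitmap of matching fact-table positions. With multiple
# dimensions, the per-dimension bitmaps are ANDed together here.
bitmap = [fk in qualifying for fk in lo_orderdate]

# Phase 3 ("rejoin"): use the surviving positions to extract values from the
# other fact columns (and, via the foreign keys, from dimension columns).
positions = [i for i, hit in enumerate(bitmap) if hit]
print(sum(lo_revenue[i] for i in positions))  # 900

# Between-predicate rewriting: when the qualifying keys happen to form a
# contiguous range (as here), the hash probe collapses to a cheap range check.
lo, hi = min(qualifying), max(qualifying)
assert [lo <= fk <= hi for fk in lo_orderdate] == bitmap
```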
Experiment goals, motivation, and setup defined. Column-store simulation in the row-store performs badly across the board. Reasons: per-tuple overheads, joining columns back into tuples too early, and the system's insistence on merge joins over hash joins; vertical partitioning also precludes use of the system's table-partitioning feature. Column-store optimizations' average contributions: compression improves performance by 2x, late materialization by 3x, block processing by 5-50%, and the invisible join by 50-75%. Denormalization is not a huge benefit in a c-store because invisible joins already make star joins cheap.
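To make the late-materialization number concrete, a minimal sketch of the difference (toy data and column names of my own invention, not the paper's code):

```python
price    = [10, 20, 30, 40]
quantity = [1, 2, 3, 4]
discount = [0, 5, 0, 5]

# Early materialization: stitch columns back into row tuples up front,
# then filter and compute over full tuples.
rows = list(zip(price, quantity, discount))
early = sum(p * q for p, q, d in rows if d == 0)

# Late materialization: operate on one column and a position list as long as
# possible, touching the other columns only for the surviving positions.
positions = [i for i, d in enumerate(discount) if d == 0]
late = sum(price[i] * quantity[i] for i in positions)

assert early == late == 100
```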
Comments
This paper gives a great breakdown of how and why column-store databases massively outperform row-store databases (by an order of magnitude) on analytics queries. They also offhandedly mention some pretty substantial space savings due to the better compressibility of columns and lower per-row overhead: 4 GB compressed in the row-store versus 2.3 GB compressed for the same data in the column-store. And that's just run-length-encoded data, which the c-store actually processes in that form without ever decompressing it.
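For intuition, a hedged sketch of what "processing run-length-encoded data without decompressing it" can look like; the (value, run_length) layout here is my own, for illustration:

```python
# A sorted column compresses to one run per distinct value.
rle_year = [(1997, 3), (1998, 2), (1999, 4)]

# A GROUP BY count never expands the runs: each run's length *is* the count.
counts = {value: run_length for value, run_length in rle_year}

# Aggregates likewise cost O(runs), not O(rows).
total = sum(value * run_length for value, run_length in rle_year)
print(counts, total)  # {1997: 3, 1998: 2, 1999: 4} 17983
```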
This all inclines me to believe that a lot of medium-sized organizations going to substantial trouble to set up Hadoop plus Pig/Sawzall analysis engines (or, ahem, their own custom system) are probably barking up the wrong tree. You can get a terabyte or more of data on a single machine, and you can stick different tables on different machines if your working set really is larger than that. A c-store database will process that data way faster than any divide-and-conquer approach until you get to ludicrous quantities of data. The c-store is "mmap() a giant array of ints and address it like an array" fast. Compare that to the dozens of layers of indirection your data passes through from the disk drive to a Pig query, for example.
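That claim is nearly literal. A rough sketch of the access pattern, assuming a hypothetical file "lo_revenue.col" holding a raw array of native-endian 32-bit ints:

```python
import mmap

with open("lo_revenue.col", "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    col = memoryview(buf).cast("i")  # address the column like an int array
    total = sum(col)                 # a full-column scan is just a tight loop
    col.release()                    # release the view before closing the map
    buf.close()
print(total)
```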