Scratchwork: Database Formats

There are a few newer database formats I am interested in for fast, efficient data reading. Notes on each are below.

Column-Storage Formats

Databases typically store records as rows. Column-oriented databases instead store the values of each column together. This can speed up some types of queries, such as aggregates that read only a few columns, and works particularly well for large datasets. Wiki page has more info.
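
To make the row/column distinction concrete, here is a minimal pure-Python sketch. It only illustrates the layout idea, not how any real column store is implemented; the sample data and the to_columns helper are made up for this note.

```python
# Hypothetical sample data: three records with id, city, and price fields.
rows = [
    {"id": 1, "city": "Oslo", "price": 10.0},
    {"id": 2, "city": "Lima", "price": 12.5},
    {"id": 3, "city": "Oslo", "price": 7.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name to a list of values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited even though only one field is needed.
total_from_rows = sum(record["price"] for record in rows)

# Column layout: only the "price" column is read; "id" and "city" are never touched.
total_from_columns = sum(columns["price"])
```

In the column layout, an aggregate over one field reads a single contiguous list, which is roughly why column stores tend to do well on analytic queries over a few columns of a wide table.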

Here are particular examples I'd like to explore:

  • Apache Parquet
    • "Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language."
  • MonetDB