This post is the second part of our Parquet and ClickHouse blog series. Here, we explore the Parquet format in more detail, highlighting key details to consider when reading and writing files with ClickHouse. For more experienced Parquet users, we also discuss optimizations for writing Parquet files that maximize compression, as well as some recent developments to optimize read performance using parallelization.

For our examples, we continue to use the UK house price dataset. This contains data about prices paid for real estate property in England and Wales from 1995 to the time of writing. We distribute this in Parquet format in the public S3 bucket s3://datasets-documentation/uk-house-prices/parquet/. We read and write local and S3-hosted Parquet files using ClickHouse Local. ClickHouse Local is an easy-to-use version of ClickHouse, ideal for developers who need to perform fast processing on local and remote files using SQL without installing a full database server. Most importantly, ClickHouse Local and ClickHouse Server share the same code for Parquet reading and writing, so all details apply to both. For further details, see our previous post in this series and other recent dedicated content.
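As a quick illustration of the workflow above, the following sketch queries the dataset's Parquet files directly with the `s3` table function from a ClickHouse Local session. Note that the exact HTTPS endpoint for the bucket is an assumption here (ClickHouse's `s3` function expects an HTTP(S) URL rather than the `s3://` scheme); substitute the correct URL for your environment.

```sql
-- Run inside a `clickhouse local` session.
-- The HTTPS endpoint below is an assumed mapping of the public
-- s3://datasets-documentation/uk-house-prices/parquet/ bucket path.
SELECT count()
FROM s3('https://<bucket-endpoint>/uk-house-prices/parquet/*.parquet');
```

The same statement runs unchanged on a full ClickHouse Server, since both share the Parquet reading code.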
