You’ve likely heard about the benefits of partitioning data by a single dimension to boost retrieval performance. It’s a common practice in relational databases, NoSQL databases, and, notably, data lakes. For example, data in data lakes is very commonly partitioned by date or time. But what if your querying requirements involve multiple dimensions? Say you want to query your data by field A and field B together, or sometimes by field A and other times by field B.

In this post, I’ll go over several common options for such a case.
