For more than a decade, data engineering best practices have revolved around a single assumption: data volume is the primary scalability challenge.
We optimized Parquet file sizes, tuned partitioning strategies, compressed aggressively, and scaled compute to handle terabytes and petabytes of data. As long as queries scanned fewer files and clusters had enough memory, performance generally improved.