Big data has significantly evolved since its inception in the late 2000s. Many organizations quickly adapted to the trend and built their big data platforms using open-source tools like Apache Hadoop. Later, these companies started facing trouble managing the rapidly evolving data processing needs. They have faced challenges handling schema level changes, partition scheme evolution, and going back in time to look at the data. 

I faced similar challenges while designing large-scale distributed systems back in the 2010s for a big tech company and a healthcare customer. Some industries need these capabilities to adhere to banking, finance, and healthcare regulations. Heavy data-driven companies like Netflix faced similar challenges as well. They invented a table format called “Iceberg,” which sits on top of the existing data files and delivers key features by leveraging its architecture. This has quickly become the top ASF project as it gained rapid interest in the data community. I will explore the top 5 Apache Iceberg key features in this article with examples and diagrams. 

Leave a Reply

Your email address will not be published. Required fields are marked *