AWS Glue is a powerful serverless data integration service that simplifies data discovery, preparation, and transformation. However, as with any tool, real-world use reveals quirks and corner cases that the documentation does not clearly call out.
In this article, let’s talk about some key challenges I ran into while building data pipelines with Glue crawlers, specifically around CSV files, schema evolution, partitioning, and crawler update settings.