This post is about building a unified OLAP platform. An insurance company wants a single data warehouse that can handle all of its customer-facing, analyst-facing, and management-facing data analysis workloads. The main tasks include:
Self-service insurance contract query: This lets insurance customers look up their contract details by contract ID. It should also support filters such as coverage period, insurance type, and claim amount.
Multi-dimensional analysis: Analysts build reports along whichever data dimensions they need, extracting insights that support product innovation and anti-fraud efforts.
Dashboarding: This provides a visual overview of insurance sales trends, along with horizontal and vertical comparisons of different metrics.
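To make the first task more concrete, here is a minimal sketch of a contract lookup with optional filters. It is illustrative only: the `contracts` table, its column names, and the functions `make_demo_db` and `query_contract` are hypothetical, and an in-memory SQLite database stands in for the real warehouse, which the post does not describe at this level.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical contracts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts ("
        "contract_id TEXT, insurance_type TEXT, "
        "coverage_start TEXT, coverage_end TEXT, claim_amount REAL)"
    )
    conn.executemany(
        "INSERT INTO contracts VALUES (?, ?, ?, ?, ?)",
        [
            ("C-1001", "auto", "2023-01-01", "2023-12-31", 1200.0),
            ("C-1001", "health", "2022-06-01", "2023-05-31", 0.0),
            ("C-2002", "auto", "2023-03-01", "2024-02-29", 300.0),
        ],
    )
    return conn


def query_contract(conn, contract_id, coverage_from=None, coverage_to=None,
                   insurance_type=None, min_claim_amount=None):
    """Look up contracts by ID, applying only the filters that were supplied."""
    clauses = ["contract_id = ?"]
    params = [contract_id]
    if coverage_from is not None:
        # Keep contracts whose coverage ends on or after the requested start.
        clauses.append("coverage_end >= ?")
        params.append(coverage_from)
    if coverage_to is not None:
        # Keep contracts whose coverage starts on or before the requested end.
        clauses.append("coverage_start <= ?")
        params.append(coverage_to)
    if insurance_type is not None:
        clauses.append("insurance_type = ?")
        params.append(insurance_type)
    if min_claim_amount is not None:
        clauses.append("claim_amount >= ?")
        params.append(min_claim_amount)
    # Values are bound as parameters; only fixed clause text is concatenated.
    sql = (
        "SELECT contract_id, insurance_type, coverage_start, coverage_end, "
        "claim_amount FROM contracts WHERE " + " AND ".join(clauses) +
        " ORDER BY coverage_start"
    )
    return conn.execute(sql, params).fetchall()
```

For example, `query_contract(make_demo_db(), "C-1001", insurance_type="auto")` returns only the auto contract for that ID. Dates are stored as ISO 8601 strings here so that plain string comparison orders them correctly.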
Component-Heavy Data Architecture
The company started with a Lambda architecture, splitting the data pipeline into a batch processing path and a stream processing path. For real-time data streaming, they apply Flink CDC; for batch import, they combine Sqoop, Python, and DataX into their own data integration tool named Hisen.
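As a rough illustration of the Lambda pattern in general, the sketch below keeps a precomputed batch view and an incrementally updated real-time view, and merges the two at query time. It does not model Flink CDC, Hisen, or any of the company's actual components; the record fields (`insurance_type`, `premium`) and the names `build_batch_view`, `SpeedLayer`, and `serve` are assumptions made up for this example.

```python
from collections import defaultdict


def build_batch_view(records):
    """Batch path: recompute total premium per insurance type from history."""
    view = defaultdict(float)
    for record in records:
        view[record["insurance_type"]] += record["premium"]
    return dict(view)


class SpeedLayer:
    """Stream path: fold in events that arrived after the last batch run."""

    def __init__(self):
        self.view = defaultdict(float)

    def on_event(self, event):
        self.view[event["insurance_type"]] += event["premium"]

    def reset(self):
        # Called once a new batch view has absorbed the events seen so far.
        self.view.clear()


def serve(batch_view, speed_layer, insurance_type):
    """Serving step: answer a query by merging the batch and real-time views."""
    return (batch_view.get(insurance_type, 0.0)
            + speed_layer.view.get(insurance_type, 0.0))


# Hypothetical data: two historical records and one fresh streaming event.
history = [
    {"insurance_type": "auto", "premium": 100.0},
    {"insurance_type": "health", "premium": 250.0},
]
batch_view = build_batch_view(history)
speed = SpeedLayer()
speed.on_event({"insurance_type": "auto", "premium": 50.0})
auto_total = serve(batch_view, speed, "auto")
```

The point of the sketch is the cost of the pattern: the same aggregation logic exists twice, once per path, and every query has to reconcile two views.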