
Data Modernization Lead

Bupa

Jeddah, S02, SA · Full-time · Information Technology · 8 April 2026

Job Details

#### **Job Description**

**Role Purpose:**

The Data Modernization Lead is the most technically consequential hire in Bupa Arabia's Data Office. The role owns the end-to-end design, build, and operationalization of Bupa Arabia's cloud-native data platform, from real-time Oracle CDC streams to Looker dashboards trusted by actuaries, clinicians, and executives serving over 12 million members. Google Cloud Platform (GCP) and BigQuery are the selected cloud and warehouse. This role builds the platform, not the strategy: writing code, setting engineering standards, enforcing data quality at every Medallion layer, and holding the system together as it scales. The Lead acts as the technical authority over any external implementation vendor, holding them accountable to SLA benchmarks and engineering quality. The role is responsible for delivering a Vertical Slice (Business-First Agile) implementation: first measurable business value within 90 days, full enterprise scale within 12–18 months, and a platform compliant with NDMO and PDPL from Day 1.

**Key Accountabilities:**

**1. Build & Operate Real-Time Data Ingestion Pipelines**

* Design and operate GCP Datastream CDC pipelines from all Oracle sources (CAESAR core insurance system, Oracle EBS) and SQL Server sources (CRM, IPAC, ACCPAC, Edge, Wathiq, and 10+ others)
* Build event-driven ingestion using Pub/Sub + Dataflow for MongoDB (Telemedicine, Salma Chatbot), Elasticsearch (Non-NPHIES), JSON (Speech Analytics), and file-based sources
* Engineer schema evolution so that pipelines automatically handle new columns, type changes, and table additions in source systems without failures or manual code changes
* Enforce metadata capture: source system, timestamp, job ID, record count, schema version, and lineage marker logged on every ingestion event (sketched below, after Accountability 3)

**2. Design & Deliver the Medallion Architecture (Bronze / Silver / Gold)**

* Author and maintain all Silver- and Gold-layer dbt models in dbt Cloud with Git version control, CI/CD deployment pipelines (GitHub Actions), and automated dbt test suites
* Write dbt tests covering completeness, uniqueness, referential integrity, and custom business logic for every Tier-1 KPI: Gross Written Premium, Loss Ratio, Net Paid Claims, Burning Cost, and Lapse Rate
* Implement SCD Type 2 for all conformed dimensions: Customer, Member, Product, Contract, Provider, and Channel (sketched below, after Accountability 3)
* Design the Analytical MDM layer: golden records for Customer and Member with de-duplication, survivorship rules, and multi-year history preservation for actuarial models (IBNR and run-off triangles require 3+ years)

**3. Build & Govern the Looker BI Semantic Layer**

* Build and govern the Looker LookML semantic layer covering 50+ Tier-1 KPIs with reusable, governed dimensions and measures
* Enable self-service exploration: business users must be able to drill from an aggregate KPI to an individual claim or member record without writing SQL or requesting analyst support
* Configure Looker role-based access controls aligned precisely to BigQuery column-level policies, so that no user can access data they are not entitled to at any consumption layer
* Implement embedded analytics for internal portals and clinical dashboards via the Looker REST API (sketched below); maintain T-1 daily refresh and sub-15-minute micro-batch refresh for operational dashboards
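As a rough illustration of the metadata-capture bullet in Accountability 1, the sketch below stamps audit fields onto every Pub/Sub event before it lands in a Bronze table. This is a minimal sketch, not Bupa's actual pipeline: the project, topic, table, and source-system names are hypothetical, and a real job would also emit record counts and lineage markers.

```python
# Minimal Apache Beam sketch: attach ingestion metadata to each event
# before writing it to a Bronze table. All names are hypothetical.
import json
import uuid
from datetime import datetime, timezone

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

JOB_ID = str(uuid.uuid4())  # one ID per pipeline run, stamped on every record


def attach_metadata(raw: bytes) -> dict:
    """Wrap the raw payload with the audit fields the role mandates."""
    record = json.loads(raw.decode("utf-8"))
    return {
        "payload": json.dumps(record),            # kept verbatim as a JSON string
        "source_system": "telemedicine_mongodb",  # hypothetical source tag
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "job_id": JOB_ID,
        "schema_version": str(record.get("_schema_version", "unknown")),
    }


def run() -> None:
    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                topic="projects/my-proj/topics/telemedicine-events")
            | "AttachMetadata" >> beam.Map(attach_metadata)
            | "WriteBronze" >> beam.io.WriteToBigQuery(
                "my-proj:bronze.telemedicine_events",
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```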
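The SCD Type 2 requirement in Accountability 2 comes down to the merge pattern below: close the current version of a changed row, then insert the new version. In production this would more likely live in a dbt snapshot; the sketch shows the underlying mechanics with illustrative dataset and column names.

```python
# Sketch of the SCD Type 2 close-out step for a conformed Member dimension.
# Dataset, table, and column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client(project="my-proj", location="me-central2")

SCD2_CLOSE_OUT = """
MERGE `silver.dim_member` AS d
USING `bronze.member_updates` AS s
ON d.member_id = s.member_id AND d.is_current = TRUE
WHEN MATCHED AND (d.plan_code != s.plan_code OR d.status != s.status) THEN
  -- expire the current version of a changed member
  UPDATE SET is_current = FALSE, valid_to = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  -- first version of a brand-new member
  INSERT (member_id, plan_code, status, valid_from, valid_to, is_current)
  VALUES (s.member_id, s.plan_code, s.status, CURRENT_TIMESTAMP(), NULL, TRUE)
"""

client.query(SCD2_CLOSE_OUT).result()
# A follow-up INSERT ... SELECT (not shown) adds the new current version of
# each changed member, preserving the multi-year history that IBNR and
# run-off triangle models need.
```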
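For the embedded-analytics bullet in Accountability 3, one minimal path is a portal backend pulling a governed Look's results through the Looker API, so clinical dashboards consume the same semantic layer and access controls as BI users. The look_id is hypothetical, and a production embed would more likely use signed embed URLs; this is only a sketch.

```python
# Sketch: a portal backend fetching governed Look results via the Looker
# Python SDK. Credentials come from LOOKERSDK_* environment variables;
# the look_id is hypothetical.
import looker_sdk

sdk = looker_sdk.init40()
payload = sdk.run_look(look_id="42", result_format="json")
# `payload` is a JSON string of results filtered by the caller's Looker
# role-based access controls, which mirror the BigQuery column policies.
```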
**4. Enforce Data Governance, Quality & NDMO / PDPL Compliance**

* Configure and operate GCP Dataplex for automated data discovery, column-level PII / PHI / SPI classification, data lineage (Bronze → Silver → Gold → Looker), business glossary, and DQ monitoring across all Medallion layers
* Enforce NDMO compliance: all data resident in GCP me-central2 (Dammam, KSA); data classification taxonomy applied and auditable; PDPL retention policies enforced at column level
* Build and maintain automated source-to-target reconciliation: daily validation that Bronze, Silver, and Gold data reconciles to source, with zero tolerance on Tier-1 financial KPIs before any report is released (sketched after Accountability 5)
* Define and enforce the Definition of Done for all data engineering deliverables: no dataset is complete until dbt tests pass, documentation is merged, lineage is captured, and DQ gates are green

**5. Build AI / ML Infrastructure & Operationalize Vertex AI Use Cases**

* Design and populate the Vertex AI Feature Store from Gold-layer data, enabling Wave 1 AI use cases: FWA Service Overutilization, FWA Duplicated Claims, FWA Provider Collusion, Document OCR Extraction, and member churn propensity
* Build Vertex AI Pipelines for automated model training, evaluation, promotion to the Model Registry, and deployment to production inference endpoints, with no manual notebook-to-production process (sketched below)
* Collaborate with data scientists and the Track 2 AI team to operationalize models from prototype stage into scalable, monitored GCP inference pipelines
* Enable BigQuery ML as a self-service modelling capability (sketched below)
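A minimal sketch of the source-to-target reconciliation gate from Accountability 4, assuming illustrative table names and Gross Written Premium as the financial control total; any drift blocks the release:

```python
# Sketch of a daily reconciliation gate: row counts and a financial control
# total must match across layers before a report is released. Table and
# column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client(project="my-proj", location="me-central2")


def control_totals(table: str) -> tuple[int, float]:
    """Row count and summed premium for one layer of the Medallion stack."""
    row = list(client.query(
        f"SELECT COUNT(*) AS n, SUM(gross_written_premium) AS gwp "
        f"FROM `{table}`"
    ).result())[0]
    return row.n, float(row.gwp or 0.0)


def reconcile(source: str, target: str) -> None:
    src_n, src_gwp = control_totals(source)
    tgt_n, tgt_gwp = control_totals(target)
    # Zero tolerance on Tier-1 financial KPIs: any mismatch blocks release.
    if src_n != tgt_n or src_gwp != tgt_gwp:
        raise RuntimeError(
            f"Reconciliation failed {source} -> {target}: "
            f"rows {src_n} vs {tgt_n}, GWP {src_gwp} vs {tgt_gwp}"
        )


reconcile("bronze.policy_transactions", "gold.fct_gross_written_premium")
```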
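For the no-notebook-to-production requirement in Accountability 5, the sketch below submits an already-compiled Vertex AI pipeline spec so that training, evaluation, and Model Registry promotion run as pipeline steps. Project, bucket, and parameter names are hypothetical, and in-region Vertex AI availability is assumed.

```python
# Sketch: submitting a compiled Vertex AI Pipeline run. All names,
# paths, and parameters are hypothetical.
from google.cloud import aiplatform

aiplatform.init(
    project="my-proj",
    location="me-central2",  # assumes Vertex AI in-region (NDMO residency)
    staging_bucket="gs://my-proj-vertex-staging",
)

job = aiplatform.PipelineJob(
    display_name="fwa-duplicated-claims-train",
    template_path="gs://my-proj-pipelines/fwa_duplicated_claims.json",  # compiled KFP spec
    parameter_values={"train_table": "gold.fct_claims_features"},
    enable_caching=True,
)
job.run(sync=False)  # training, evaluation, and registry promotion run in-pipeline
```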
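Finally, the BigQuery ML bullet points at a self-service path where analysts train models in SQL alone. A hedged sketch, with illustrative dataset, table, and feature names:

```python
# Sketch: an analyst-facing churn model trained entirely in BigQuery ML.
# Dataset, table, and column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client(project="my-proj", location="me-central2")

client.query("""
CREATE OR REPLACE MODEL `gold.member_churn_model`
OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['churned']) AS
SELECT age_band, tenure_months, claims_last_12m, churned
FROM `gold.fct_member_features`
""").result()
# Predictions are then available to any SQL user via ML.PREDICT.
```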