Doctolib rebuilds its data platform into a Unified Healthcare Data Platform for AI and analytics
Doctolib's centralized, monolithic data platform was built on a single GitHub repository, a single Airflow instance, and a single Redshift cluster, with shared admin permissions for all users. This design blocks the company's ambition to become the leader in AI for healthcare: CI test runs take 30–40 minutes, fine-grained access control for sensitive healthcare data cannot be enforced, and support for event-driven workflows is limited.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Self-service data ingestion
The Self-Service Ingestion Engine lets teams ingest data on their own through pre-built connectors, with validation and transformation steps that make the data ready for analytics.
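The connector, validation, and transformation steps of this stage can be illustrated with a minimal sketch. It is not Doctolib's implementation: `IngestionJob`, `run_job`, `in_memory_connector`, `normalize`, and the appointment fields are all hypothetical names chosen for illustration. The only assumptions are that a team declares a job by picking a connector, listing required fields, and supplying a transform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict[str, object]


@dataclass
class IngestionJob:
    """Hypothetical job definition that a team declares for itself."""
    connector: Callable[[], Iterable[Record]]  # pre-built source reader
    required_fields: tuple[str, ...]           # simple validation rule
    transform: Callable[[Record], Record]      # shapes a record for analytics


def run_job(job: IngestionJob) -> tuple[list[Record], list[Record]]:
    """Read from the connector, validate each record, transform the valid ones.

    Returns (accepted, rejected). Rejected entries keep the original record
    and the names of the missing fields so the owning team can fix the source.
    """
    accepted: list[Record] = []
    rejected: list[Record] = []
    for record in job.connector():
        missing = [f for f in job.required_fields if record.get(f) in (None, "")]
        if missing:
            rejected.append({"record": record, "missing": missing})
        else:
            accepted.append(job.transform(record))
    return accepted, rejected


def in_memory_connector() -> list[Record]:
    """Stand-in for a pre-built connector; a real one would read a source system."""
    return [
        {"appointment_id": "a1", "booked_at": "2024-05-01T09:30:00", "specialty": " Cardiology "},
        {"appointment_id": "", "booked_at": "2024-05-01T10:00:00", "specialty": "Dermatology"},
    ]


def normalize(record: Record) -> Record:
    """Illustrative transform: keep the date part and tidy the specialty label."""
    return {
        "appointment_id": record["appointment_id"],
        "booked_date": str(record["booked_at"])[:10],
        "specialty": str(record["specialty"]).strip().lower(),
    }


job = IngestionJob(
    connector=in_memory_connector,
    required_fields=("appointment_id", "booked_at"),
    transform=normalize,
)
accepted, rejected = run_job(job)
```

In this sketch, invalid records are collected instead of raising an error, so one bad row does not block the rest of a team's load. Whether a real platform would quarantine, drop, or fail on such rows is a design choice the source does not describe.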
Doctolib is rebuilding its data platform around a Lakehouse, Data Mesh architecture, ML Training Platform, LLMOps tooling, Vector Database, and a compliance-enforcing DataShield Transformer to securely support AI development alongside reporting. As a prior delivery benchmark, Doctolib completed Tableau Server infrastructure deployment and migration in under three quarters.