
Doctolib rebuilds its data platform into a Unified Healthcare Data Platform for AI and analytics

Doctolib's centralized, monolithic data platform (a single GitHub repository, a single Airflow instance, and a single Redshift cluster, with shared admin permissions for all users) stands in the way of its ambition to become the leader in AI for healthcare. CI test runs take 30–40 minutes, fine-grained access control cannot be enforced on sensitive healthcare data, and support for event-driven workflows is limited.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Self-service data ingestion
The Self-Service Ingestion Engine lets teams ingest data independently through pre-built connectors, validating and transforming it so that it is ready for analytics.
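The connector, validate, transform flow of this stage can be illustrated with a minimal Python sketch. It assumes a hypothetical `Connector` that yields raw records, a hypothetical `IngestionSpec` that a team fills in (required fields plus transforms), and an `ingest` function that validates and then transforms each record. None of these names come from Doctolib's engine; the sketch only shows the shape of a self-service ingestion step under those assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

Record = dict[str, object]


@dataclass
class Connector:
    """Hypothetical pre-built connector: a named source that yields raw records."""
    name: str
    fetch: Callable[[], Iterable[Record]]


@dataclass
class IngestionSpec:
    """Hypothetical self-service declaration a team writes to onboard a source."""
    connector: Connector
    required_fields: list[str]
    transforms: list[Callable[[Record], Record]] = field(default_factory=list)


def ingest(spec: IngestionSpec) -> tuple[list[Record], list[Record]]:
    """Fetch from the connector, validate each record, then transform it.

    Returns (accepted, rejected) so rejected rows can be reported back to the team.
    """
    accepted: list[Record] = []
    rejected: list[Record] = []
    for raw in spec.connector.fetch():
        # Validation: every required field must be present and non-null.
        if any(raw.get(name) is None for name in spec.required_fields):
            rejected.append(raw)
            continue
        record = dict(raw)  # copy so transforms never mutate the source data
        for transform in spec.transforms:
            record = transform(record)
        accepted.append(record)
    return accepted, rejected


# Usage with an in-memory stand-in for a real source (hypothetical data).
appointments = Connector(
    name="appointments_demo",
    fetch=lambda: [
        {"id": 1, "practitioner_id": 7, "status": "BOOKED"},
        {"id": 2, "practitioner_id": None, "status": "CANCELLED"},
    ],
)
spec = IngestionSpec(
    connector=appointments,
    required_fields=["id", "practitioner_id"],
    transforms=[lambda r: {**r, "status": str(r["status"]).lower()}],
)
accepted, rejected = ingest(spec)
print(accepted)  # [{'id': 1, 'practitioner_id': 7, 'status': 'booked'}]
print(rejected)  # [{'id': 2, 'practitioner_id': None, 'status': 'CANCELLED'}]
```

Returning rejected rows instead of raising keeps the team in the loop: a self-service pipeline can surface bad records to the owning team without a central data engineer stepping in.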
Tools used
Airflow, EKS, Redshift, dbt-core, Lambda, DynamoDB, GitHub, PostgreSQL, Tableau Server, Kubernetes, Cloudflare, HL7, FHIR, OMOP, DICOM
Outcome

Doctolib is rebuilding its data platform around a Lakehouse, a Data Mesh architecture, an ML Training Platform, LLMOps tooling, a Vector Database, and a compliance-enforcing DataShield Transformer, so that it can securely support AI development alongside reporting. As a benchmark from an earlier project, Doctolib deployed Tableau Server infrastructure and completed the migration in under three quarters.
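The source names DataShield Transformer but this page does not describe how it works. Purely as an illustration, the sketch below assumes a compliance transform driven by a per-column policy: each column is kept, pseudonymized with a keyed HMAC-SHA256, or dropped, and any column the policy does not mention is dropped by default. The `Action` enum, the `shield` and `pseudonymize` functions, and all column names are hypothetical, not Doctolib's API.

```python
import hashlib
import hmac
from enum import Enum


class Action(Enum):
    """Hypothetical per-column policy actions."""
    KEEP = "keep"
    PSEUDONYMIZE = "pseudonymize"
    DROP = "drop"


def pseudonymize(value: object, key: bytes) -> str:
    """Keyed HMAC-SHA256 of the value's string form.

    The same key always maps a value to the same token, so pseudonymized
    columns can still be joined; without the key, tokens cannot be
    recomputed from guessed values.
    """
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def shield(record: dict[str, object], policy: dict[str, Action], key: bytes) -> dict[str, object]:
    """Apply the column policy to one record; unlisted columns are dropped (deny by default)."""
    safe: dict[str, object] = {}
    for column, value in record.items():
        action = policy.get(column, Action.DROP)
        if action is Action.KEEP:
            safe[column] = value
        elif action is Action.PSEUDONYMIZE:
            safe[column] = pseudonymize(value, key)
        # Action.DROP (explicit or by default): the column is omitted.
    return safe


# Usage with hypothetical columns and a throwaway demo key.
policy = {
    "patient_id": Action.PSEUDONYMIZE,
    "specialty": Action.KEEP,
    "booked_at": Action.KEEP,
    "patient_name": Action.DROP,
}
row = {
    "patient_id": 42,
    "patient_name": "Jane Doe",
    "specialty": "dermatology",
    "booked_at": "2024-05-02",
    "visit_note": "follow-up in two weeks",
}
safe_row = shield(row, policy, key=b"demo-key-not-for-production")
print(sorted(safe_row))  # ['booked_at', 'patient_id', 'specialty']
```

A deny-by-default policy means a newly added source column stays out of analytics and AI datasets until someone classifies it, which is one possible way to approximate the fine-grained control that shared admin permissions on a single cluster cannot provide.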

Results
Time saved: 30–40 minutes (the per-run CI test duration on the current platform)
Source

https://medium.com/doctolib/building-a-unified-healthcare-data-platform-architecture-2bed2aaaf437


Grounding & classification
Source type: technical build writeup
28 fields verified against source quotes.
conversational ai, rag, summarization, medical record, failure mode described, metric backed, named customer, tools described, workflow described, healthcare, employee productivity, time saved, technical build writeup, back office ops, compliance monitoring, data entry ops, data sync enrichment, rag answering