compliance_monitoring · saas · workflow

Yelp deploys fine-tuned LLMs to proactively detect inappropriate reviews in real time

Yelp needed to detect hate speech, lewdness, threats, and other inappropriate content in user reviews at scale and in real time. The system had to be highly precise so that legitimate reviews were not delayed, while still preventing harmful content from being published.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · Moderator dataset curation
Yelp's User Operations team curated a high-quality dataset of egregious inappropriate reviews and guideline-compliant reviews, using a scoring scheme to signal severity levels.
Tools used
Large Language Models, HuggingFace model hub, Redshift, S3, MLFlow, MLeap, t-SNE
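The curation step in Stage 1 can be pictured as pairing each review with a moderator-assigned severity score. The sketch below is illustrative only: the `LabeledReview` type, the 0 to 3 severity scale, and the `min_severity` cutoff are assumptions made for this example, not details taken from Yelp's writeup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledReview:
    """One curated training example (hypothetical schema)."""
    text: str
    severity: int  # assumed scale: 0 = guideline-compliant, 3 = most egregious


def to_binary_label(review: LabeledReview, min_severity: int = 1) -> int:
    """Return 1 (inappropriate) if severity meets the assumed cutoff, else 0."""
    return 1 if review.severity >= min_severity else 0


samples = [
    LabeledReview("Great tacos, friendly staff.", 0),
    LabeledReview("<placeholder for a threatening review>", 3),
]
labels = [to_binary_label(r) for r in samples]
```

Keeping the raw severity score alongside the derived binary label leaves room to change the cutoff later without relabeling the data.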
Outcome

Since deploying the LLM pipeline, Yelp's moderators proactively prevented 23,600+ reviews from ever publishing to the platform in 2023, with ongoing moderator feedback expected to further improve the model's recall.

What failed first

Prior automated approaches to inappropriate content detection had unsatisfactory tradeoffs between precision and recall, prompting iteration toward LLMs.
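The precision and recall tradeoff mentioned here can be made concrete with a small calculation. The following is a minimal sketch with made-up scores and labels, not Yelp's data or evaluation code. It shows how raising a classifier's decision threshold tends to raise precision while lowering recall.

```python
def precision_recall(scores, labels, threshold):
    """Compute (precision, recall) when flagging items with score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    # Convention chosen for this sketch: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative model scores and ground-truth labels (1 = inappropriate).
scores = [0.95, 0.80, 0.60, 0.40, 0.10]
labels = [1, 1, 0, 1, 0]

strict = precision_recall(scores, labels, threshold=0.9)   # few flags, high precision
lenient = precision_recall(scores, labels, threshold=0.3)  # many flags, high recall
```

With these toy numbers, the strict threshold flags one review correctly but misses two inappropriate ones, while the lenient threshold catches all three at the cost of one false positive.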

Results
Volume: 23,600+
Source

https://engineeringblog.yelp.com/2024/03/ai-pipeline-inappropriate-language-detection.html


Grounding & classification
Source type: technical build writeup
24 fields verified against source quotes.
anomaly detection, document classification, social media post, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, automation rate, throughput increase, technical build writeup, compliance monitoring, quality assurance, extract classify route, human review queue