
Pinterest Feature Trimmer reduces root-leaf ML serving network bandwidth and saves over $4M annually

Pinterest's root-leaf ML serving architecture passed the full union of ML features from the root to every leaf partition, regardless of which features each model actually needed. The resulting network bandwidth bottleneck forced infrastructure to be scaled on network utilization rather than compute, and left GPU resources underutilized.
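The trimming idea can be sketched in a few lines. This is a minimal illustration and not Pinterest's implementation: the function `trim_features`, the `MODEL_FEATURES` table, and all model and feature names are hypothetical, and the sketch assumes each model declares the set of feature names it reads.

```python
from typing import Dict, List, Set

# Hypothetical mapping from model name to the features it actually reads.
MODEL_FEATURES: Dict[str, Set[str]] = {
    "ads_ranker": {"pin_embedding", "user_age_bucket"},
    "homefeed_ranker": {"pin_embedding", "board_topic"},
}


def trim_features(
    features: Dict[str, List[float]], models: List[str]
) -> Dict[str, List[float]]:
    """Keep only the features needed by the models served on one leaf."""
    needed: Set[str] = set()
    for model in models:
        needed |= MODEL_FEATURES[model]
    return {name: value for name, value in features.items() if name in needed}


# Full union of features gathered at the root (toy values).
full_union = {
    "pin_embedding": [0.1, 0.2],
    "user_age_bucket": [3.0],
    "board_topic": [7.0],
    "unused_signal": [0.0] * 1000,
}

# Only the features the leaf's model needs are sent over the network.
leaf_payload = trim_features(full_union, ["ads_ranker"])
```

In this toy case the large `unused_signal` entry never leaves the root, which is the kind of saving the write-up attributes to trimming.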

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.

Stage 1 · Score request from client service

A client service sends a score request to the online ML serving root so that ML models can score candidate Pins for relevance.

Tools used

fbthrift, lz4, TorchScript, PyTorch, GFlags

Outcome

Feature Trimmer saved over $4M in annual infrastructure costs at Pinterest, enabled a 27% Ads root cluster downsizing, reduced the Homefeed root cluster fleet by 33%, achieved roughly 45% and 65% egress drops for Search and Notification clusters, and improved Related Pins p99 latency by about 25–30%.

What failed first

Enabling lz4 compression in fbthrift reduced root-leaf network usage by 20% but at the cost of 5% more CPU and a 5ms (~10%) p90 latency increase, and did not address the underlying problem of transmitting unused features.
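The limitation of compression alone can be illustrated with a small sketch. It uses Python's standard `zlib` module as a stand-in for lz4 (a different algorithm, chosen here only because it ships with Python), and the payload and feature names are hypothetical.

```python
import json
import zlib

# Hypothetical payload: two features the model reads, many it does not.
payload = {"pin_embedding": [0.1, 0.2], "user_age_bucket": [3.0]}
payload.update({f"unused_{i}": [float(i)] * 20 for i in range(200)})

raw = json.dumps(payload).encode("utf-8")
compressed = zlib.compress(raw)  # costs CPU on the sender
restored = json.loads(zlib.decompress(compressed))  # and on the receiver

# Fewer bytes cross the network, yet every unused feature is still inside.
unused_still_sent = sum(1 for name in restored if name.startswith("unused_"))
```

Compression shrinks the bytes on the wire at the price of extra CPU work on both ends, but the receiver still decodes every unused feature.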

Results

Time saved: 5ms (~10%) p90 latency increase

Volume: 20%

Cost replaced: 0.17%

Source

https://medium.com/pinterest-engineering/optimizing-ml-workload-network-efficiency-part-i-feature-trimmer-ae20beb08d69


Grounding & classification

Source type: technical build writeup

32 fields verified against source quotes.

recommendation system, failure mode described, metric backed, named customer, production runtime claimed, source backed, tools described, workflow described, software, cost reduction, cycle time reduction, throughput increase, technical build writeup, back office ops