
Pair Search: hybrid semantic and keyword search over Singapore's 30,000+ Hansard parliamentary records

Singapore's official Hansard search was purely keyword-based. Results were flooded with documents that mentioned a query word frequently but were only tangentially related, and no text snippets were shown to help users judge relevance.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · Hansard database ingestion
Over 30,000 Hansard reports from 1955 onwards were scraped, parsed, and standardised into a uniform format for indexing.
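The standardisation step can be pictured with a minimal sketch. The source does not describe the actual schema or scraper, so everything here is an assumption for illustration: the `HansardRecord` fields, the raw keys (`date`, `title`, `html`), the accepted date formats, and the sample values are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class HansardRecord:
    """Hypothetical uniform record; the real schema is not given in the source."""
    sitting_date: date
    title: str
    body: str


def _collapse(text: str) -> str:
    # Squash runs of whitespace into single spaces and trim the ends.
    return " ".join(text.split())


def normalise(raw: dict) -> HansardRecord:
    """Map one scraped report (hypothetical raw keys) onto the uniform record."""
    raw_date = raw.get("date", "").strip()
    parsed = None
    # Older and newer reports may format dates differently (assumed formats).
    for fmt in ("%Y-%m-%d", "%d-%m-%Y"):
        try:
            parsed = datetime.strptime(raw_date, fmt).date()
            break
        except ValueError:
            continue
    if parsed is None:
        raise ValueError(f"unrecognised date format: {raw_date!r}")
    # Strip HTML tags crudely; a real parser would be more careful.
    body = re.sub(r"<[^>]+>", " ", raw.get("html", ""))
    return HansardRecord(parsed, _collapse(raw.get("title", "")), _collapse(body))


# Made-up sample input, not a real Hansard entry.
record = normalise({
    "date": "22-10-1955",
    "title": "  ORAL   ANSWERS ",
    "html": "<p>Mr Speaker:</p> <p>Order.</p>",
})
```

Keeping one record type for every decade of reports means the indexing stage only has to handle a single shape.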
Tools used
Vespa.ai, e5 embeddings, ColBERTv2, BM25
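To illustrate how a keyword signal such as BM25 and an embedding signal can be combined, here is a minimal sketch in plain Python. It is not the Pair Search ranking: the source does not say how the signals are fused, so reciprocal rank fusion is swapped in as one common option. The documents, the two-dimensional vectors (stand-ins for real e5 embeddings), and the parameters `k1`, `b`, and `k` are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the standard BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def to_ranking(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings; each document earns 1 / (k + rank) per list."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Toy corpus and hand-made vectors (hypothetical, not real embeddings).
docs = [
    "covid covid covid update on unrelated budget item",
    "measures to support workers during the covid pandemic",
    "road works schedule",
]
doc_vecs = [[0.0, 1.0], [0.9, 0.3], [0.2, 0.9]]
query = "covid support for workers"
query_vec = [1.0, 0.2]

keyword_ranking = to_ranking(bm25_scores(query, docs))
semantic_ranking = to_ranking([cosine(query_vec, v) for v in doc_vecs])
hybrid_ranking = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
```

Rank fusion sidesteps the problem that BM25 scores and cosine similarities live on different scales; a weighted sum of normalised scores would be another reasonable choice.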
Outcome

Pair Search achieved dramatic improvements in search result quality and is averaging ~150 daily users and ~200 daily searches in its soft launch, with government policy officers reporting productivity gains and faster results.

What failed first

The existing official Hansard search engine ranked results by single-word frequency rather than full-phrase or semantic relevance, causing searches on common terms such as 'covid' to return irrelevant documents with no snippet preview.
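The failure mode described above can be reproduced with a small sketch. The ranking function below imitates scoring by raw single-word frequency, and the `snippet` helper shows one simple way to pick a preview window. Both functions, the sample sentences, and the `width` parameter are hypothetical illustrations, not the behaviour of either the official search or Pair Search.

```python
def term_frequency_rank(query, docs):
    """Rank documents by how often any single query word occurs (the failure mode)."""
    words = query.lower().split()

    def score(doc):
        toks = doc.lower().split()
        return sum(toks.count(w) for w in words)

    return sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True)


def snippet(query, doc, width=6):
    """Return the first window of `width` tokens covering the most distinct query words."""
    words = set(query.lower().split())
    toks = doc.split()
    best_start, best_hits = 0, -1
    for start in range(max(1, len(toks) - width + 1)):
        window = {t.lower().strip(".,;:'\"") for t in toks[start:start + width]}
        hits = len(words & window)
        if hits > best_hits:
            best_start, best_hits = start, hits
    return " ".join(toks[best_start:best_start + width])


# Made-up sentences for illustration only.
docs = [
    "covid covid covid covid was mentioned in passing during a debate on road tax",
    "The minister outlined vaccination policy for covid in schools",
]
query = "covid vaccination policy"

naive_order = term_frequency_rank(query, docs)
preview = snippet(query, docs[1])
```

The document that repeats 'covid' four times outranks the one that actually addresses the whole query, and without a preview like `preview` a user has no quick way to tell the two apart.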

Results
Volume: ~150 daily users
Running since: February 2024
Source

https://hack.gov.sg/hack-for-public-good-2024/2024-projects/pairsearch/


Grounding & classification
Source type: technical build writeup
29 fields verified against source quotes.
data extraction, enterprise search, knowledge search, rag, knowledge base, policy document, builder submitted, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, government, accuracy improvement, employee productivity, response time reduction, technical build writeup, back office ops, rag answering